Record-to-record variability and code-compatible seismic safety-checking with a limited number of records

The Italian code requires spectrum compatibility with the mean spectrum for a suite of accelerograms selected for time-history analysis. Although these requirements define minimum acceptability criteria, code-based non-linear dynamic analysis is likely to be carried out with a limited number of records. Performance-based safety-checking provides a formal basis for addressing the record-to-record variability and the epistemic uncertainties due to the limited number of records and to the estimation of the seismic hazard curve. "Cloud Analysis" is a non-linear time-history analysis procedure that employs the structural response to un-scaled ground motion records and can be implemented directly in performance-based safety-checking. This paper interprets the code-based provisions in a performance-based key and applies further restrictions to spectrum-compatible record selection with the aim of implementing Cloud Analysis. It is shown that, through multiplication by a closed-form coefficient, the code-based safety ratio can be transformed into a simplified performance-based safety ratio. As a proof of concept, it is shown that, if the partial safety factors in the code are set to unity, this coefficient is on average slightly larger than unity. The paper provides the basis for propagating the epistemic uncertainties, due both to the limited sample size and to the seismic hazard curve, to the performance-based safety ratio in both a rigorous and a simplified manner. If epistemic uncertainties are considered, average code-based safety-checking can turn out to be unconservative with respect to performance-based procedures when the number of records is small. Nevertheless, it is shown that performance-based safety-checking is possible with no extra structural analyses.

of comparison, which lends itself well to code implementation. DCFD considers various sources of uncertainty, including those related to the earthquake occurrence and intensity, record-to-record variability, the estimation of the structural capacity for various damage levels, and the epistemic uncertainties (e.g., O'Reilly and Calvi 2020). DCFD has been incorporated in the FEMA 350 (2000) guidelines for new steel buildings (a.k.a. "second generation" procedures for performance-based design). As a more rigorous performance-based safety-checking procedure, the convolution of fragility and hazard through numerical integration is also considered. Aiming to define a safety ratio for safety-checking based on numerical integration, a hazard curve for the damage measure (DM) is defined herein. Once inverted at the value of the acceptable probability, the DM hazard curve yields a safety factor analogous to that of DCFD. Moreover, a "robust" estimate of the hazard curve for the DM is introduced to formally consider the effect of a limited number of ground motion records on the fragility parameters. Finally, the epistemic uncertainty in the hazard curve is propagated to the level of the DM hazard curve.
This work employs three seismic safety-checking procedures of different levels of sophistication, namely: (1) the code-based procedure (semi-probabilistic, partially prescriptive); (2) the DCFD format (simplified performance-based); (3) the numerical integration of fragility and hazard curves (performance-based). It is worth mentioning that the three-level safety-checking comparison has been facilitated to a great extent by adopting a system-level critical demand-to-capacity ratio as the DM. This makes it possible to perform both global and component-level safety-checking, as recommended by the code. It also provides an integrated probabilistic model of demand over capacity in which the correlations between demand and capacity are incorporated. By adopting this DM, the damage measure can be conditioned directly on the seismic intensity. Finally, to enable comparisons between code-based and performance-based safety-checking, the code spectrum-compatible record selection criteria have been rendered more restrictive. This is done to make the linear logarithmic regression of DM given seismic intensity (a.k.a. Cloud Analysis) statistically meaningful.
The scope of this work is limited and specific to the consideration of record-to-record variability. In fact, we do not consider the issues related to limited knowledge of existing buildings. For a more in-depth consideration of structural modelling uncertainties for Italian existing RC buildings, the reader can refer to Jalayer et al. (2010, 2011, 2015), Franchin et al. (2010, 2018), and O'Reilly and Sullivan (2018). It is assumed that the level of knowledge of the building in question is maximum (i.e., level 3, "comprehensive"). Moreover, we have set the partial safety factors related to the mechanical material properties to unity. This has been done with the purpose of avoiding confusion and the double counting of various sources of uncertainty, and it does not limit the generality of the results.
As demonstrated in the road map in Fig. 1, the paper is organized as follows: Sect. 2 provides an overview of the performance-based and code-based methods used for estimating the safety factors; Sect. 3 provides a proof of concept by performing safety-checking for an existing RC moment-resisting frame.

Methods
As shown in Fig. 1, this section introduces the DM used for safety-checking (Sect. 2.1), performance-based safety-checking (Sect. 2.2), code-based safety checking (Sect. 2.3), and the comparison between performance-based and code-based safety ratios (Sect. 2.4). The performance-based safety checking (Sect. 2.2) is organized as follows: DCFD safety checking (Sect. 2.2.1), safety-checking using numerical integration and the DM hazard curves (Sect. 2.2.2), Cloud Analysis (Sect. 2.2.3), and building confidence intervals for fragility curves (Sect. 2.2.4). The code-based safety checking section is organized as follows: spectrum-compatible record selection (Sect. 2.3.1) and the code-based safety ratio (Sect. 2.3.2).

Choice of DM: the demand-to-capacity ratio (DCR)
The recent Italian building code (NTC 2018, §7.3.6) asks for the safety-checking of structural elements for the ultimate limit states of life safety (SLV) and collapse prevention (SLC). Safety-checking for SLV is specified to be based on resistance, whereas for SLC, the safety-checking is specified to be based on ductility. In both cases, the code asks for element-level and global assessment. The critical demand-to-capacity ratio for a prescribed limit state, denoted as DCR_LS (Jalayer et al. 2007, 2015; later in this paper, the notations DCR_LS and DCR are used interchangeably), lends itself particularly well to code-based safety-checking since it provides the possibility of checking the demand-to-capacity ratio both at the element level and at the structural level. In the performance-based terminology, DCR_LS is a damage measure (DM) as it has both the seismic demand and the seismic capacity incorporated. Another advantage of adopting DCR_LS as the DM is that it has the correlation between seismic demand and capacity embedded. In fact, by considering the uncertainties in this ratio, one also considers the uncertainties in both demand and capacity in an integrated manner. DCR_LS is formally defined as the demand-to-capacity ratio for the component or mechanism that brings the system closer to the onset of limit state LS (the LS subscript is later dropped in a number of expressions for convenience). Herein, a deformation-based (ductility-based) weakest-link formulation is adopted for finding the critical DCR among all structural elements:

DCR_LS = max_{j=1,...,N} [ D_j / C_j(LS) ]   (1)

where N is the number of components (e.g., structural elements, plastic hinges, etc.); D_j is the demand evaluated for the jth component; and C_j(LS) is the capacity of the jth component for the limit state LS.
If the structural model also incorporates fragile mechanisms, such as shear failure and/or the loss of load-bearing capacity, and the possible interactions between ductile and fragile mechanisms (e.g., the interaction between shear, axial force, and flexure), the deformation-based DM reported above is also going to encompass the resistance-based safety-checking (e.g., checking the deformation at peak strength). It should be noted that the reverse is not true; that is, a resistance-based safety-checking is not necessarily going to encompass deformation-based safety-checking in the non-linear range of behaviour. As a result, it can be argued that DCR_LS (in Eq. 1) is suitable for safety-checking for both SLC and SLV. This is the case for the proof of concept presented hereafter. If the structural model considers only ductile non-linear behaviour (i.e., axial forces and flexure), then the DCR formulation can be extended to also consider the demand-to-capacity ratios in resistance terms (see e.g., Jalayer et al. 2007, 2015). It is worth noting that even a global ductility-based measure, such as the maximum inter-storey drift ratio, can be considered as a demand-to-capacity ratio, by normalizing it to the corresponding drift thresholds for the limit state in question (Jalayer et al. 2015; see also Ebrahimian et al. 2013, for an application to aftershocks). Finally, it is worth noting that adopting DCR as the DM bypasses the need for specifying an engineering demand parameter (EDP) in the PEER performance-based framework equation (Cornell and Krawinkler 2000). In other words, this DM is going to be convoluted directly with the seismic intensity measure (IM) to estimate the seismic risk in the performance-based earthquake engineering framework. Herein, the 5% damped first-mode spectral acceleration (also in line with the NTC 2018 specifications in §3.2.3.6), denoted by S_a(T_1) or simply S_a, is adopted as the IM.
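As a minimal illustration of the weakest-link formulation in Eq. (1), the sketch below computes the critical DCR for a set of component demands and capacities; all names and numerical values are invented for illustration only:

```python
def critical_dcr(demands, capacities):
    """Critical demand-to-capacity ratio (Eq. 1): the maximum of
    D_j / C_j(LS) over the N components (weakest-link formulation)."""
    return max(d / c for d, c in zip(demands, capacities))

# Hypothetical chord-rotation demands D_j and capacities C_j(LS) for 4 hinges
demands = [0.010, 0.018, 0.006, 0.012]
capacities = [0.025, 0.020, 0.030, 0.015]

dcr_ls = critical_dcr(demands, capacities)  # 0.018/0.020 = 0.90 governs
```

A DCR_LS below unity indicates that, for this record, the system has not yet reached the onset of limit state LS; the governing component need not be the one with the largest demand.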

The demand and capacity factor design (DCFD)
Being formulated in an LRFD-like manner, the DCFD safety-checking format compares the seismic demand and capacity in probabilistic terms: the seismic demand is increased to account for the uncertainty in predicting the demand for an acceptable risk level (factored demand), and the seismic capacity is decreased (factored capacity) to consider the uncertainty in predicting the seismic capacity for a given limit state LS (or performance level). The DCFD format is derived from a risk- and performance-based safety-checking criterion for a given limit state:

λ_LS ≤ λ_o   or, equivalently,   F_D(λ_o) ≤ F_C   (2)

where λ_LS is the seismic risk expressed in terms of the mean annual frequency of exceeding a specific limit state LS, and λ_o is an acceptable risk level (e.g., 0.0021, corresponding to a 10% probability of exceedance in a reference life V_R of 50 years, or one over a return period of 475 years, for the SLV according to NTC 2018 §3.2.1). F_D(λ_o) is the factored demand at λ_o and F_C is the factored capacity (see Appendices A1 and A2).
A revised version of DCFD (Jalayer et al. 2007) is adopted herein, where the DM is conditioned directly on the IM (see Appendices A1 and A2 for the derivation). The revised DCFD safety-checking format is derived into a closed and analytical form by making a series of assumptions: (1) limit state exceedance can be expressed as a homogenous Poisson process; (2) the hazard curve (the mean annual frequency of exceeding a given seismic intensity level), denoted as λ_IM, is approximated by a power-law curve (i.e., a linear approximation in the logarithmic scale) of the form k_o × IM^(−k); (3) the damage measure DCR_LS conditioned on IM (which is S_a herein) is described by a lognormal distribution with median η_DCR|Sa = a × (S_a)^b and constant logarithmic standard deviation β_DCR|Sa (i.e., the standard deviation of the logarithm); (4) the epistemic uncertainty in the estimation of the seismic hazard and the epistemic uncertainty in the estimation of the damage measure DCR_LS can be represented, as a first-order approximation, as lognormal deviations from the median hazard curve and from the median DCR_LS given S_a curve, respectively (Jalayer and Cornell 2003). It is worth noting that, if the curvature of the hazard curve is large (generally not the case for Italian hazard curves), DCFD might lead to inaccurate results under certain conditions (see e.g., Bradley et al. 2007; Vamvatsikos 2013; Romão et al. 2013; O'Reilly and Monteiro 2019). Nevertheless, it is shown that for mono-curvature concave hazard curves in the logarithmic scale (as is usually the case), the DCFD format provides estimates that are on the safe side. It should also be noted that the linear logarithmic relation specified in point (3) above is generally suitable for deformation-based demand parameters in regular structures with medium to long periods. The DCR_LS, however, can also include other strength-based demands in the original DCFD formulation (see e.g., Jalayer et al. 2007).
Otherwise, alternative formulations can be used (e.g., Romão et al. 2013; O'Reilly and Monteiro 2019). Moreover, it should be noted that the range of applicability of the linear logarithmic relation in point (3) should be verified for the case at hand. In general, DCFD can consider epistemic uncertainties in the assessment of structural demand and capacity (e.g., Cornell et al. 2002; Jalayer and Cornell 2003; Jalayer et al. 2007; Miano et al. 2019). Given that we are focused here on the consideration of record-to-record variability and the uncertainty in the ground motion representation, the revised DCFD format is expressed herein neglecting the epistemic uncertainties related to the structural model parameters and structural analysis. Nevertheless, DCFD can consider the epistemic uncertainty in the estimation of the seismic hazard curve and the epistemic uncertainty, due to limited sample size (i.e., a small number of ground motion records), in the estimation of the median damage measure DCR_LS given S_a:

F_D = η_DCR|Sa^o × exp(½ (k/b) β²_DCR|Sa) × exp(½ (b/k) β²_UH) × exp(½ (k/b) β²_UDCR) ≤ F_C   (3)

where F_D is the factored demand; F_C is the factored capacity, which is equal to unity herein; S_a^o is the IM corresponding to the acceptable risk level λ_o through the (median) hazard curve (Fig. 2a); η_DCR|Sa^o is the median DCR corresponding to S_a^o; k is the slope of the linear approximation of the (median) hazard curve in the logarithmic scale (see Fig. 2a), estimated as the slope of the line fitted in the log scale at the spectral acceleration value corresponding to the onset of the limit state DCR = 1, denoted as S_a|DCR=1 = (1/a)^(1/b); β_UH is the epistemic uncertainty in the hazard curve (visualized as the confidence band in Fig. 2a), measured in hazard terms at S_a|DCR=1 = (1/a)^(1/b); and β_UDCR is the epistemic uncertainty in the estimation of the median DCR given S_a (i.e., the epistemic uncertainty due to limited sample size). The derivation of Eq. (3) is shown in Appendices A1 and A2 of the electronic supplementary material of this manuscript.
The difference between S o a extracted from the hazard curve and from the code-based uniform hazard spectrum is discussed in detail later in this paper (Sect. 3.3).
The Safety Ratio, denoted as SR, which expresses the ratio of F_D to F_C (note that F_C = 1 here), can be interpreted as a probabilistic quantification of the safety margin between system-level demand and capacity. The DCFD format can also be expressed in terms of a safety ratio SR less than unity:

SR_DCFD = F_D / F_C ≤ 1   (4)

A "confidence-based" version of the DCFD safety-checking (Cornell et al. 2002; Jalayer and Cornell 2003; Jalayer 2003) can be presented as follows (see Appendix A2 of the electronic supplementary material of this manuscript for its derivation):

SR^x%_DCFD = η_DCR|Sa^o × exp(½ (k/b) β²_DCR|Sa) × exp(K_x β_UT) ≤ 1   (5)

where SR^x%_DCFD is the safety ratio at the x% confidence level; K_x is the standard Gaussian variate associated with the probability x/100 of not being exceeded (e.g., for x = 84, K_x = Φ^(−1)(x/100) ≅ 1.0); and β_UT is the total epistemic uncertainty, calculated as:

β_UT = sqrt( (b/k)² β²_UH + β²_UDCR )   (6)

In this context, the safety ratio expressed in Eq. (4) can be interpreted as a mean safety ratio.
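The confidence-based DCFD check (cf. Eq. 4) can be sketched numerically. All parameter values below (a, b, k, S_a^o, dispersions) are invented for illustration and are not taken from the case study:

```python
import math

def sr_dcfd(a, b, k, sa_o, beta_dcr_sa, K_x=0.0, beta_ut=0.0):
    """Confidence-based DCFD safety ratio: the median DCR at Sa_o,
    amplified for record-to-record variability and, optionally, for
    total epistemic uncertainty at confidence level x%."""
    eta_dcr_sa_o = a * sa_o ** b           # median DCR at the IM matching lambda_o
    return (eta_dcr_sa_o
            * math.exp(0.5 * (k / b) * beta_dcr_sa ** 2)
            * math.exp(K_x * beta_ut))

# Illustrative values (assumptions, not from the paper's case study)
sr_50 = sr_dcfd(a=1.8, b=1.0, k=2.5, sa_o=0.45, beta_dcr_sa=0.3)
sr_84 = sr_dcfd(a=1.8, b=1.0, k=2.5, sa_o=0.45, beta_dcr_sa=0.3,
                K_x=1.0, beta_ut=0.25)
```

With these numbers, the 50%-confidence ratio stays below unity (safe), while at 84% confidence the epistemic margin exp(K_x β_UT) pushes the ratio above unity, illustrating how the confidence level alone can flip the verdict of the check.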

The numerical integration: DCR hazard curves
DCFD is a simplified performance-based format which has been proven to provide very close approximations to more rigorous safety-checking based on Eq. (2), where λ_LS is obtained through numerical integration (NI):

λ_LS = ∫ P(DCR_LS > 1 | S_a = x) · |dλ_Sa(x)|   (7)

where P(DCR_LS > 1|S_a = x) is the probability of exceeding the limit state given the seismic intensity, better known as the fragility curve, and λ_Sa(x) is the mean annual frequency of exceeding intensity level x, better known as the hazard curve. Like the hazard curve for S_a, we can also define a hazard curve for DCR:

λ_DCR(y) = ∫ P(DCR_LS > y | S_a = x) · |dλ_Sa(x)|   (8)

It is clear that λ_DCR(1) = λ_LS. The concept of the DCR hazard curve (Fig. 2b, the blue dash-dot line) proves useful for formulating an equivalent DCFD format based on NI:

DCR(λ_o) = λ_DCR^(−1)(λ_o) ≤ 1   (9)

That is, the equivalent factored demand through NI, denoted as DCR(λ_o), is obtained by finding the DCR value that corresponds to the acceptable level λ_o through the DCR hazard curve (see Fig. 2b). Being defined in terms of DCR, the equivalent factored demand is not necessarily confined by the assumptions that underlie the derivation of DCFD (Jalayer and Cornell 2009). Consequently, we can define the safety ratio based on NI as:

SR_NI = DCR(λ_o) ≤ 1   (10)

The same notions of epistemic uncertainty in the seismic hazard and of epistemic uncertainty in the median DM due to limited sample size can also be reflected in the fragility/hazard integration in Eqs. (7) and (8). The epistemic uncertainty due to limited sample size can be considered by calculating the robust DCR hazard curve and its plus/minus one standard deviation confidence interval (see also Beck and Au 2002; Taflanidis et al. 2008, for updated robust reliability). Like DCFD, the robust DCR hazard curve can also consider the epistemic uncertainty in the seismic hazard curve as a unit-median lognormal deviation from the median, denoted by ε_UH, with logarithmic standard deviation equal to β_UH.
As a result, the robust DCR hazard curve considers the uncertainty in the fragility model parameters due to limited sample size and the epistemic uncertainty in the median seismic hazard curve:

λ̃_DCR(y) = E[λ_DCR(y|χ)] = ∫ λ_DCR(y|χ) f(χ|D) dχ   (11)

and,

σ²[λ_DCR(y)] = E[(λ_DCR(y|χ) − λ̃_DCR(y))²] = ∫ (λ_DCR(y|χ) − λ̃_DCR(y))² f(χ|D) dχ   (12)

where E[·] is the expected value operator; χ is a realization of the vector of fragility parameters χ; D (standing for data) is the set of (S_a, DCR_LS) pairs obtained for a selection of ground motion records; and f(χ|D) can be derived by employing a non-linear dynamic analysis procedure (see the next section and Appendices A4 and A5 in the supplementary material file). Please see Appendix A3 of the electronic supplementary material of this manuscript for the derivation of Eqs. (11) and (12). Therefore, the DCR hazard curve with x% confidence can be calculated as:

λ^x%_DCR(y) = η(y) × exp(K_x × β(y))   (13)

where η(·) is the median robust DCR hazard curve (i.e., the exponential of the expected value of the logarithm of the DCR hazard curve, shown in Fig. 2b as a thin solid black line) and β(·) is the logarithmic standard deviation of the robust DCR hazard curve (i.e., the square root of the variance in Eq. 12). In case the epistemic uncertainties are considered, the safety ratio corresponding to confidence level x% can be calculated as:

SR^x%_NI = DCR^x%_LS(λ_o) ≤ 1   (14)

where DCR^x%_LS(λ_o) is the DCR value corresponding to the probability x/100 of not being exceeded, obtained from Eq. (13) at the acceptable risk level λ_o. Figure 2b demonstrates the 84% confidence DCR hazard curves in thick solid lines. The figure also shows the points SR^50%_NI = 1.15 and SR^84%_NI = 1.40, corresponding to an acceptable risk level equal to λ_o = 0.00075.
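The fragility/hazard convolution (cf. Eqs. 7 and 8) and the inversion of the DCR hazard curve at λ_o can be sketched with a short numerical routine. The power-law hazard and lognormal Cloud parameters below are invented for illustration; under these assumptions the integral also has a well-known closed-form solution, which can be used as a sanity check:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative (assumed) parameters: hazard lambda_Sa(x) = k0 * x**-k,
# Cloud model with median DCR|Sa = a * Sa**b and dispersion beta
k0, k = 1.0e-4, 2.5
a, b, beta = 1.8, 1.0, 0.3

def lambda_dcr(y, x_lo=0.01, x_hi=10.0, n=4000):
    """DCR hazard curve: integrate the fragility against the hazard
    density |d lambda_Sa(x)| = k * k0 * x**(-k - 1) dx (midpoint rule
    on a log-spaced grid)."""
    log_step = math.log(x_hi / x_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo * math.exp((i + 0.5) * log_step)   # log-spaced midpoint
        dx = x * log_step
        frag = phi((math.log(a * x ** b) - math.log(y)) / beta)
        total += frag * k * k0 * x ** (-k - 1) * dx
    return total

def dcr_at(lam_o, y_lo=0.05, y_hi=5.0):
    """Invert the (monotonically decreasing) DCR hazard curve at the
    acceptable risk level lambda_o by bisection in log space."""
    for _ in range(40):
        y = math.sqrt(y_lo * y_hi)
        if lambda_dcr(y) > lam_o:
            y_lo = y
        else:
            y_hi = y
    return math.sqrt(y_lo * y_hi)
```

With these numbers, `lambda_dcr(1.0)` matches the closed-form value `k0 * a**(k/b) * exp(0.5 * (k/b)**2 * beta**2)` to within the discretization and truncation error, and `dcr_at(0.0021)` plays the role of SR_NI for λ_o = 0.0021.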

Implementing performance-based safety checking using recorded un-scaled ground motion
As mentioned before, non-linear dynamic analyses provide suitable means for performance-based safety-checking. With specific focus on un-scaled real ground motion records, the simple "Cloud Analysis" (CA) is a non-linear dynamic analysis procedure that lends itself reasonably well to performance-based safety-checking for low- to mid-rise regular moment resisting frames (Shome and Cornell 1999). More specifically, the conditional median η_DCR|Sa = a(S_a)^b (a line in the log scale, the thick line in Fig. 3) and the constant logarithmic standard deviation β_DCR|Sa (the standard deviation of the normal probability density function, PDF, of the regression residuals in the logarithmic scale, also known as the standard error of regression; see the standard deviation of the solid orange PDF in Fig. 3) can be derived by subjecting the pairs (S_a, DCR_LS), calculated for the selected un-scaled ground motion records, to linear regression in the logarithmic scale (e.g., Elefante et al. 2010; Jalayer et al. 2015):

ln(DCR) = ln a + b × ln(S_a) + ε,   β_DCR|Sa = sqrt( Σ_{i=1}^{N} (ln DCR_i − ln a − b ln S_a,i)² / (N − 2) )   (15)

where N is the number of records, and ln a and b (the slope in the log scale, Fig. 3) are the coefficients of the logarithmic linear regression. The structural fragility using CA can be expressed as:

P(DCR_LS > 1 | S_a) = Φ( (ln a + b ln S_a) / β_DCR|Sa )   (16)

where Φ(·) is the standard normal cumulative distribution function. Previous works of the authors have laid out criteria for record selection for CA (Miano et al. 2018; Ebrahimian and Jalayer 2020). In summary, these criteria ensure that: both sides of DCR_LS = 1 are populated with ground motion records for the limit state of interest (to avoid extrapolation); the ground motion records span a wide range of intensities (to improve the estimate of the regression slope in the log scale); and a limited number of records are selected from the same earthquake (to reduce the correlation between CA data). It turns out that careful record selection for CA for the ultimate limit states may often lead to the selection of collapse-inducing ground motions.
A modified version of CA, referred to as Modified Cloud Analysis (MCA), has been introduced, which considers explicitly the structural response to collapse-inducing records. Herein, we focus on the original simple CA, which does not consider the so-called "collapse cases". This is because the code's provisions seem to exclude such cases, as they recommend taking the average value, something that would not be possible in the presence of collapse cases. Focusing on non-collapse-inducing records, we expect the simple CA to work reasonably well for safety-checking purposes.
A "rough" estimate of the epistemic uncertainty in the median DM due to the limited number of records can be obtained as a function of β_DCR|Sa (e.g., Jalayer and Cornell 2003):

β_UDCR = β_DCR|Sa / √N   (17)

where N is the number of records, and β_UDCR is the standard deviation of the logarithm of the mean (median), used herein as a proxy for the epistemic uncertainty due to limited sample size. The PDF of the median DCR given S_a is shown as an orange dashed line in Fig. 3. The vector of fragility parameters χ, presented in the previous section for calculating the robust DCR hazard curve, is equal to [ln a, b, β_DCR|Sa] for CA (for the derivation of f(χ|D), see Appendices A4 and A5 in the supplementary material and Box and Tiao 1992).
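The Cloud Analysis statistics described above amount to ordinary least squares in log-log space, followed by the √N rule for the epistemic proxy. A minimal sketch (function names and the synthetic data are assumptions for illustration):

```python
import math

def cloud_fit(sa_values, dcr_values):
    """Cloud Analysis regression: fit ln(DCR) = ln(a) + b*ln(Sa) and
    return (a, b, beta_dcr_sa, beta_udcr), where beta_dcr_sa is the
    standard error of regression and beta_udcr = beta_dcr_sa/sqrt(N)."""
    n = len(sa_values)
    xs = [math.log(s) for s in sa_values]
    ys = [math.log(d) for d in dcr_values]
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    ln_a = my - b * mx
    sse = sum((y - ln_a - b * x) ** 2 for x, y in zip(xs, ys))
    beta_dcr_sa = math.sqrt(sse / (n - 2))   # standard error of regression
    beta_udcr = beta_dcr_sa / math.sqrt(n)   # rough epistemic proxy
    return math.exp(ln_a), b, beta_dcr_sa, beta_udcr

def fragility(sa, a, b, beta_dcr_sa):
    """CA-based fragility: P(DCR > 1 | Sa)."""
    z = (math.log(a) + b * math.log(sa)) / beta_dcr_sa
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Feeding the routine (S_a, DCR) pairs that follow an exact power law recovers a and b with essentially zero residual dispersion, which is a convenient way to verify the fit before using real Cloud data.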

Building confidence interval for fragility: the robust fragility
An alternative way of estimating the equivalent lognormal statistics and the equivalent epistemic uncertainty statistics is to derive them "visually" from the robust fragility curve. The robust fragility curve and its confidence interval, which are derived through explicit consideration of the uncertainties in the fragility model parameters, are described in detail in previous works of the authors (Jalayer et al. 2015). Figure 4 shows how the equivalent lognormal statistics, the median and the logarithmic standard deviation, can be read as the value corresponding to 0.50 probability and half the (log) distance between the 0.84 and 0.16 probabilities, respectively:

η_Sa|DCR=1 = S_a^50%,   β_DCR|Sa = (b/2) × ln( S_a^84% / S_a^16% )   (18)

where β_DCR|Sa is half of the logarithmic distance (in terms of S_a) between the 84th and 16th percentile S_a values at DCR = 1 (see Fig. 4), multiplied by the coefficient b (the slope of the regression line in the log scale). Moreover, the epistemic uncertainty, denoted as β_UF, can be read as half the (log) horizontal distance between the 16% and 84% robust fragility curves. [Fig. 4 Reading equivalent lognormal statistics from the robust fragility curve and its plus/minus one standard deviation interval: a N = 40 ground motion records; b N = 7 ground motion records. The fragilities shown in light grey correspond to various realizations χ of the vector χ.] Figure 4 shows the robust fragility curve and its 16%-84% confidence interval for suites of code-spectrum-compatible ground motion records with N = 40 (Fig. 4a) and N = 7 (Fig. 4b). It is interesting to note that the 16%-84% confidence interval reduces significantly in the case of N = 40 (Fig. 4a) with respect to the case of N = 7 (Fig. 4b). β_UF is roughly comparable to β_UDCR/b. However, β_UF represents the overall effect of the uncertainty in all the fragility curve parameters (including the standard error of regression β_DCR|Sa).
This is while β_UDCR roughly represents only the epistemic uncertainty in the median DCR given S_a. Therefore, it is expected that b × β_UF be larger than β_UDCR. In the proof-of-concept section, we report both β_UDCR and b × β_UF. Although we have used β_UDCR for DCFD safety-checking (due to its simplicity), it should be kept in mind that β_UF captures the uncertainty in all the fragility parameters. It is worth mentioning that the numerical integration procedure herein uses the robust DCR hazard curves directly and does not employ b × β_UF.
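Reading the equivalent lognormal statistics off a fragility curve can also be done numerically: invert the curve at probabilities 0.16, 0.50, and 0.84 and take half the log distance. The lognormal fragility used below is synthetic (its median and dispersion are invented), so the routine should approximately recover the dispersion it was built with:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def invert_fragility(frag, p, lo=1e-3, hi=10.0):
    """Find the Sa value at which the fragility curve reaches probability p
    (bisection in log space; frag must be monotonically increasing)."""
    for _ in range(80):
        mid = math.sqrt(lo * hi)
        if frag(mid) < p:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Synthetic lognormal fragility: median 0.34 g, log-dispersion 0.40 (assumed)
frag = lambda sa: phi(math.log(sa / 0.34) / 0.40)

sa_16 = invert_fragility(frag, 0.16)
sa_50 = invert_fragility(frag, 0.50)                            # equivalent median
beta_sa = 0.5 * math.log(invert_fragility(frag, 0.84) / sa_16)  # ~log-dispersion
```

Multiplying `beta_sa` by the regression slope b converts the dispersion from S_a terms into DCR terms; the small discrepancy between the recovered value and 0.40 comes from 0.16/0.84 being only approximately the plus/minus one standard deviation probabilities.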

Code-based safety-checking: The NTC 2018 procedure
The Italian Building Code (NTC 2018) envisions the use of non-linear time-history analysis based on recorded, synthetic, and simulated ground motion.

Spectrum-compatible record selection
With specific reference to recorded ground motions, the following conditions should be respected in the selection of the ground motion time-histories (NTC 2018, §3.2.3.6):

1. At least 7 ground motion record groups (i.e., two horizontal components and one vertical for spatial structural models) should be used if the mean response is going to be used.
2. The mean elastic spectrum (5% damping) of the selected records should be larger than 90% of the code design spectrum and should not exceed 130% of the code design spectrum in the period range of interest, interpreted as [0.15 s, max(2T_1, 2 s)], where T_1 is the fundamental structural period.
3. The use of recorded ground motions is permitted only if these records are compatible with the site's seismicity and the characteristics of the potential seismogenic sources.
4. The code design spectrum is specified in NTC 2018 (§3.2.3.2). It should be noted that the anchor point for the elastic spectrum's horizontal component on rock (soil type A) is equal to the PGA associated with the hazard level of interest from the national hazard maps for the site of interest.
Once the ground motion records are selected (at least seven), the mean response to the set of selected records is used for the safety-checking of the structure (NTC 2018, §7.3.5). Figure 5 shows sets of spectrum-compatible records with N = 40 and N = 7 ground motion records for the life safety limit state (SLV; see Sect. 3.2 for more details). The anchor PGA for the NTC design spectrum, after being amplified by the soil factor, is shown in Fig. 5a, b.
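The spectrum-compatibility condition in point 2 above reduces to a band check on the ratio of the mean spectrum of the suite to the code spectrum. A sketch with invented spectral ordinates (all values are assumptions for illustration):

```python
def spectrum_compatible(periods, mean_spectrum, code_spectrum, t1,
                        lower=0.90, upper=1.30):
    """Check the NTC 2018 spectrum-compatibility band (as interpreted
    here): within [0.15 s, max(2*T1, 2 s)], the mean spectrum of the
    suite must lie between 90% and 130% of the code design spectrum."""
    t_lo, t_hi = 0.15, max(2.0 * t1, 2.0)
    for t, m, c in zip(periods, mean_spectrum, code_spectrum):
        if t_lo <= t <= t_hi and not (lower * c <= m <= upper * c):
            return False
    return True

# Invented spectral ordinates (g) at a few control periods
periods = [0.10, 0.50, 1.00, 2.00]
code = [0.30, 0.60, 0.35, 0.18]
mean_ok = [0.40, 0.58, 0.36, 0.20]   # within the band over [0.15, 2.0] s
mean_bad = [0.40, 0.50, 0.36, 0.20]  # 0.50 < 0.9 * 0.60 at T = 0.5 s
```

Note that the ordinate at T = 0.10 s falls outside the period range of interest and is therefore ignored by the check, regardless of how far it sits from the code spectrum.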

The code-based safety ratio
Based on the requirements summarized in the previous section and the DM adopted herein, the code-based safety ratio, SR_Code, can be interpreted and expressed as follows:

SR_Code = mean(DCR_LS) ≤ 1   (19)

where mean(DCR_LS) is the mean response expressed in terms of the mean critical demand-to-capacity ratio calculated from the code-compatible non-linear time-history analyses.

Ratio between performance-based and code-based safety ratios
Assuming that DCR is lognormal, the following relationship between the mean and the median of DCR holds (see also D'Ambrisi and Mezzi 2005; the LS subscript has been dropped for simplicity):

mean(DCR) = η_DCR × exp(β²_DCR / 2)   (20)

where η_DCR is the median and β_DCR is the standard deviation of the natural logarithm of the DCR values (β_DCR is reported in Fig. 3) calculated from the code-compatible non-linear dynamic analyses. Based on the underlying assumptions of the linear logarithmic regression, the following relationship holds for the set of records:

η_DCR = a × (η_Sa)^b   (21)

where η_Sa is the median spectral acceleration value for the set of selected records. Since the code-enforced spectrum compatibility is expressed in terms of the mean, the median of the S_a values can be expressed as a function of the mean value (assuming lognormality):

η_Sa = δ × S_a,NTC × exp(−β²_Sa / 2)   (22)

where β_Sa is the standard deviation of the logarithm of the S_a values for the set of records, S_a,NTC is the code design spectral acceleration, and δ is the deviation of the mean spectrum for the suite of records from the code spectrum following the NTC 2018 specifications (see Fig. 5c). In the absence of information, δ can be modelled as a uniform variable in the range between 0.9 and 1.3; otherwise, it can be modelled as a truncated lognormal distribution based on available information. Therefore, η_DCR in Eq. (21) can be expressed as:

η_DCR = a × (δ × S_a,NTC)^b × exp(−b × β²_Sa / 2)   (23)

Substituting in Eq. (19), we obtain a closed-form estimate of the code-based safety ratio, SR_Code,CF:

SR_Code,CF = a × (δ × S_a,NTC)^b × exp(−b × β²_Sa / 2) × exp(β²_DCR / 2)   (24)

where the subscript CF indicates "closed form". We can now derive the ratio of the DCFD-based and code-based safety ratios for confidence level x%, denoted as α_x:

α_x = SR^x%_DCFD / SR_Code,CF = (S_a^o / (δ × S_a,NTC))^b × exp( (k/2b) β²_DCR|Sa + (b/2) β²_Sa − (1/2) β²_DCR + K_x β_UT )   (25)

The closed-form relationship obtained in Eq. (25) can be useful for predicting the ratio of performance-based DCFD to code-based safety-checking. For example, assuming some typical values, x = 50 (K_x = 0), k = 2.5, b ≃ 1, β_DCR|Sa = 0.3, β_DCR ≃ β_Sa, δ ≃ 1.10, and S_a^o ≃ S_a,NTC, α_x is going to be equal to about 1.02. It is clear that, depending on the level of confidence x%, the value of the total dispersion due to epistemic uncertainty β_UT (Eq. 6), the slope b (Fig. 3), the slope k (Fig. 2a), the deviation from the code design spectrum δ (Fig. 5c), the dispersion β_Sa in the spectral acceleration (Fig. 3), the dispersion β_DCR in DCR (Fig. 3), and the conditional dispersion β_DCR|Sa of DCR given S_a (Fig. 3), the ratio α_x shifts from values larger than unity (suggesting that the code is un-conservative) to values smaller than unity (suggesting that the code is conservative). It is important to mention that we have set the partial safety coefficients and the confidence factor to unity. The closed-form safety ratio in Eq. (24) indicates that, following the code provisions, a performance-based safety assessment is within reach. More specifically, this closed-form ratio can be used: (1) as a corrective coefficient (in Eq. 25) to be applied to the code-based safety ratio; (2) to show that the semi-prescriptive code-based procedure has a certain synergy with a performance-based safety-checking procedure such as DCFD; (3) to calculate the closed-form code safety ratio (Eq. 24) even in the presence of the so-called "collapse cases" (as opposed to the average value in Eq. 19).
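The worked example with typical values can be reproduced numerically. The expression below is a sketch of the closed-form ratio referenced as Eq. (25), as reconstructed here from the prose derivation; `sa_ratio` (the ratio S_a^o/S_a,NTC, taken as unity in the typical-value example) is an assumed parameter name:

```python
import math

def alpha_x(k, b, beta_dcr_sa, beta_sa, beta_dcr, delta,
            K_x=0.0, beta_ut=0.0, sa_ratio=1.0):
    """Ratio of the DCFD-based to the closed-form code-based safety
    ratio (sketch; sa_ratio = Sa_o / Sa_NTC is assumed equal to 1)."""
    return (sa_ratio / delta) ** b * math.exp(
        0.5 * (k / b) * beta_dcr_sa ** 2    # record-to-record variability term
        + 0.5 * b * beta_sa ** 2            # spectrum-dispersion term
        - 0.5 * beta_dcr ** 2               # mean-to-median correction term
        + K_x * beta_ut)                    # epistemic confidence term

# Typical values from the text: x = 50% (K_x = 0), k = 2.5, b ~ 1,
# beta_DCR|Sa = 0.3, beta_DCR ~ beta_Sa, delta ~ 1.10
alpha_50 = alpha_x(k=2.5, b=1.0, beta_dcr_sa=0.3,
                   beta_sa=0.3, beta_dcr=0.3, delta=1.10)   # ~1.02
```

With these typical values the ratio lands slightly above unity, consistent with the observation that, with partial safety factors set to unity, the corrective coefficient is on average slightly larger than one.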
Alternatively, the ratio between the safety ratios obtained based on NI and on the code-based procedure is:

α_NI,x = SR^x%_NI / SR_Code,CF   (26)

Proof of concept
As a proof of concept, we perform safety-checking for an intermediate frame of an existing school building in Avellino (Campania Region, Italy), in the zone damaged by the 1980 Irpinia earthquake. As methods of safety-checking with increasing levels of sophistication, we consider: (1) safety-checking based on NTC 2018 non-linear time-history analysis; (2) DCFD safety-checking; (3) the full numerical integration (NI). As mentioned in Sect. 2, methods (2) and (3) are performance-based. Two different levels of confidence are considered for performance-based safety-checking: (a) x = 50%, K_x = 0, β_UDCR = 0 and β_UH = 0 (no epistemic uncertainties); (b) x = 84%, K_x = 1 (the epistemic uncertainties are considered). We have considered the same numbers of un-scaled recorded ground motion records for the three safety-checking procedures mentioned above, namely N = 7, 10, 12. Spectrum compatibility is ensured with respect to NTC 2018's horizontal elastic design spectrum. It is important to mention that there are no collapse-inducing records in the selected suites of records. As mentioned before, NTC 2018 recommends using the mean value of the structural response to the set of spectrum-compatible records (see also Eq. 19). This implies that collapse-inducing records should not be present; that is, the mean value cannot be calculated if even one collapse-inducing record is selected. Given the focus on record-to-record variability, we have not considered the uncertainties in the structural modelling parameters herein. Previous manuscripts (Jalayer and Ebrahimian 2020; Ebrahimian and Jalayer 2020) provide a detailed study of structural modelling uncertainties for the same structure.

The structural model
A complete description of the structural model used in this study is reported in Jalayer and Ebrahimian (2020) and Ebrahimian and Jalayer (2020). The case-study frame belongs to a four-storey school building located in Avellino, Italy. The average shear wave velocity of the upper 30 m, V S30 , at the building site is estimated at around 470 m/s. Hence, the structure lies on soil type B (according to the NTC 2018 site classification, 360 m/s < V S30 < 800 m/s). This building, built in the 1960s, can be classified as an older RC frame designed only for gravity loading. Although the school building has undergone various seismic upgrading operations, such as steel jacketing of columns, we have modelled it in its original (non-upgraded) configuration. The frame in question is modelled considering the nonlinear behaviour using the software OpenSees (version 2.5.0, http://opensees.berkeley.edu).
Since it is an intermediate frame, we have not considered the internal walls as structural elements. The plasticity in beam and column elements is modelled using the force-based beam-column element called "Beam with Hinges" (Scott and Fenves 2006). The locations and weights of the element integration points are based on so-called plastic hinge integration, which allows the user to specify plastic hinge lengths at the element ends. The nonlinear behaviour is modelled using the Pinching4 material, which allows modelling of four points on the monotonic backbone curve; namely, cracking, yielding, spalling and ultimate. The monotonic backbone behaviour at the member ends is obtained by superposition of flexural-normal behaviour, shear behaviour, and fixed-end rotation due to bar slip as springs in series (see Fig. 6). The spring capturing the flexural-normal backbone behaviour is obtained by pre-processing the RC section as a fibre section. This permits explicit modelling of confinement and consideration of the constitutive behaviour of both the steel rebars and the concrete. The concrete monotonic compressive stress-strain relationship follows the constitutive relations proposed by Mander et al. (1988). The tensile strength of concrete is not considered in the constitutive relation. The concrete confined strength is calculated according to the Chang and Mander (1994) model, where the confinement effectiveness of the transverse reinforcement is modelled according to the procedure described in Moehle (2015). The monotonic stress-strain curve of the steel rebar with strain-hardening is modelled according to the relationships in Mander et al. (1988) and Chang and Mander (1994). The buckling of the steel rebar is considered through the formulation of Dhakal and Maekawa (2002). The shear backbone is a piecewise linear lateral load-shear displacement envelope estimated at certain critical stages (see Fig. 6d, e).
This envelope captures the first shear cracking, the maximum shear strength, and the onset of strength degradation. The shear cracking, which signals the initiation of diagonal cracks, is modelled according to Sezen (2008). The maximum shear strength of an RC member is calculated according to the model by Biskinis et al. (2004) (see also Eurocode 8, Part 3, 2005). According to the Biskinis model, the shear resistance decreases as a function of the plastic part of the ductility demand. The displacement at peak strength for columns is calculated as the maximum value among the following models: (1) elastic theory with the effective shear modulus at maximum strength approximated as one half of the elastic value (Elwood and Eberhard 2009); (2) displacement at shear strength according to Sezen (2008); (3) member displacement at yield rotation according to Eurocode 8, Part 3 (2005). For beams, the displacement at peak shear strength is calculated as the maximum between models (1) and (2) described above. The shear displacement at the onset of shear strength degradation is determined using the model proposed by Gerin and Adebar (2004). The fixed-end rotation due to bond slip is modelled based on an analytical macro-level approach using a stepped bond-stress model (Sezen and Setzler 2008). The average bond value for smooth bars is calculated based on CEB-FIP (2010), which suggests a value of 0.15(f′c)^1/2 MPa to 0.30(f′c)^1/2 MPa depending on the quality of the bond conditions (see also Arani et al. 2014). We have assumed a uniform bond stress of 0.30(f′c)^1/2 MPa in the elastic range, and a uniform bond stress of 0.15(f′c)^1/2 MPa for the portion of the bar in which the yield strain is exceeded.
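As a quick numerical illustration of the bond-stress assumption above, the following sketch evaluates the two uniform bond-stress levels; the coefficients are those quoted from CEB-FIP (2010), while the concrete strength f′c = 25 MPa is a hypothetical value, not the case-study one:

```python
import math

def bond_stress(fc_mpa: float, beyond_yield: bool) -> float:
    """Uniform bond stress for smooth bars, after CEB-FIP (2010):
    0.30*sqrt(f'c) MPa in the elastic range, 0.15*sqrt(f'c) MPa for the
    portion of the bar in which the yield strain is exceeded."""
    coeff = 0.15 if beyond_yield else 0.30
    return coeff * math.sqrt(fc_mpa)

# Hypothetical f'c = 25 MPa:
print(bond_stress(25.0, beyond_yield=False))  # 1.5 MPa
print(bond_stress(25.0, beyond_yield=True))   # 0.75 MPa
```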
The total lateral deformation at axial failure (total loss of load bearing capacity for the member) for columns is calculated as the maximum between: (1) the axial capacity model by Elwood and Moehle (2005a); (2) the total displacement at peak strength plus the total post-capping displacement calculated according to the relationship furnished in (Haselton et al. 2008). The ultimate total lateral deformation for beams is calculated according to model (2) discussed right above.
To model the interaction between shear and flexural behaviour after flexural yielding, two limit state functions are considered. These limit state functions aim to find the onset of strength degradation due to shear failure. The first limit state function is strength-based and considers the shear capacity degradation due to plastic rotation according to the shear strength formula of Biskinis et al. (2004), drawn as a red solid line in Fig. 6d, e. The second limit state function is deformation-based and considers the onset of shear failure based on data available from columns that have already experienced flexure-driven yielding (Elwood and Moehle 2005b; the thin red dashed-dot lines labelled as EM2005 in Fig. 6d, e). However, in our case, the strength-based limit state governs; the deformation-based model seems (as can be seen in the figure) to offer un-conservative estimates of the deformation at the onset of shear failure. It is important to note that the structural modelling approach adopted herein, which incorporates the code-based criteria for strength-based safety-checking for fragile mechanisms such as shear failure, enables fully deformation-based safety-checking. In other words, the DCR LS formulation (see Eq. 1) adopted herein performs both ductility-based and strength-based safety-checking based on code criteria. We did not model the brittle shear failure in the joint panel zone herein. Nevertheless, the code's strength-based safety-checking for the panel zone can be considered (as post-processing) in the DCR LS formulation (not done here). Safety-checking herein is done for the life safety limit state (SLV) according to NTC 2018, whose onset is defined on the member force-deformation curve as the point (the magenta squares in Fig. 6b, c) with a deformation equal to 3/4 of that at the onset of the collapse prevention limit state (SLC). The SLC limit state for ductile failure mechanisms is defined herein as the point (the red triangles in Fig. 6b, c) on the lateral force-deformation relationship at which a 20% drop in resistance has taken place. This condition is illustrated in Fig. 6b, which shows the force-deformation curve for the flexure-driven Column 12. For shear-critical members, however (as illustrated in Fig. 6c for Column 2), the onset of SLC is defined as the intersection of the shear-resistance limit state (Biskinis et al. 2004) and the member backbone curve.

The pool of ground motion records
We have selected a large ground motion set of 160 records from the NGA-West2 Database (Ancheta et al. 2014), ITACA (Italian Accelerometric Archive, http://itaca.mi.ingv.it/, last accessed 03/2019), and recent Iranian recordings (International Institute of Earthquake Engineering, IIEES, personal communication). More details about the suite of records are reported separately in Appendix A8 of the electronic supplementary material of this manuscript. We have followed the record selection criteria for Cloud Analysis (see Sect. 2.2.3). The suite of records turns out (without enforcing it) to be more-or-less spectrum-compatible for the SLV (Fig. 7a). The selected records have 180 m/s < V S30 < 720 m/s (corresponding to NTC soil types B and C) and moment magnitude greater than 5 (no limits on the source-to-site distance are imposed). The records correspond to crustal focal mechanisms (reverse, strike-slip and normal faulting styles), and no more than 6 recordings from the same earthquake event have been chosen. We have been particularly careful in maximizing the dispersion around the mean spectrum (as one of the criteria for record selection for CA). It is to note that the code design spectra are derived for Class III (schools). Out of the original 160 records selected, 68 have been identified as "collapse-inducing". That is, they have led to one of the following: (1) axial failure in more than half of the columns in one floor (Galanis and Moehle 2015); (2) numerical non-convergence and/or a DCR value larger than 10; (3) maximum inter-storey drift larger than 10% (accounting for global dynamic instability). This means that 92 records are non-collapse-inducing. Figure 7b shows the mean spectrum for the 92 non-collapse-inducing records (these 92 records are highlighted in the list of 160 records in Appendix A8 of the electronic supplementary material of this manuscript).
This set is not spectrum-compatible, as its mean spectrum does not remain within the pre-established 90-130% bounds over the prescribed range of periods. Later, for comparison, we have selected a subset of 40 spectrum-compatible records (see Fig. 5a).
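The three collapse-inducing criteria listed above can be expressed as a small classification routine. This is a sketch with hypothetical inputs; in practice, the axial-failure fraction, the convergence flag, the DCR value, and the drift come from post-processing each time-history analysis:

```python
def is_collapse_inducing(frac_failed_columns: float, converged: bool,
                         dcr: float, max_drift: float) -> bool:
    """Flag a record as collapse-inducing if any of the three criteria holds:
    (1) axial failure in more than half of the columns of one floor;
    (2) numerical non-convergence and/or DCR larger than 10;
    (3) maximum inter-storey drift larger than 10%."""
    return (frac_failed_columns > 0.5          # (1) worst-floor axial failures
            or not converged or dcr > 10.0     # (2) non-convergence / DCR > 10
            or max_drift > 0.10)               # (3) global dynamic instability

# Hypothetical record outcomes:
print(is_collapse_inducing(0.6, True, 2.0, 0.02))   # True  (criterion 1)
print(is_collapse_inducing(0.1, True, 1.5, 0.03))   # False (non-collapse)
```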

The bootstrap procedure
We have used a bootstrap procedure (Efron and Tibshirani 1993) to sample suites of N ≥ 7 ground motion records from the pool of 92 non-collapse-inducing records (see Fig. 7b). The bootstrap method relies on sampling with replacement. The following criteria are used for accepting a bootstrap sample of N records: (1) the records need to be spectrum-compatible according to the NTC 2018 requirement described in the previous section for SLV. (2) The ratio β DCR|Sa /β DCR is checked to be smaller than unity. This is roughly equivalent to checking that the R 2 (coefficient of determination) of the linear logarithmic regression is greater than zero. A positive R 2 coefficient indicates that the regression fulfils its primary objective of variance reduction. (3) The slope of the linear logarithmic regression is checked to be positive (b > 0). (4) At least one record needs to lead to DCR LS greater than unity and at least one record needs to lead to DCR LS less than unity. This last criterion ensures that both sides of DCR LS = 1 are populated with data (i.e., no extrapolation is needed). Essentially, we have integrated the code spectrum-compatibility criteria with those necessary for having a statistically meaningful logarithmic regression of DCR LS versus IM. Since the Italian code asks for the mean spectrum (and not the individual records) to respect spectrum compatibility, the spectrum compatibility is expected to mainly affect the mean intensity of the records across the window of periods (and not the record-to-record variability). Figure 5c shows the mean spectra for n = 500 bootstrap extractions of N = 7 ground motion records compatible with the code design spectrum in the period range [0.15 s, max(2T 1 , 2 s)] = [0.15 s, 2 s] (where T 1 = 0.84 s is the first-mode period of the moment-resisting frame shown in Fig. 6).
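The statistical acceptance criteria (2)-(4) above can be sketched as follows, with synthetic (S a , DCR) pairs standing in for the actual analysis results (the generating values b = 0.8 and conditional dispersion 0.3 merely mimic those reported later); criterion (1), spectrum compatibility, requires the full response spectra and is omitted here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic pool of 92 records: log DCR linear in log Sa with slope 0.8
# and conditional (log) dispersion 0.3 -- illustrative values only.
n_pool = 92
log_sa = rng.normal(np.log(0.3), 0.5, n_pool)
log_dcr = 0.9 + 0.8 * log_sa + rng.normal(0.0, 0.3, n_pool)
sa, dcr = np.exp(log_sa), np.exp(log_dcr)

def accept(idx):
    """Criteria (2)-(4): variance reduction, positive slope, both sides of 1."""
    x, y = np.log(sa[idx]), np.log(dcr[idx])
    b, a = np.polyfit(x, y, 1)                    # linear logarithmic regression
    beta_dcr_sa = (y - (a + b * x)).std(ddof=2)   # conditional dispersion
    beta_dcr = y.std(ddof=1)                      # marginal dispersion
    return (beta_dcr_sa < beta_dcr                # (2) equivalent to R^2 > 0
            and b > 0                             # (3) positive slope
            and (dcr[idx] > 1).any()              # (4) both sides of DCR = 1
            and (dcr[idx] < 1).any())

suites = []
while len(suites) < 500:                          # n = 500 accepted extractions
    idx = rng.integers(0, n_pool, size=7)         # N = 7, with replacement
    if accept(idx):
        suites.append(idx)
print(len(suites))  # 500
```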
It should be noted that, according to the code requirements, the code design spectrum is anchored to the PGA value corresponding to the allowable probability λ o from the national seismic hazard curves for the site of the study on rock (before being amplified by the site amplification factors; see Ebrahimian et al. 2014, 2019). Figure 8a shows the uniform hazard spectrum (UHS) for the allowable probability λ o extracted from the national seismic hazard curves (multiplied by the site amplification factors and referred to as "INGV Spectrum") for the building site in the period range [0.15 s, 2 s]. The value S a o,Code (T 1 ) = 0.40 g shown in Fig. 8a is read from the code design spectrum (NTC spectrum), where λ o,Code = 0.0014 (i.e., 10% probability of exceedance in a 75-year lifetime). However, it can be seen from the hazard curve in Fig. 8b that the value S a o,Code (T 1 ) = 0.40 g corresponds to λ′ o = 0.00075 from the site-specific hazard curve. Conversely, the corresponding spectral acceleration from the UHS (i.e., exceeded with a mean annual rate equal to λ o,Code = 0.0014) is equal to S a o,UHS (T 1 ) = 0.31 g. To be able to compare the code-based and performance-based procedures, we used the same spectral acceleration value S a o,Code (T 1 ) = 0.40 g also for performance-based safety-checking. This means that the allowable probability level has been decreased to λ′ o = 0.00075 for performance-based safety-checking (DCFD and NI).
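Reading exceedance rates off a hazard curve, as done above, amounts to interpolating the curve in log-log space. A minimal sketch follows, with made-up hazard ordinates anchored so that S a (T 1 ) = 0.40 g maps to λ′ o = 0.00075 and the UHS ordinate 0.31 g maps to λ o,Code = 0.0014 (the actual INGV ordinates are not reproduced here):

```python
import numpy as np

# Hypothetical site-specific hazard curve for Sa(T1): (Sa [g], lambda [1/yr]).
sa_grid  = np.array([0.10,   0.20,   0.31,   0.40,   0.60,   1.00])
lam_grid = np.array([2.0e-2, 4.5e-3, 1.4e-3, 7.5e-4, 2.2e-4, 4.0e-5])

def rate_of_exceedance(sa):
    """Mean annual rate of exceedance, interpolated linearly in log-log space."""
    return float(np.exp(np.interp(np.log(sa), np.log(sa_grid), np.log(lam_grid))))

print(rate_of_exceedance(0.31))  # ~1.4e-3 (the code anchor in this sketch)
print(rate_of_exceedance(0.40))  # ~7.5e-4 (lambda'_o in this sketch)
```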

The results
The results of safety-checking are presented in this section. In Sect. 3.4.1, we discuss the base case with N = 7 in more detail. Similar results for N = 10 and N = 12 are discussed later in Sect. 3.4.2 and are reported in detail in Appendices A6 and A7 of the electronic supplementary material.

Fig. 9 a Histogram of β Sa , the standard deviation (of the log) of S a ; b histogram of β DCR , the standard deviation (of the log) of DCR; c histogram of β DCR|Sa , the standard deviation (of the log) of DCR given S a ; d histogram of β UDCR , the standard deviation (of the log) of the median DCR (a proxy for the epistemic uncertainty due to limited sample size); e histogram of β UH , the standard deviation (of the log) of the hazard curve (the epistemic uncertainty in the hazard curve); f histogram of β UT , a proxy for the total epistemic uncertainty due to limited sample size and the uncertainty in the hazard curve; g histogram of b, the slope of the regression line in the log scale; h histogram of k, the slope of the hazard curve in the log scale. The red stars show the values calculated for a suite of 40 spectrum-compatible records (see Fig. 5a); the histograms are based on n = 500 bootstrap samples.

Note that the pool of 92 non-collapse-inducing records is not spectrum-compatible (see Fig. 7b) and is therefore not suitable for comparison. Figure 9a, b shows the histograms of β Sa , the logarithmic standard deviation of S a , and β DCR , the logarithmic standard deviation of DCR, respectively. It can be observed that they have quite similar distributions, following a lognormal trend with a central value roughly around 50%. The histogram in Fig. 9c shows the distribution of β DCR|Sa , the conditional standard deviation of DCR given S a (a.k.a. the standard error of the regression). We can see that β DCR|Sa has a bell-shaped distribution with mean/median around 30% (a reasonable value for Cloud Analysis).
The decrease from, on average, around 50% for the marginal standard deviation (β DCR ) to around 30% for the conditional standard deviation β DCR|Sa is to be expected, given the role of regression in variance reduction. The histogram in Fig. 9d shows the distribution of β UDCR (which is related to β DCR|Sa through Eq. 17), with mean/median equal to 10%. In fact, this dispersion value provides a rough measure of the standard deviation of the median DCR given S a and serves as a proxy (in the DCFD format) for measuring the epistemic uncertainty due to limited sample size. Figure 9e shows the histogram of β UH , which measures the epistemic uncertainty in the hazard curve. The histogram has median/mean around 20%. It should be noted that β UH is extracted from the 84% and 16% hazard curves reported in the INGV hazard model. The value of β UH is measured at the median spectral acceleration at the onset of the limit state DCR LS = 1 (S a |DCR=1 , see Fig. 2a). This explains why different bootstrap samples register different values of β UH . Figure 9f shows the distribution of β UT , which is calculated from Eq. (6) and reflects the total epistemic uncertainty due to limited sample size and the uncertainty in the hazard curve. It can be observed that β UT has a central value around 13%. The histogram in Fig. 9g shows the distribution of the b value, the slope of the regression line in the log scale. This slope parameter has a mean/median value equal to 0.80 (note that a value of b = 1 indicates the equal-displacement rule). Finally, Fig. 9h shows the distribution of k, the slope of the hazard curve in the log scale. k has been calculated as the slope of the tangent line at the median spectral acceleration at the onset of the limit state DCR LS = 1 (S a |DCR=1 , see Fig. 2a); therefore, it registers some slight variations. The median value is equal to 2.40. Figure 10 shows the safety ratios for the base case.
For each plot, the 5th (p05) and 95th (p95) percentiles, the median (p50), the mean, and the logarithmic standard deviation (β) are reported. As in the previous figure, the values calculated for a suite of 40 spectrum-compatible records (see Fig. 5a) are shown as red stars on the diagrams. It can be observed that the safety ratios are all greater than unity: the structure does not satisfy safety-checking for the SLV limit state. The histogram in Fig. 10a shows the safety ratio based on the code from Eq. (19), i.e., the mean value. The mean safety ratio is equal to 1.35, and the 5th-95th percentile interval spans between 1.2 and 1.5. The histogram in Fig. 10b shows the code safety ratio from the closed-form formula in Eq. (24) with the deviation δ randomized (uniform distribution between 0.9 and 1.3, assuming no prior information; see Fig. 13e). The mean safety ratio is equal to 1.34, and the 5th-95th percentile interval spans between 1.1 and 1.60. Note that the dispersion increases with respect to the histogram on the left because δ is randomized, and its real dispersion is smaller than that of a uniform distribution (see Figs. 5c, 13e). Figure 10c shows the safety ratio based on DCFD with 50% confidence (Eq. 4, no epistemic uncertainties). The DCFD safety ratio varies between a 5th percentile of 1.17 and a 95th percentile of 1.66, with a mean value of 1.40. The histogram in Fig. 10d shows the safety ratio based on NI with 50% confidence (Eq. 14, no epistemic uncertainties). This histogram reports safety ratios very close to those obtained based on DCFD. The safety ratios obtained for the set of 40 spectrum-compatible records based on DCFD and NI are equal to 1.28 and 1.26, respectively (lower than the mean values obtained for N = 7). The bottom row in Fig. 10 shows the histograms of the safety ratio with 84% confidence (with epistemic uncertainties) based on DCFD (from Eq. 5, Fig. 10e) and NI (Eq. 14, Fig. 10f).
The results based on NI are much more scattered, with a 5th-95th percentile interval spanning between 1.41 and 2.90 (mean safety ratio equal to 2). DCFD, however, has a narrower 5th-95th percentile interval, between 1.33 and 1.93 (mean safety ratio equal to 1.60). It is very interesting that the results of safety-checking with 84% confidence based on the 40 spectrum-compatible records for DCFD and NI are equal (SR = 1.38). This shows that, for a very small number of records, DCFD might underestimate the epistemic uncertainties due to limited sample size with respect to NI. However, this effect is reduced considerably when the number of records is larger (N = 40).

The base case
We can see this more clearly in Fig. 11 by looking at the 84% robust DCR hazard for N = 40 (Fig. 11a) and N = 7 (Fig. 11b) obtained based on DCFD (dashed red line, from Equation A12 of the electronic supplementary material) and NI (thick black solid line, Eq. 13). For N = 7, the 84% DCR hazard curve based on NI is larger than that obtained based on DCFD's simplified procedure: the SR based on NI is 1.40 (see also Fig. 2b), while it is equal to 1.31 based on DCFD (Fig. 11b). However, the two curves are almost identical for N = 40 (SR = 1.38).

Fig. 10 The safety ratios: a code safety ratio SR Code from Eq. (19); b code safety ratio SR Code,CF from the closed-form expression in Eq. (24) based on δ randomized as a uniform distribution for the suite of records; c 50% confidence DCFD safety ratio SR DCFD from Eq. (4); d 50% confidence safety ratio SR NI based on NI from Eq. (10); e 84% confidence DCFD safety ratio from Eq. (5); f 84% confidence safety ratio based on NI from Eq. (14). The red stars show the same results for a set of N = 40 spectrum-compatible records.

The histogram in Fig. 12a shows α 50% , which is the ratio of the 50% confidence safety ratio based on DCFD (from Eq. 4) to the code-based safety ratio from the closed-form formula in Eq. (25) with the deviation δ randomized. The histogram in Fig. 12c shows the same ratio but with the code safety ratio from Eq. (19), i.e., the mean. It is interesting that both histograms essentially convey the same message: a normal distribution centred around unity (to be exact, 1.03-1.06), with the 5th-95th percentile interval spanning between 0.83-0.86 (code conservative) and 1.26-1.28 (code unconservative). This confirms the intuition from Sect. 2 (below Eq. 25) that assigning certain typical values to the various parameters leads to α 50% very close to unity (1.02). The histogram in Fig. 12b shows α 84% , the ratio between the safety ratios based on DCFD with 84% confidence (from Eq.
5) and the closed-form code safety ratio based on Eq. (25). It is to note that the closed-form version of the code safety ratio (Eq. 24) can be calculated even in the presence of the so-called "collapse cases". The histogram in Fig. 12d essentially shows the same quantity as Fig. 12b, with the difference that the code safety ratio is calculated from the mean in Eq. (19). As can be expected, at 84% confidence DCFD turns out to be more conservative (the mean value of α 84% is 1.20) with respect to the code (without the partial safety factors). The 5th-95th percentile intervals for the two histograms are very close and vary between 0.95 and 1.5.

Fig. 11 The robust DCR hazard curve based on a N = 40 records, and b N = 7 records. The 50% DCR hazard curve based on DCFD (narrow orange dotted line) has a closed and analytic form (reported in Equation A6, electronic supplementary material). The 84% DCR hazard curve based on DCFD (thick dashed red line) also has a closed form (reported in Equation A12, electronic supplementary material). The 50% (thick blue dashed-dot line, from Eq. 10) and 84% (thick solid black line, from Eq. 14) robust DCR hazard curves are also reported in the figures.

The two histograms in the bottom row of Fig. 12 clearly indicate that, while DCFD and NI lead to the same results at the 50% confidence level (Fig. 12e), they lead to different results at the 84% confidence level. It appears that DCFD cannot fully capture the uncertainties due to limited sample size when compared to NI (Fig. 12f); note that the epistemic uncertainty in the hazard curve is propagated in the same manner in both methods. The histogram in Fig. 12e shows the ratio between the two performance-based procedures, DCFD and NI, at the 50% confidence level (no epistemic uncertainties). It is known that for mono-curvature concave seismic hazard curves in the log scale (usually the case with the Italian hazard curves), DCFD safety-checking is always conservative.
This is confirmed by the histogram, which is essentially centred at unity with a very small skew lying entirely to the right of unity. The very small deviation from unity is good news for DCFD as an analytic safety-checking format based on a set of simplifying assumptions (described in Sect. 2).

Fig. 12 The ratio of safety ratios: a α 50% , the ratio of the DCFD safety ratio with 50% confidence and the code safety ratio SR Code,CF from the closed-form expression in Eq. (25) based on δ randomized (uniform distribution); b α 84% , the ratio of the DCFD safety ratio with 84% confidence and the code safety ratio SR Code,CF from the closed-form expression in Eq. (25) based on δ randomized (uniform distribution); c ratio of the DCFD safety ratio with 50% confidence from Eq. (4) and the code safety ratio SR Code from Eq. (19); d ratio of the DCFD safety ratio with 84% confidence from Eq. (5) and the code safety ratio SR Code from Eq. (19); e ratio of the DCFD safety ratio with 50% confidence from Eq. (4) and the safety ratio SR NI based on NI from Eq. (10) with 50% confidence; f ratio of the DCFD safety ratio with 84% confidence from Eq. (5) and the safety ratio SR NI based on NI from Eq. (14) with 84% confidence.

The histogram in Fig. 12f shows the same ratio but for 84% confidence (epistemic uncertainties are considered). This histogram lies entirely to the left of unity: DCFD significantly underestimates the uncertainty due to limited sample size. This is to be expected, since DCFD considers the uncertainty due to limited sample size as uncertainty in only the median DCR given S a , whereas the safety ratio based on NI from Eq. (14) considers the uncertainty in all the fragility parameters and not just the median. We presume that this difference is significant when the number of records is small (e.g., N = 7 here), and we expect the difference between the DCFD and NI safety ratios to become smaller as the number of records increases.
In fact, the ratio for N = 40 (red star) is equal to unity (see also the comments on Fig. 11). Looking at the histograms in Fig. 13, several interesting observations can be made. The histogram in Fig. 13a shows the ratio between SR Code,CF (Eq. 24) and SR Code (Eq. 19) in the case where the deviation δ of the mean spectrum with respect to the code design spectrum at T 1 = 0.84 s (the fundamental period of the structure) is calculated. The ratio is substantially equal to unity, with very small variability. This validates the closed-form ratio α x derived in the Methods section (Eq. 25): essentially, the code-based safety ratio multiplied by α x (from Eq. 25) leads to the performance-based safety ratio based on the DCFD format. Given the interest in the closed-form safety ratio of the code, SR Code,CF , from a predictive point of view, we have also randomized the deviation δ (i.e., a uniform distribution between 0.9 and 1.3). The ratio between SR Code,CF (Eq. 24) and SR Code (Eq. 19) in the case where δ is randomized is shown in Fig. 13b. While the ratio is centred around unity, it shows more variability with respect to the case where the deviation is calculated for each bootstrap sample (Fig. 13a). The histogram in Fig. 13c shows the ratio of the safety ratio based on NI with 50% confidence (from Eq. 14) and the code-based safety ratio from Eq. (19). The ratio is centred around unity, showing a nice overall agreement (the mean and the median are very close and almost equal to unity, 1.02-1.03). In fact, the red star, which reports the result for a suite of N = 40 spectrum-compatible records, is also very close to unity (1.04). The 5th-95th percentile interval for the ratio is between 0.84 (code conservative) and 1.22 (code unconservative). Figure 13d shows the same ratio, with the difference that the safety ratio for NI considers the epistemic uncertainties in the hazard curve and those related to limited sample size (84% confidence).
In this case, the code would become unconservative, and the ratio varies between 1.01 (5th percentile) and 2.16 (95th percentile). This is interesting because the partial safety factors γ el in NTC 2018 for shear resistance and ultimate rotation are equal to 1.15 and 1.50, respectively. Moreover, the partial safety factors for steel and concrete material properties are equal to 1.15 and 1.50, respectively. As mentioned before, we did not include these factors (we set them equal to unity). However, applying these partial safety factors could increase the safety ratio by at most a factor of about 1.70 for this structure (less than the 95th percentile value of 2.16). Figure 13e shows the histogram of the calculated deviations δ. The histogram shows a truncated normal behaviour between 0.9 and 1.30, centred around the middle of the interval (mean = 1.12, see also Fig. 5c). This is useful for randomizing δ using a truncated lognormal distribution between 0.9 and 1.3, centred between the two ends (i.e., mean equal to 1.1), with a coefficient of variation around 6% (β = 0.06).
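Such a randomization can be implemented, for instance, by rejection-sampling ln δ from a normal distribution truncated to [ln 0.9, ln 1.3]; a sketch with the parameters just suggested (central value 1.1, β = 0.06):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_delta(n, median=1.1, beta=0.06, lo=0.9, hi=1.3):
    """Truncated lognormal samples of delta via rejection sampling of ln(delta)."""
    mu, a, b = np.log(median), np.log(lo), np.log(hi)
    out = np.empty(0)
    while out.size < n:
        x = rng.normal(mu, beta, size=n)          # candidate ln(delta) values
        out = np.concatenate([out, x[(x > a) & (x < b)]])
    return np.exp(out[:n])

delta = sample_delta(1000)
print(delta.min() > 0.9, delta.max() < 1.3)  # True True
```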

Discussion
Tables 1, 2 and 3 tabulate, for N = [7, 10, 12], the same parameters extracted from the bootstrap analysis that were discussed in detail for N = 7. Comparing the tables is useful for observing possible trends as the number of ground motion records increases. It can be observed that the mean SR based on all three methods (code-based, DCFD and NI at 50% confidence) and on N = [7, 10, 12] is in the range 1.3-1.4, indicating that the structure does not satisfy safety-checking for SLV. This result is also confirmed by the red stars in Fig. 10, which indicate the results for N = 40 spectrum-compatible records. It is to note that the ratio α 50% has median and mean values in the range 1.05-1.06 for N = [7, 10, 12] (with a 90% probability interval between 0.83 and 1.28). This confirms the rough ballpark estimate in Sect. 2, which placed α 50% around 1.02. Even the ratio of the safety ratio based on NI and the code-based safety ratio varies in the range 1.02-1.04 for N = [7, 10, 12] at 50% confidence. It is to mention that the 50% confidence level is the most suitable for a comparison between the code-based safety ratio and the performance-based ones (DCFD and NI), considering that the partial safety factors are set equal to unity. This "lucky" coincidence is perhaps facilitated by (a) adopting the DCR as a system-level DM, and (b) making sure that the bootstrap realizations lead to meaningful regressions (i.e., integrating the code spectrum compatibility with extra criteria ensuring a meaningful logarithmic regression). Also encouraging is that the variation (β) of these ratios stays in the range 8%-13% for different values of N. The dispersion is slightly larger (0.12-0.13) when δ is randomized from a uniform distribution. It is also noteworthy that the ratio of the safety ratios based on DCFD and NI is practically equal to unity for all values of N (with the DCFD safety ratio being very slightly larger, as expected).
Things change at the 84% confidence level (when the epistemic uncertainties are considered). The first observation is that DCFD's 84% safety ratio is significantly smaller than that of NI. The effect tends to reduce as N increases: the mean value of the ratio between the safety ratios is around 0.82 for N = 7, around 0.89 for N = 10, and around 0.92 for N = 12. This trend is in line with the results in Fig. 12, where for N = 40 DCFD and NI lead to the same safety ratio at the 84% confidence level. The difference observed for smaller values of N can be attributed to the fact that DCFD considers only the epistemic uncertainty in the median DCR given S a , whereas the robust procedure described in Sect. 2.2.2 based on NI considers the epistemic uncertainty in all the fragility parameters. Comparing the 84% safety ratio from NI and that of the code shows that the code-based estimates are un-conservative. The 5th-95th percentile interval for this ratio varies between 1.04 and 1.67 for N = 12, 1.05 and 1.75 for N = 10, and 1.01 and 2.16 for N = 7. The mean values for the same ratio are 1.48 (N = 7), 1.34 (N = 10), and 1.30 (N = 12). For the structure in question, the maximum effect of applying the partial safety factors is estimated around a factor of 1.70. Therefore, even applying the partial safety factors might not be sufficient. Moreover, the code-based confidence factors are not performance-based, in the sense that they are not defined in terms of the structure's overall safety ratio and do not explicitly consider the effect of small sample size. For different values of N, the ratio of the closed-form safety ratio based on code provisions (from Eq. 24) and the safety ratio from Eq. (19) (the mean value) is equal to unity with negligible variations (0-1%) when δ is calculated for each mean spectrum. The variation with respect to unity increases (10-11% for N = 7-12) when δ is randomized.
This clearly shows that the closed-form does a good job in estimating the code-based safety ratio. Recall that the utility of the closed-form for the code's safety ratio (Eqs. 24 and 25) is both for predictive purposes (multiplying the code safety ratio by α_x to arrive at DCFD's performance-based estimate) and for safety-checking when few "collapse" cases are present. As a ballpark estimate, no more than roughly 16% of the N records can be collapse-inducing if the 84th percentile of DCR is to be estimated. Finally, it is interesting to observe the trend in the parameters of α_x (Eq. 25) for N = [7, 10, 12]. β_Sa and β_DCR have very similar statistics for all values of N; for all N, the mean estimate for both parameters is around 50%. Moreover, β_DCR|Sa, the standard error of regression (i.e., the standard deviation of the log of DCR given S_a), shows very small variability for different values of N. The mean value of β_DCR|Sa is equal to 0.30 for N = [7, 10, 12], with decreasing dispersion as N increases (from 0.27 for N = 7 to 0.18 for N = 12). The value of β_UDCR is also almost invariant, around 10%, for different N. Note also that b × β_UF (from Fig. 4) is equal to 0.15 (N = 7), 0.12 (N = 10) and 0.11 (N = 12), and is systematically larger than β_UDCR for all N (as expected). It can also be seen that the epistemic uncertainty in the spectral acceleration hazard, β_UH, has a mean value around 20% with small variability (β = 0.07). As expected, the statistics of β_UH do not depend on the number of records. Due to the SRSS combination of the epistemic uncertainty in the hazard, β_UH (around 20%), and that due to small sample size, β_UDCR (around 0.10-0.12), the overall epistemic uncertainty β_UT has stable statistics with varying N, with mean around 0.22-0.23 and dispersion of 16%-20%. The k value (the slope of the hazard curve in log scale) is almost invariant (at 2.4), as expected.
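The SRSS combination mentioned above amounts to β_UT = sqrt(β_UH² + β_UDCR²). A minimal numeric sketch, using the round values quoted in the text:

```python
import math

def srss(*betas):
    """Square-root-of-sum-of-squares combination of independent dispersions."""
    return math.sqrt(sum(b * b for b in betas))

beta_UH = 0.20    # epistemic dispersion of the hazard curve (value from the text)
beta_UDCR = 0.10  # epistemic dispersion due to limited sample size (value from the text)
beta_UT = srss(beta_UH, beta_UDCR)  # about 0.224
```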
Its variation is small since the position of the tangent line undergoes only very small variations from one bootstrap realization to the other (due to the change in the value of S_a|DCR=1). The slope b of the DCR versus S_a curve in log scale also seems to have a stable mean value of 0.80 for different values of N, with a standard deviation around 30%. This value depends on the behaviour of the structure as a whole: b = 1 indicates the equal displacement rule, b > 1 indicates a prevalence of ductile softening behaviour, and b < 1 indicates a prevalence of potentially fragile failure modes.
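The interplay of the parameters k, b and β_DCR|Sa discussed above can be illustrated with the classic SAC/FEMA-style closed form in the spirit of DCFD (Cornell et al. 2002), not necessarily identical to the paper's Eq. (24). The hazard coefficient k0 and the demand-model intercept a below are illustrative assumptions:

```python
import math

def dcfd_rate(k0, k, a, b, beta_dcr, dcr_cap=1.0):
    """Mean annual rate of DCR exceeding dcr_cap, assuming a power-law hazard
    H(s) = k0 * s**(-k) and the Cloud demand model ln DCR = a + b ln Sa with
    record-to-record dispersion beta_dcr (SAC/FEMA-style closed form)."""
    # Sa at which the median demand equals the capacity (Sa|DCR = dcr_cap)
    sa_cap = math.exp((math.log(dcr_cap) - a) / b)
    return k0 * sa_cap ** (-k) * math.exp(0.5 * (k / b) ** 2 * beta_dcr ** 2)

# With the values discussed in the text (k = 2.4, b = 0.8, beta_DCR|Sa = 0.30),
# the dispersion amplifies the rate relative to the hazard at Sa|DCR=1
rate = dcfd_rate(k0=1e-4, k=2.4, a=0.0, b=0.8, beta_dcr=0.30)
```

Setting beta_dcr to zero recovers the hazard evaluated at S_a|DCR=1, which makes the role of the exponential amplification term explicit.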

Conclusions
This work explores the provisions of the recent Italian building code (NTC 2018) for seismic safety-checking by employing non-linear time-history analysis. The Italian code safety-checking is based on semi-probabilistic, partially prescriptive procedures. Compared to its American and European counterparts, the Italian code seems to impose less restrictive spectrum-compatibility criteria on the selected set of ground motion records. In fact, compatibility with the code's design spectrum is required for the mean spectrum and not for the spectra of individual records. This leads to less stringent variance-reduction consequences for the records and leaves the floor open to comparisons with performance-based safety-checking. Based on the minimum acceptability criteria provided in the code, it is quite likely that non-linear time-history analysis is going to be done by employing very few ground motion records (N ≥ 7). This implies that the epistemic uncertainty due to the limited number of records (limited sample size) is going to be significant. The code-based provisions account for the uncertainties through the application of partial safety factors. Nevertheless, the partial safety factors are expected to cover several sources of uncertainty in the material mechanical properties and the structural capacity, and do not explicitly address the issue of record-to-record variability and the associated epistemic uncertainty due to limited sample size. Another issue concerns the [−10%, +30%] tolerance interval for the mean spectrum with respect to design-spectrum compatibility. In a performance-based framework, such a tolerance would perhaps be considered in terms of the epistemic uncertainty in the seismic hazard curve.
Given such a premise, the results of code-based safety-checking are compared with performance-based safety-checking based on (1) DCFD, a simplified procedure for safety-checking that can also consider the epistemic uncertainties in a first-order sense (i.e., in the median); (2) convolution of structural fragility and hazard based on numerical integration (NI) and propagation of the uncertainties in the estimation of the fragility parameters. Both performance-based methods envision propagation of the epistemic uncertainties in the seismic hazard curve. We had to pose some limitations in order to arrive at meaningful comparisons: (1) the code-based partial safety factors are set to unity; (2) the collapse-inducing records are filtered out of the selected suites of ground motion records. The following considerations can be made:
• The choice of DM: Code-based safety-checking both at the component level and at the system level is made possible by choosing a global demand-to-capacity ratio (DCR) as the DM. This damage measure, which is equal to unity at the onset of the limit state, finds the critical demand-to-capacity ratio among different components. Although formulated in fully deformation-based terms in this study, the DCR manages (thanks to the adopted structural model) to perform both ductile (deformation-based) and fragile (resistance-based) safety-checking, as required by the code for the two ultimate limit states of near collapse and life safety, respectively.
• Record selection criteria: To be able to make comparisons with performance-based safety-checking, the bootstrap sampling is subjected to simple additional requirements that go beyond the code-specific minimum requirements. These requirements, which refer to the criteria for record selection for Cloud Analysis, aim at making sure that a statistically significant regression of the damage measure versus seismic intensity can be performed.
• Comparison at 50% confidence: At the 50% confidence level (no epistemic uncertainties considered), the ratios of the two performance-based safety ratios (DCFD and NI) to the code-based safety ratio (i.e., based on the mean DM) show fairly stable statistics as the number of records changes. In fact, the ratios have a mean value slightly larger than unity (in the order of 3-6%) and relatively small dispersion (8-11%). This agreement can be attributed also to the additional criteria set for selecting the records in the bootstrap procedure (mentioned in the previous point). The closed-form expression derived for the ratio between DCFD and code-based safety-checking confirms the observed results.
• Comparison at 84% confidence: At the 84% confidence level (epistemic uncertainties due to limited sample size and in the seismic hazard curve), the performance-based procedure with NI and a full propagation of the uncertainties in the fragility parameters leads to significantly larger safety ratios with respect to both the DCFD and the code procedure. It is shown that the difference between the 84% safety ratios of NI and DCFD reduces to zero for larger record sets (N ≃ 40). For N = 7, the difference between the safety ratio based on NI and that of the code can in extreme cases reach a factor of 2.16 (95th percentile). This difference reduces to a factor of 1.67 (95th percentile) for N = 12. It can be argued that the application of partial safety factors (although conceptually different) might mitigate such an effect only partially.
• Difference between the UHS and the code design spectrum: It is shown that, for this case study, and for the same acceptable exceedance rate λ_o, performance-based safety-checking based on the UHS corresponds to smaller intensity values compared to the code's design spectrum (i.e., S_a^(o,UHS)(T_1) is smaller than S_a^(o,Code)(T_1), see Fig. 8). Alternatively, if performed for the same intensity level (as has been done here, for S_a^(o,Code)(T_1) = 0.40 g), performance-based safety-checking corresponds to a smaller acceptable exceedance rate. More specifically, in our example, the performance-based (DCFD and NI) safety-checking results are "magnified" with respect to the code results (i.e., S_a^(o,Code)(T_1) is 1.29 times S_a^(o,UHS)(T_1)). This cannot be generalized, as the difference depends on both the period range of interest and the shape of the UHS compared to that of the code spectrum.
It should be kept in mind that the present work is subject to some limitations:
• The bootstrap sampling is conditioned on and limited to the original pool of 92 non-collapse-inducing ground motion records. Bootstrap is effective in mimicking the variation in the different parameters; nevertheless, it does not represent the domain of all possible ground motion records. Considered as a proof of concept, however, the bootstrap has provided interesting insight herein. The results cannot be generalized without further verification.
• The work focuses on record-to-record variability only, and the uncertainties in structural model parameters are not considered. This was done to single out the effect of record-to-record variability.
• The collapse-inducing records have been filtered out. This "limitation" is implied by the code itself, which recommends using the mean response for safety-checking (N ≥ 7). This issue and the corresponding interpretation of the code need further elaboration that is beyond the scope of this work.
It seems that satisfying both code spectrum compatibility and the additional requirements (i.e., those related to careful record selection for Cloud Analysis), which favour a meaningful trend between structural damage and seismic intensity, renders the results of code-based safety-checking very close to those based on performance-based procedures. On the other hand, performance-based safety-checking with the simplified DCFD can be achieved with almost no additional effort with respect to code-based safety-checking. Further, the DCFD is very accurate in safety-checking with 50% confidence (no epistemic uncertainty). It could be unconservative for safety-checking at 84% confidence for a small number of records (N in the range of 7 to 12). Nevertheless, the difference between DCFD and NI (full uncertainty propagation) is going to diminish for larger record sets (N ≃ 40). Of course, the implementation of performance-based safety-checking concepts in the code requires a comprehensive shift from semi-probabilistic methods based on partial safety factors to performance-based methods based on structural response. It should be noted that there are not many papers available in the literature related to safety-checking with non-linear time-history analysis following the provisions of the recent Italian code. Therefore, we have had to devise several novel concepts to make the comparisons between code-based and performance-based safety-checking possible. These concepts include the definition of the code-based safety ratio (also its closed-form interpretation), the definition of the hazard curve for the damage measure (DCR) together with the propagation of epistemic uncertainties (limited sample size and hazard curve), and the definition of the safety ratio for performance-based safety-checking using NI.