Model selection
Following the methodology described in Sect. 3.2.1, explanatory variables, as well as possible interaction terms between them, are added one by one to the systematic component of the ordered model [Eq. (4)]. At each step, a likelihood ratio test is used to determine whether the added term makes the more complex model fit significantly better than the simpler one. The same test is used to compare the ordered model with an equivalent but more complex multinomial model [Eq. (3)], which relaxes the assumption that the response categories (i.e., damage states) are ordered.
Two link functions are considered: probit and logit. Because models that differ only in their link function are not nested, the likelihood ratio test does not apply to them, and we use the AIC [Eq. (5)] to determine which link function provides the better fit to the data.
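The two comparison tools described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes that the deviance equals minus twice the maximized log-likelihood, so that the AIC is the deviance plus twice the number of parameters, and that the likelihood ratio statistic follows a chi-square distribution with as many degrees of freedom as there are added parameters. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Closed-form starting points: Q(1/2, h) = erfc(sqrt(h)) and Q(1, h) = exp(-h).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:
        a, q = 1.0, math.exp(-h)
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return min(q, 1.0)


def likelihood_ratio_test(dev_simple, dev_complex, extra_params, alpha=0.05):
    """Compare two nested models through their deviances."""
    stat = dev_simple - dev_complex  # drop in deviance due to the added terms
    p_value = chi2_sf(stat, extra_params)
    return stat, p_value, p_value < alpha


def aic(deviance, n_params):
    """AIC under the assumption that deviance = -2 * log-likelihood."""
    return deviance + 2 * n_params


# Hypothetical deviances, for illustration only (not values from Tables 2 or 3).
stat, p_value, keep_term = likelihood_ratio_test(100.0, 90.0, extra_params=1)
print(f"LR statistic = {stat:.1f}, p = {p_value:.4f}, keep added term: {keep_term}")
```

For non-nested models, such as the probit and logit variants, only `aic` is meaningful: the model with the lower value is preferred.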
At this stage of the analysis, the binary debris impact variable is defined for a radius of projectile action of 50 m. This distance is thought to be large enough for the linear predictor to also contain meaningful variations in measured flow depths and simulated flow velocities, yet not so large that it increases the uncertainty as to whether a building was actually impacted. Each possible radius is considered individually in the next stage (Sect. 4.2). The variable ‘number of stories’ is treated as a continuous variable.
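As an illustration of how such a binary variable could be built, the sketch below flags a building as impacted when at least one washed-away building lies within the chosen radius. It is a simplified reading of the definition, not the authors' implementation: it assumes planar coordinates in metres, and the function name and the coordinates are hypothetical.

```python
import math


def debris_impact_flags(buildings, washed_away, radius_m=50.0):
    """Return 1 for each building within radius_m of any washed-away building, else 0.

    buildings and washed_away are lists of (x, y) positions in metres (assumed
    to be in a projected, planar coordinate system).
    """
    flags = []
    for bx, by in buildings:
        hit = any(
            math.hypot(bx - sx, by - sy) <= radius_m for sx, sy in washed_away
        )
        flags.append(1 if hit else 0)
    return flags


# Hypothetical positions, for illustration only.
buildings = [(0.0, 0.0), (100.0, 0.0), (30.0, 40.0)]
washed_away = [(0.0, 45.0)]
for radius in (10.0, 50.0, 150.0):
    print(radius, debris_impact_flags(buildings, washed_away, radius))
```

Repeating the call for each radius between 10 m and 150 m yields one candidate indicator per radius, which is how the later comparison across radii could be set up.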
The results in Tables 2 and 3 indicate that, for both link functions, the more complex multinomial model generally does not fit better than the ordered model. Ordinal regression is therefore an adequate choice for the analysis. A comparison of the AIC values of the best-fitting probit and logit models (5862 and 5766, respectively) indicates that the logit link leads to a slightly better fit.
Table 2 Deviance and AIC values of the ordered and multinomial probit models. The values in bold represent minimum deviance and AIC values (i.e., the best-fitting model).
Table 3 Deviance and AIC values of the ordered and multinomial logit models. The values in bold represent minimum deviance and AIC values (i.e., the best-fitting model).
According to the results in Table 3, the explanatory variables to be included in the model [i.e., Eq. (4)] are the available tsunami IMs (flow depth, flow velocity, debris impact), the building’s construction material, and the number of stories. The interaction terms to be included are those between flow depth and velocity (h.v), flow depth and debris impact (h.I), flow velocity and debris impact (v.I), flow depth and building material (h.M), flow velocity and building material (v.M), material and debris impact (M.I), number of stories and flow depth (S.h), number of stories and debris impact (S.I), and number of stories and building material (S.M). These findings are consistent with the physical processes at play during tsunami flow-structure interaction. Although the extent to which maximum flow velocity interacts with maximum flow depth is unknown, the effect of an increase in maximum flow velocity on the rate of increase in damage probability is expected to depend on the value of flow depth. Similarly, the effects of hydrostatic and hydrodynamic loads (i.e., flow depth and velocity) on increasing damage probabilities are expected to depend on whether the building is also subjected to debris impact loads. The same reasoning can be applied to the other interaction terms.
Finally, it is interesting to note that, for both link functions and aside from significance testing, the largest reductions in model deviance (i.e., more than 1000) result from the addition of building material and number of stories to the initial flow depth as explanatory variables. This result is consistent with previous findings (Suppasri et al. 2013a, b; Leelawat et al. 2014) and indicates that, aside from tsunami IMs, the construction characteristics of a building strongly determine the severity of damage. By comparison, the difference in deviance associated with the inclusion of debris impact and flow velocity is smaller. This can be explained by the uncertainty associated with the determination of both variables. Flow velocity inland was calculated with a constant roughness coefficient and using nonlinear shallow water equations, which may not be perfectly accurate onshore (Charvet 2012). The debris impact variable is relatively coarse, being based on the assumption that debris impact is due to washed away buildings rather than on actual individual observations. Nevertheless, these variables still yield a significant model improvement, which is consistent with field survey evidence showing the unquestionable contribution of these IMs to tsunami-induced damage (see Sect. 1, as well as Figs. 1 and 4). It is expected that more dramatic reductions in deviance and AIC will result if refined estimations of these two IMs are obtained in the future.
Penalized accuracy and debris impact radius
Because structures respond differently to tsunami loads depending on their construction type, fragility is often represented separately for different building classes. Thus, following the method presented in Sect. 3.2.2, the overall penalized accuracy of the model is calculated for each building class and for each debris impact radius (10 m–150 m). The required number of iterations of the repeated cross-validation scheme is determined by examining the stability of the results for a debris impact radius of 50 m. We consider the results stable when the variability of the penalized accuracy is of the order of 0.01 %, at which point no additional iterations are required. Table 4 presents the overall penalized accuracy for each building class and debris impact radius, as well as the number of iterations used. Note that for Steel buildings, the number of stories variable and the associated interactions could not be included in the model, because this information was only available for 12 data points in the Steel database, which is not sufficient for the present regression analysis (Green 1991).
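The repeated cross-validation loop and its stopping rule can be sketched as below. This is only an illustration under stated assumptions: the stability criterion is interpreted here as the running mean changing by less than a tolerance of 0.0001 (0.01 %) between successive repetitions, and the scoring function is a plain majority-class accuracy on toy labels standing in for the penalized accuracy of Sect. 3.2.2, which is not reproduced here. All names are hypothetical.

```python
import random


def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv(score_fn, n, k=10, tol=1e-4, min_reps=5, max_reps=200, seed=0):
    """Repeat k-fold cross-validation until the running mean score stabilizes.

    score_fn(train_idx, test_idx) must return a score for one fold.
    min_reps must be at least 2. Returns (mean score, repetitions used).
    """
    rng = random.Random(seed)
    rep_scores, running = [], []
    for rep in range(1, max_reps + 1):
        folds = kfold_indices(n, k, rng)
        fold_scores = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for m, f in enumerate(folds) if m != i for j in f]
            fold_scores.append(score_fn(train_idx, test_idx))
        rep_scores.append(sum(fold_scores) / len(fold_scores))
        running.append(sum(rep_scores) / len(rep_scores))
        # Stop once an extra repetition no longer moves the running mean.
        if rep >= min_reps and abs(running[-1] - running[-2]) < tol:
            break
    return running[-1], rep


# Toy labels (1 = collapse), for illustration only; not the survey data.
labels = [1] * 85 + [0] * 15


def majority_accuracy(train_idx, test_idx):
    """Stand-in score: accuracy of always predicting the training majority class."""
    ones = sum(labels[j] for j in train_idx)
    majority = 1 if 2 * ones >= len(train_idx) else 0
    return sum(labels[j] == majority for j in test_idx) / len(test_idx)


accuracy, reps = repeated_cv(majority_accuracy, len(labels))
print(f"mean accuracy = {accuracy:.4f} after {reps} repetitions")
```

In practice `score_fn` would fit the ordinal model on the training indices and return the penalized accuracy on the held-out fold; only the outer loop is shown here.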
Table 4 Overall penalized accuracy of the ordinal model calculated using N repetitions of a tenfold cross-validation algorithm. The values in bold represent the maximum accuracy obtained for each building type
The results in Table 4 show that the model yields a high predictive accuracy (>90 %) for all building classes. This result partly reflects how common heavy damage states are (Fig. 6), particularly for Wood and Masonry buildings, for which 85 % and 65 % of the data, respectively, lie in the “collapse” category, so that the overall loss that misclassified data can incur is small. However, the model also performs reasonably well for Steel and RC structures, for which the damage distributions, while skewed to the left, are more even. These results suggest that the chosen predictors and model are adequate to predict building damage probabilities for this event.
The highest accuracy is obtained for a debris impact radius of 90 m for RC buildings, 30 m for Steel buildings, 130 m to 140 m for Wood buildings, and 120 m for Masonry buildings. These results appear consistent with the mechanisms of debris action, as the kinetic energy of the flow (and of the moving debris) decreases progressively as it travels. Because Wood and Masonry buildings are the weakest construction types, the effects of projectiles impacting such buildings are expected to be influential over longer distances from the point of debris generation than for strong structural materials such as RC and Steel. We can also observe that the variations in model accuracy across the different debris impact radii considered are relatively small: a maximum of 1.03 % (± 2 buildings) for RC, 1.29 % (± 1 building) for Steel, 0.82 % (± 44 buildings) for Wood and 0.80 % (± 10 buildings) for Masonry. This means that, within only 10 m to 150 m of a washed away house, the change in the debris impact variable (i.e., the consideration of buildings impacted further from the debris generation point) does not yield large differences in model performance. This possibly indicates that the effects of debris impact should be considered on a much larger scale and/or that the debris impact variable should be refined, for example by using visual evidence of impact for individual buildings, as done by Reese et al. (2011) in the case of the 2009 Samoa tsunami. Because such evidence is not available, the final fragility functions in this study are derived using the debris impact radius that results in the highest predictive accuracy, as shown in Table 4.
Fragility functions
The multivariate fragility functions [Eq. (4)] corresponding to the highest damage state (DS5, defined in this study as the aggregation of damage states 5 and 6 defined by the MLIT, see Table 1) are represented for each building class in Figs. 7, 8, 9 and 10 by fragility surfaces. The expected probabilities (circles) plotted in Figs. 7, 8, 9 and 10 are located at the maximum flow depths and velocities of the observed data, illustrating the extent of available data coverage. The smooth surfaces are constructed not through spatial interpolation, but by approximating the discrete expected probabilities on a two-dimensional grid (D’Errico 2010). Because debris impact has been defined as a binary variable, the probability outcomes with and without debris impact can be represented separately.
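To make the idea of a pair of fragility surfaces concrete, the sketch below evaluates a collapse probability over a grid of flow depths and velocities, once with and once without debris impact. It is a simplified illustration, not the fitted model: it assumes a logit link, a linear predictor in untransformed flow depth h and velocity v with the h.v, h.I and v.I interactions only, and the convention that the collapse probability is the logistic function of that predictor. The coefficients are hypothetical placeholders, not estimates from this study.

```python
import math

# Hypothetical coefficients, for illustration only (not fitted values).
COEF = {
    "intercept": -4.0,
    "h": 0.5,     # flow depth (m)
    "v": 0.3,     # flow velocity (m/s)
    "I": 1.5,     # debris impact indicator (0 or 1)
    "h.v": 0.02,  # depth-velocity interaction
    "h.I": 0.2,   # depth-impact interaction
    "v.I": 0.1,   # velocity-impact interaction
}


def collapse_probability(h, v, impact, coef=COEF):
    """Collapse probability from a logit-link linear predictor (assumed form)."""
    eta = (
        coef["intercept"]
        + coef["h"] * h
        + coef["v"] * v
        + coef["I"] * impact
        + coef["h.v"] * h * v
        + coef["h.I"] * h * impact
        + coef["v.I"] * v * impact
    )
    return 1.0 / (1.0 + math.exp(-eta))


def fragility_grid(depths, velocities, impact, coef=COEF):
    """Rows follow depths, columns follow velocities."""
    return [
        [collapse_probability(h, v, impact, coef) for v in velocities]
        for h in depths
    ]


depths = [0.0, 2.0, 4.0, 6.0]
velocities = [0.0, 3.0, 6.0, 9.0]
surface_no_impact = fragility_grid(depths, velocities, impact=0)
surface_impact = fragility_grid(depths, velocities, impact=1)
print(round(surface_no_impact[2][2], 3), round(surface_impact[2][2], 3))
```

Because the impact indicator is binary, the two grids are simply two slices of the same model, which is why the surfaces can be drawn side by side.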
The results presented for Wood buildings (Fig. 7) show very clearly that buildings which have been impacted rapidly reach high probabilities of collapse (i.e., >0.9), at flow depths and velocities as low as 2.5 m and 5.4 m/s, respectively. On the other hand, buildings which have not been impacted experience a milder increase in collapse probability with flow depth, and particularly with velocity, ultimately reaching the higher probability range only for a combination of very high depths and velocities.
The evolution of collapse probabilities for Masonry buildings (Fig. 8) is similar to that for Wood buildings. Again, the occurrence of debris impact seems to trigger a much steeper rise in damage probability than in the no impact case. However, compared to Wood buildings, high probabilities of collapse (>90 %) for impacted structures seem to occur only after flow depth and velocity have reached higher values (7 m and 9.5 m/s, respectively). According to the present results, buildings which have not been impacted would reach a 90 % collapse probability for a combination of moderately severe flow conditions (a flow depth of 4 m associated with a flow velocity of 6 m/s). This can be explained by the fact that Masonry structures are relatively more resistant than Wood buildings (Leone et al. 2011; Suppasri et al. 2013a, b). In addition, it can be seen from Fig. 8 that the range of flow depths and velocities available for buildings which have not been impacted is much narrower than in the impact case. A meaningful extrapolation and interpretation of the fragility surface beyond the range of the available data would require additional statistical prediction methods to be discussed and applied; however, it is expected that, for larger values of tsunami IMs, the shape of the fragility surface in the no impact case would resemble the one for Wood buildings.
The fragility surfaces for RC buildings (Fig. 9) also show a marked increase in collapse probability in the case of debris impact, with very rare occurrences of collapse when the building is not impacted. As for Masonry buildings, the range of flow depths and velocities in the no impact case is much smaller than in the impact case, with most expected probabilities of collapse being very small (i.e., <0.15). The shape of the corresponding fragility surface for larger values of these IMs is thus difficult to infer. In line with previous studies (Rossetto et al. 2007; Mas et al. 2012; Suppasri et al. 2013a, b), the much steeper fragility surfaces for Wood and Masonry structures show that RC buildings are more resistant to tsunami loads. An interesting feature of the fragility surface corresponding to impacted buildings is the visibly stronger influence of flow velocity, compared to flow depth, on the rate of increase in damage probability. This could be expected, as the dynamic impact force is proportional to the square of the flow velocity, although this difference between the effects of the two variables seems less prominent for weaker structural types. This indicates that hydrostatic loads alone may not significantly determine RC building collapse, and is consistent with previous findings demonstrating flow depth alone to be a poor predictor of tsunami damage for RC structures (Charvet et al. 2014b).
In the case of Steel buildings (Fig. 10), the difference between the impact and no impact cases is less striking than for other building types. The rate of increase in collapse probability with flow velocity appears similar in both cases. The increase in damage probability with flow depth appears very minor for buildings which have not suffered debris impact; in line with the results of Charvet et al. (2014b) for Steel buildings, this also indicates that hydrostatic loads may not have a significant influence on damage for this construction material. Nevertheless, the scarcity of data for this construction type leads to results which are inconsistent with the other building classes.
The fragility surface for the non-impacted buildings in the high probability region is mainly driven by one high probability point, with most of the data lying in a range of very low probabilities (i.e., <0.10). Similarly, the collapse probabilities for impacted buildings are mostly <0.50, so half of the probability domain cannot be reliably covered by the fragility function. Thus, the probability values shown by the fragility surfaces for Steel buildings should be considered of limited use until such functions can be refined using additional data. Information on building height is also crucial for updating the model estimates consistently with the other construction classes considered.
The new fragility functions estimated in this study are multivariate and were derived using different statistical assumptions, methods, and model diagnostics compared to previously published research, so the fragility results cannot be directly and qualitatively compared. Nevertheless, and regardless of the statistical estimation methodology adopted, all previous studies without exception derived two-dimensional fragility functions (i.e., fragility curves), mostly relating the maximum tsunami flow depth to damage probabilities. Therefore, as a point of reference, fragility curves are also derived here using the measured flow depth as the only explanatory variable (Fig. 11), consistently with the ordinal regression method adopted throughout the paper. Table 5 shows example combinations of maximum flow depths and velocities yielding specific collapse probability thresholds (i.e., 0.25, 0.50, 0.75, 0.95) from the fragility results presented in this section (Figs. 7, 8, 9, 10, 11). This allows the outcomes of the multivariate fragility functions (in cases of both impact and no impact) to be compared with those of simple fragility curves. The example values chosen indicate that fragility curves based on flow depth only may underestimate collapse probabilities, particularly in cases where debris impact has a dominant effect.
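For a fragility curve with a single explanatory variable, the flow depth associated with a given collapse probability can be obtained by inverting the link function. The sketch below shows this for a logit link, assuming a linear predictor of the form intercept + slope × h in untransformed flow depth h; the form of the predictor actually used for Fig. 11 may differ (for example, a transformed depth). The function names and coefficients are hypothetical and are not the fitted values.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def depth_for_probability(p, intercept, slope):
    """Flow depth h at which 1 / (1 + exp(-(intercept + slope * h))) equals p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if slope == 0.0:
        raise ValueError("slope must be non-zero to invert the curve")
    return (logit(p) - intercept) / slope


# Hypothetical coefficients, for illustration only (not fitted values).
intercept, slope = -3.0, 1.0
for p in (0.25, 0.50, 0.75, 0.95):
    h = depth_for_probability(p, intercept, slope)
    print(f"P(collapse) = {p:.2f} at h = {h:.2f} m")
```

For a fragility surface, the same threshold defines a curve of depth and velocity combinations rather than a single depth, which is why Table 5 reports example combinations.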
Table 5 Combinations of flow depth and velocity values for corresponding collapse probabilities, using fragility surfaces (impact/no impact) and fragility curves (one explanatory variable only)