1 Introduction

According to the International Disaster Database (2013), the 2011 Great East Japan earthquake and tsunami caused the highest estimated damage, USD 201 billion, of any natural disaster during 1990–2012. At 14:46 JST on March 11, 2011, an M9.0 earthquake was recorded; it triggered large and powerful tsunami waves that struck Japan. The east coast of Japan suffered extensive damage, including the destruction of more than 400,000 buildings (National Police Agency 2011).

The objective of this study is to provide a quantitative assessment of the influence of the factors that appear to determine tsunami damage, namely the inundation depth, the coastal topography, the number of floors, the structural material, and the function of buildings (Suppasri et al. 2012a, c, 2013, 2014; etc.). Such an assessment allows these factors to be ranked in order of their contribution to the damage level. In addition, it makes it possible to suggest a relationship between the significant variables and the estimated damage level.

In Sect. 2, a review of the literature on tsunami damage prediction is carried out, highlighting the usual damage factors considered. In Sect. 3, the study area is presented, followed by a description of the ordinal regression methodology (Sect. 4). Section 5 covers the data collection and analysis. Finally, the results, their applicability, and their implications are discussed in Sect. 6.

2 Literature review

2.1 Building damage due to tsunami inundation depth

Shuto (1993) studied the relationship between a range of tsunami inundation depths and building damage using information from historical tsunamis. For example, it was found that wooden houses may collapse if the tsunami inundation depth exceeds 2 m, and that reinforced concrete buildings may collapse at an inundation depth of 8 m. Subsequent studies confirmed such results: Ruangrassamee et al. (2006) found from the data of the 2004 Indian Ocean tsunami that a 2-m inundation depth can destroy a wooden house, and Reese et al. (2007) found that this same inundation depth would destroy unreinforced brick buildings. Since Shuto (1993), the damage criteria for each structural material against a range of tsunami inundation depths have been investigated further (Suppasri et al. 2013). Suppasri et al. (2014) also studied the damage criteria by categorizing the coastal topography into ria coast and plain coast, while Charvet et al. (2014) categorized the geographical environment into plain, terrain (i.e., “a narrow coast backed up by high topography”), and river. Table 1 summarizes the research related to building damage criteria based on structural material and inundation depth. Table 2 summarizes the research related to building damage criteria based on coastal topography and inundation depth. In addition, “tsunami fragility” was introduced as a new measure for estimating tsunami damage to buildings (Koshimura et al. 2009b). Several studies proposed fragility curves for structural destruction by tsunami for many events, such as the 1993 Okushiri tsunami in Japan (Koshimura et al. 2009a; Koshimura and Kayaba 2010; Suppasri et al. 2012b), the 2004 Indian Ocean tsunami [study areas: Sri Lanka (Murao and Nakazato 2010), Banda Aceh, Indonesia (Koshimura et al. 2009c), and Phuket and Phang Nga, Thailand (Suppasri et al. 2011)], the 2009 American Samoan tsunami (Gokon et al. 2011), the 2010 Chilean tsunami (study area: Dichato, Chile) (Mas et al. 2012), and the 2011 Great East Japan tsunami (study areas: Miyagi prefecture, the whole of Japan, and Ishinomaki city) (Suppasri et al. 2012c, 2013, 2014, respectively).

Table 1 Summary of building damage criteria (structural material and inundation depth)
Table 2 Summary of building damage criteria (coastal topography and inundation depth)

2.2 Vulnerability of buildings as estimated by the Papathoma tsunami vulnerability assessment method (PTVA)

The Papathoma tsunami vulnerability assessment method (PTVA) was developed by Papathoma et al. (2003). Based on the importance of characteristics of buildings identified by previous field surveys of tsunami events and calculations and using a multi-criteria evaluation method, Papathoma et al. (2003) set weight factors for various criteria according to their relative importance as follows: (1) “building material” (weight factor 7), (2) “row” (weight factor 6), (3) “surrounding” (weight factor 5), (4) “condition of ground floor” (weight factor 4), (5) “number of floors” (weight factor 3), (6) “sea defense” (weight factor 2), and (7) “natural environment” (weight factor 1) (Papathoma et al. 2003). They formulated the vulnerability of each building (BV) as follows:

$${\text{BV}} = (7 \times a) + (6 \times b) + (5 \times c) + (4 \times d) + (3 \times e) + (2 \times f) + (1 \times g)$$
(1)

In Eq. (1), a is the standardized score (i.e., raw score of the building/maximum raw score) of building material; b is the standardized score of row of the building; c is the standardized score of number of floors; d is the standardized score of building surroundings; e is the standardized score of ground floor; f is the standardized score of sea defense in front of the building; and g is the standardized score of width of the intertidal zone in front of the building. PTVA-3 is a revised version of PTVA, which has been tested at Maroubra, Sydney (Dall’Osso et al. 2009a, b).
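As a minimal sketch of how Eq. (1) is evaluated, the snippet below computes BV from seven standardized scores. The function names and the example raw scores are illustrative assumptions and are not taken from Papathoma et al. (2003); only the weights 7 to 1 come from Eq. (1).

```python
# Weights from Eq. (1), ordered to match the terms a, b, c, d, e, f, g.
PTVA_WEIGHTS = (7, 6, 5, 4, 3, 2, 1)


def standardized_score(raw_score, max_raw_score):
    """Return the raw score divided by the maximum raw score."""
    if max_raw_score <= 0:
        raise ValueError("max_raw_score must be positive")
    return raw_score / max_raw_score


def building_vulnerability(scores):
    """Compute BV from the seven standardized scores (a, b, c, d, e, f, g)."""
    if len(scores) != len(PTVA_WEIGHTS):
        raise ValueError("expected seven standardized scores")
    return sum(w * s for w, s in zip(PTVA_WEIGHTS, scores))


# Hypothetical building whose raw score equals the maximum on every criterion.
worst_case = [standardized_score(4, 4)] * 7
print(building_vulnerability(worst_case))  # 28.0, the largest possible BV
```

Because every standardized score lies between 0 and 1, BV is bounded above by the sum of the weights (28), which gives a convenient scale for comparing buildings.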

Moreover, previous studies (Papathoma and Dominey-Howes 2003; Papathoma et al. 2003) show the importance of building physical parameters and their surroundings in analyzing building damage by tsunami, thus such parameters will also be considered in this study. Also, we included other parameters (i.e., inundation depth, coastal topography, function of the building) following Koshimura et al. (2009b), Shuto (1993), and Suppasri et al. (2012a, b, 2013, 2014) in our study.

3 Study area

Following the 2011 Great East Japan Earthquake and Tsunami, among the 251,301 buildings surveyed by the Ministry of Land, Infrastructure, Transport and Tourism (MLIT), more than 25 % (63,605 buildings) were in Ishinomaki city. According to the damage and field survey of Suppasri et al. (2014), the coastal topography can be separated into ria and plain coasts, and the main residential area is located in the plain area inside the bay (see Fig. 1). Some parts of the city are located along the Sanriku ria coast. According to a visual inspection of satellite images, the proportion of washed-away buildings in the area outside the breakwaters was as high as 88.4 %, whereas inside the area protected by the breakwaters it was only 42.8 % (Gokon and Koshimura 2012).

Fig. 1 Ishinomaki city (Suppasri et al. 2014)

4 Research design and methodology

4.1 Methodology

The present analysis was performed using IBM SPSS version 19. Given the number of predictor variables to be taken into account and the relative simplicity of linear regression compared with other regression techniques, multiple linear regression was initially considered as a potential tool for analysis. However, a preliminary inspection of the data revealed that applying multiple linear regression would violate the associated statistical assumptions. According to Crewson (2006), Osborne and Waters (2002), and Seber (1977), the variables should follow a normal distribution, and the errors should display homoscedasticity (i.e., constant variance), have a mean of zero, and be independent (i.e., show no trend). The basic assumption of normally distributed data is violated because a normal distribution can only apply to a continuous response variable, whereas the damage level is categorical, so we did not select multiple linear regression. Because our objective is to estimate the damage level, which is a categorical and ordinal dependent variable, ordinal regression is likely to be the most suitable statistical technique for our study.

Ordinal regression is a method used to determine the direction of the relationship between each predictor and a categorical outcome (Chan 2005), taking into account the ordered (“ordinal”) nature of such outcome. The strengths of ordinal regression consist in “identifying significant explanatory variables that influence the ordinal outcome,” “describing the direction of the relationship between the ordinal outcome and the explanatory variables,” and “performing classifications for all levels of the ordinal outcome, subsequently evaluating the validity of the regression model” (Chen and Hughes 2004). Ordinal regression has been often used in medical sciences (Bender and Grouven 1997; Lall et al. 2002; Sutton et al. 2000).

According to previous studies and the available data, the assumed predictor variables are (1) the inundation depth, (2) the coastal topography, (3) the number of floors, (4) the structural material, and (5) the function of the building. The dependent variable is the damage level.

4.2 Dependent variable: damage level

Based on the MLIT classification of damage, the degree of building damage can be categorized into six levels: (1) minor damage, (2) moderate damage, (3) major damage, (4) complete damage, (5) collapsed, and (6) washed away. The description and a schematic illustration of each damage level are given in Table 3. In addition to the damaged buildings, there was a small number of buildings with no damage.

Table 3 Damage levels, classification descriptions, and condition of buildings categorized by MLIT

4.3 Independent variables (predictors)

The assumed independent variables used in this study were chosen based on previous studies and include: (1) the number of floors (Papathoma and Dominey-Howes 2003), (2) the inundation depth (Koshimura et al. 2009b; Matsutomi and Harada 2010; Reese et al. 2007, 2011; Ruangrassamee et al. 2006; Shuto 1993; Suppasri et al. 2012a, c, 2013; Valencia et al. 2011), (3) the coastal topography (Charvet et al. 2014; Suppasri et al. 2014), (4) the building function (Suppasri et al. 2013, 2014), and (5) the structural material (Papathoma and Dominey-Howes 2003; Papathoma et al. 2003; Suppasri et al. 2013, 2014).

In Ishinomaki, the tallest damaged building in MLIT’s data has fourteen floors. This study uses the metric system (i.e., meter) for the inundation depth. The coastal topography is divided into two types of coast in Ishinomaki city: ria and plain coasts. In this study, the structural material has been categorized into four types: (1) wood, (2) reinforced concrete, (3) steel, and (4) masonry. Similar to Suppasri et al. (2014), the buildings were classified into six functional categories, based on MLIT’s classification system: (1) residential houses, (2) shared accommodations, (3) commercial facilities, (4) industrial plants, (5) public facilities, and (6) agriculture–forest–aquaculture facilities. The definition of each category is given in Table 4, and Fig. 2 schematically illustrates the building function.

Table 4 Function of building and their definitions
Fig. 2 Illustration of function of building

5 Data collection and analysis

5.1 Data collection

The detailed data on damaged buildings collected during field surveys by MLIT were obtained from Ishinomaki city. There were 68,596 buildings in the dataset (both ria and plain coasts). The tsunami inundation depth of each building shown in the MLIT data was obtained from the Tohoku Earthquake Tsunami Joint Survey Group (2011), the MLIT survey, other survey reports, photos, videos and other visual materials, eyewitness accounts, and other sources.

5.2 Descriptive statistics

Although there were 68,596 buildings in the raw dataset, the information was complete and usable for only 32,429 buildings (47.18 % of the total). Despite this reduction, the sample remains extremely large, so the power of the following analysis is not compromised (Green 1991). Table 5 shows the descriptive statistics of inundation depth and the number of floors.

Table 5 Descriptive statistics of floors and inundation depth

Following the damage level categorization mentioned in Sect. 4, the descriptive statistics of damage level can be seen in Table 6. While the largest group is damage level 5 (N = 7,821; 23.4 %), the smallest is damage level 4 (N = 477; 1.5 %), and there are 205 buildings (0.6 %) reported to have not suffered any damage.

Table 6 Descriptive statistics of damage level

Following the categorization of Suppasri et al. (2014), Table 7 shows the descriptive statistics of coastal topography: 89.7 % of the buildings are located on the plain coast, and 10.3 % are on the ria coast. Table 8 shows the descriptive statistics of structural material. Wooden buildings form the largest group (84.3 %). As shown in Table 9, among the six building functions, the largest group is residential houses (65.3 %), followed by shared accommodation (21 %), commercial facilities (6.6 %), transportation/storage facilities (4.8 %), and public facilities (1.5 %); the smallest group is agriculture, forest, and aquaculture facilities (0.8 %).

Table 7 Descriptive statistics of coastal topography
Table 8 Descriptive statistics of Structural Material
Table 9 Descriptive statistics of function of building

5.3 Testing for correlated predictors

Before performing regression analysis, it is necessary to check that all predictor variables are independent. Indeed, when predictors are highly correlated, multicollinearity can occur and strongly affect the coefficient estimates of the regression model, making it non-robust to small variations in the predictors (Farrar and Glauber 1967; Katz 2011; Vanichbancha 2006).

A Pearson product–moment correlation coefficient, which measures the strength of the relationship between two factors (Chan 2003; Kaiyawan 2010; Katz 2011), was computed to assess the relationships among the number of floors, the inundation depth, the coastal topography, the structural material, and the building function. Results are shown in Table 10. While the other pairs showed no high correlation, the coefficient indicated a moderately strong relationship [i.e., an absolute correlation coefficient above 0.6 (Chan 2003)] between coastal topography and inundation depth (r = −0.613), significant at the p < 0.01 level, which indicates that the relationship is unlikely to have arisen by chance (Chan 2003). This result was expected given that the physics of the inland flow is predominantly driven by land and coastal features. Therefore, coastal topography was eliminated from our analysis.

Table 10 Correlational analysis
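The screening step described above can be sketched as follows. The snippet computes a Pearson coefficient for one pair of predictors and flags it when its absolute value exceeds the 0.6 cutoff attributed to Chan (2003). The function names, the 0/1 coding of topography, and the toy values are assumptions made for illustration; they are not the study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def is_highly_correlated(r, cutoff=0.6):
    """Flag a pair whose absolute correlation exceeds the cutoff."""
    return abs(r) > cutoff


# Hypothetical toy sample: topography coded 0/1 (assumed coding), depth in metres.
topography = [0, 0, 0, 0, 1, 1, 1, 1]
depth_m = [6.5, 5.0, 7.2, 4.8, 2.1, 3.0, 1.5, 2.6]
r = pearson_r(topography, depth_m)
print(round(r, 3), is_highly_correlated(r))
```

The sign of r depends only on how the two-level topography variable is coded, so the screening uses the absolute value.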

5.4 Ordinal regression analysis

Next, the data were analyzed using ordinal regression. Similar to logistic regression, ordinal regression uses a so-called link function to express the relationship between the linearly related predictors and the mean outcome. Because the logit is the link function typically considered adequate for multinomial distributions (Chan 2005; Gelman and Hill 2007; Norusis 2010), it was initially selected. Ordinal regression assumes that all categorical outcomes share the same set of parameters. This assumption can be verified using the test of parallel lines (i.e., a test of whether the coefficient estimates for each variable are the same across categories) (Chan 2005; Norusis 2010). However, the test of parallel lines was significant at the p < 0.001 level for the logit link, so the assumption that all categories share the same set of parameters was not reasonable. The complementary log–log link (Clog–log) is likely to be a suitable alternative because of its typical application (i.e., higher categories more probable). Therefore, the Clog–log link function was tested in a similar fashion, and the null hypothesis (i.e., the location parameters (slope coefficients) are the same across response categories) could not be rejected. Hence, the Clog–log link function was selected. The buildings with no damage were set as the reference category for the damage level. For the predictor variables, following the suggestion of Katz (2011) to choose the category with the largest sample size when the hypothesis does not point to a particular category, residential houses were set as the reference category for function of building and wood was set as the reference category for structural material.
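To make the difference between the two candidate links concrete, the following sketch evaluates the logit and the complementary log–log transform at a few cumulative probabilities. It only illustrates the standard definitions of the two functions; it does not reproduce the SPSS test of parallel lines, and the function names are our own.

```python
import math


def logit(p):
    """Logit link: log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def cloglog(p):
    """Complementary log-log link: ln(-ln(1 - p)) for p in (0, 1)."""
    return math.log(-math.log(1.0 - p))


def inv_cloglog(eta):
    """Inverse of the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))


for p in (0.1, 0.5, 0.9):
    print(f"p={p:.1f}  logit={logit(p):+.3f}  cloglog={cloglog(p):+.3f}")
```

The logit is symmetric about p = 0.5, whereas the complementary log–log transform is not, which is why the two links can behave differently when the outcome categories are unevenly populated.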

Here, in order to assess the amount of variation in the outcome that can be explained by the predictor variables, model-fitting statistics known as pseudo-R² were calculated (Chen and Hughes 2004). Following Norusis (2010), the three commonly used pseudo-R² formulas (Cox and Snell 1989; Nagelkerke 1991; McFadden 1974) were applied. The results are as follows: Cox and Snell R² = 0.861; Nagelkerke R² = 0.893; McFadden R² = 0.591. They indicate that at least about 60 % of the variation is explained by this model. It is normal for the McFadden R² to be much lower than the Cox and Snell and Nagelkerke values (Tabachnick and Fidell 2013). Following Ganguly et al. (2010), a model with a McFadden R² above 0.4 is considered a very good fit. The result of the ordinal regression analysis is shown in Table 11. All thresholds (except for damage level 2) are found to be significant at the p < 0.001 level. The results also show that the significant explanatory variables include inundation depth (p < 0.001), shared accommodation function (p < 0.001), commercial facility function (p < 0.01), transportation/storage facility function (p < 0.01), reinforced concrete structural material (p < 0.001), and steel structural material (p < 0.001).

Table 11 Explanatory variables associated with the damage level based on ordinal regression with the complementary log–log link
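For reference, the three pseudo-R² statistics can be computed from the log-likelihoods of the intercept-only model and the fitted model using their standard definitions. The sketch below assumes those two log-likelihoods and the sample size are available; the numerical inputs are hypothetical and are not the values behind Table 11.

```python
import math


def pseudo_r_squared(ll_null, ll_model, n):
    """Return (Cox and Snell, Nagelkerke, McFadden) pseudo-R-squared values.

    ll_null  -- log-likelihood of the intercept-only (thresholds-only) model
    ll_model -- log-likelihood of the fitted model
    n        -- number of observations
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox and Snell by its maximum attainable value.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    mcfadden = 1.0 - ll_model / ll_null
    return cox_snell, nagelkerke, mcfadden


# Hypothetical log-likelihoods, for illustration only.
cs, nk, mf = pseudo_r_squared(ll_null=-100.0, ll_model=-60.0, n=100)
print(round(cs, 3), round(nk, 3), round(mf, 3))
```

Because Nagelkerke divides the Cox and Snell value by a quantity smaller than 1, it is always at least as large as the Cox and Snell value, consistent with the ordering of the reported statistics.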

Since our link function is Clog–log, the general model is formulated as follows (see Norusis 2010):

$$\ln ( - \ln (1 - \gamma_{j} )) = [\theta_{j} - (\beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3} + \cdots + \, \beta_{m} x_{m} )]/\exp (\tau_{1} z_{1} + \tau_{2} z_{2} + \tau_{3} z_{3} + \cdots + \tau_{n} z_{n} )$$
(2)

In Eq. (2), $\gamma_j$ is the cumulative probability of damage for the $j$th category ($j = 1, \ldots, 5$), $\theta_j$ is the threshold for the $j$th category, $x_1, \ldots, x_m$ are the predictors, $\beta_1, \ldots, \beta_m$ are the $m$ regression coefficients ($m$ representing the number of predictors), and $\tau_1, \ldots, \tau_n$ are the $n$ coefficients for the scale component.

If we substitute the significant explanatory variables into Eq. (2), we obtain:

$$\ln ( - \ln (1 - \gamma_{j} )) = \{ \theta_{j} - [\beta_{{{\text{func}}\_{\text{shared}}}} x_{{{\text{func}}\_{\text{shared}}}} + \beta_{{{\text{func}}\_{\text{comm}}}} x_{{{\text{func}}\_{\text{comm}}}} + \beta_{{{\text{func}}\_{\text{tran}}}} x_{{{\text{func}}\_{\text{tran}}}} + \beta_{{{\text{mat}}\_{\text{rc}}}} x_{{{\text{mat}}\_{\text{rc}}}} + \beta_{{{\text{mat}}\_{\text{steel}}}} x_{{{\text{mat}}\_{\text{steel}}}} ]\} /\exp (\tau_{\text{depth}} z_{\text{depth}} )$$
(3)

In Eq. (3), $\beta_{\text{func\_shared}}$ is the regression coefficient for the shared accommodation building function, $\beta_{\text{func\_comm}}$ is the regression coefficient for the commercial facility building function, $\beta_{\text{func\_tran}}$ is the regression coefficient for transportation/storage facilities, $\beta_{\text{mat\_rc}}$ is the regression coefficient for the reinforced concrete structural material, and $\beta_{\text{mat\_steel}}$ is the regression coefficient for the steel structural material. $\tau_{\text{depth}}$ is the scale component coefficient corresponding to inundation depth, where the scale component is used to account for differences in variability for different values of the predictor variables (Norusis 2010). $x_{\text{func\_shared}}$, $x_{\text{func\_comm}}$, $x_{\text{func\_tran}}$, $x_{\text{mat\_rc}}$, and $x_{\text{mat\_steel}}$ are the predictor variables (i.e., each $x$ represents a different value of the building function and building material categorical variables), and $z_{\text{depth}}$ is the continuous predictor variable for the scale component. The thresholds and regression coefficient estimates are shown in Table 12.

Table 12 Summary of prediction model
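The way Eq. (2) turns coefficients into damage-level probabilities can be sketched as follows: the link is inverted to obtain each cumulative probability, and adjacent cumulative probabilities are differenced to obtain the probability of each category. All numerical values below are hypothetical placeholders and are not the estimates in Table 12; the function names are likewise our own.

```python
import math


def cumulative_probabilities(thresholds, betas, x, taus, z):
    """Invert Eq. (2): gamma_j = 1 - exp(-exp((theta_j - sum(beta*x)) / scale))."""
    location = sum(b * xi for b, xi in zip(betas, x))
    scale = math.exp(sum(t * zi for t, zi in zip(taus, z)))
    return [1.0 - math.exp(-math.exp((theta - location) / scale))
            for theta in thresholds]


def category_probabilities(cumulative):
    """Difference adjacent cumulative probabilities to get per-category values."""
    bounds = [0.0] + list(cumulative) + [1.0]
    return [hi - lo for lo, hi in zip(bounds[:-1], bounds[1:])]


# Hypothetical inputs (NOT the Table 12 estimates).
thresholds = [-2.0, -1.0, 0.0, 0.5, 1.5]  # theta_1..theta_5, increasing
betas = [0.4, 0.2, 0.3, -0.8, -0.5]       # func_shared, func_comm, func_tran, mat_rc, mat_steel
x = [0, 0, 0, 1, 0]                       # dummy coding: a reinforced concrete residential house
taus = [0.1]                              # scale coefficient for inundation depth
z = [3.0]                                 # inundation depth in metres

gammas = cumulative_probabilities(thresholds, betas, x, taus, z)
probs = category_probabilities(gammas)
print([round(p, 3) for p in probs])
```

A building in both reference categories (wood, residential house) has all dummy variables equal to zero, so its location term vanishes and only the thresholds and the depth-dependent scale remain.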

5.5 Accuracy of the mean function

In order to evaluate the accuracy of the model, we applied a cross-tabulation method. The predicted and actual classifications are shown in a 5 × 7 classification table (Table 13), along with the proportion of correct estimations (in bold). Buildings at actual damage level 1 are estimated correctly in 56.0 % of cases (21.9 % are estimated as damage level 2 and 22.0 % as damage level 3). For damage level 2, 33.0 % are correct (35.1 % are estimated as damage level 1 and 31.8 % as damage level 3). For damage level 3, 68.9 % are correct (22.0 % are estimated as damage level 2). For damage level 5, 55.9 % are correct (39.2 % are estimated as damage level 3). For damage level 6, 48.0 % are correct (48.3 % are estimated as damage level 5). In general, the model can estimate the actual damage level to within ± one damage level. However, the model does not estimate any buildings to be at damage level 4, owing to the very small number of buildings at actual damage level 4. Instead, most buildings at actual damage level 4 are estimated as damage level 5 (60.8 %).

Table 13 Cross-tabulation analysis
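The cross-tabulation can be illustrated with a short sketch that counts (actual, predicted) pairs, converts one row to percentages, and reports the share of buildings estimated within one damage level. The toy labels below are hypothetical and are not the data behind Table 13; the function names are our own.

```python
from collections import Counter


def cross_tabulate(actual, predicted):
    """Count how often each (actual, predicted) pair of levels occurs."""
    return Counter(zip(actual, predicted))


def row_percentages(table, actual_level):
    """Percentage of each predicted level among buildings at one actual level."""
    row = {pred: n for (act, pred), n in table.items() if act == actual_level}
    total = sum(row.values())
    if total == 0:
        return {}
    return {pred: 100.0 * n / total for pred, n in sorted(row.items())}


def within_one_level(actual, predicted):
    """Fraction of buildings whose predicted level is within +/- 1 of the actual."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= 1)
    return hits / len(actual)


# Hypothetical toy labels, for illustration only.
actual = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
predicted = [1, 1, 2, 3, 1, 2, 3, 3, 3, 2]
table = cross_tabulate(actual, predicted)
print(row_percentages(table, 1))            # {1: 50.0, 2: 25.0, 3: 25.0}
print(within_one_level(actual, predicted))  # 0.9
```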

5.6 Relative importance of the predictors

In this section, we aim at finding the explanatory variables which influence the damage level for each structural material, then for each function of building.

5.6.1 Building material

A number of studies (Matsutomi and Harada 2010; Reese et al. 2007, 2011; Ruangrassamee et al. 2006; Shuto 1993; Suppasri et al. 2012a, c, 2013; Valencia et al. 2011) showed that the range of inundation depths influences the scale of damage differently when structural material is taken into account (see Table 1). We continued the analysis using the same method as before but restricted the data to each structural material in turn, in order to identify the significant variables that influence the damage level. Table 14 shows the results of the ordinal regression analysis applied to each structural material. Similar to our previous results, the inundation depth is a significant explanatory variable for all structural materials. The number of floors is a significant explanatory variable only for steel and wood buildings (p < 0.001 and p < 0.01, respectively). The shared accommodation function is found to be significant for reinforced concrete (p < 0.05), wood (p < 0.001), and masonry (p < 0.05). The commercial facility function is not a significant explanatory variable for any structural material. The transportation/storage facility function is found to be significant only for steel buildings (p < 0.05). The public facility function is found to be significant for reinforced concrete and masonry buildings (p < 0.05 for both). The agricultural facility function is found to be significant only for reinforced concrete buildings (p < 0.05). The regression coefficients, the pseudo-R² values, and the accuracy results from the cross-tabulation method are shown in Table 14.

Table 14 Explanatory variables associated with the damage level based on ordinal regression with the complementary log–log link for specific structural material

5.6.2 Building function

We continued the analysis using the same method as before, this time restricting the data to each building function in turn, in order to identify the significant variables that influence the damage level. The results are shown in Table 15. It can be seen that inundation depth is a significant explanatory variable for all functions at the p < 0.001 level, while the number of floors is a significant explanatory variable only for shared accommodation and transportation/storage facilities. Furthermore, reinforced concrete is found to be significant for shared accommodation (p < 0.001), commercial facilities (p < 0.001), and agricultural facilities (p < 0.05). Steel is found to be significant for residential houses (p < 0.01), shared accommodation (p < 0.001), and commercial facilities (p < 0.01). Finally, masonry is found to be significant only for public facilities (p < 0.01). The regression coefficients, the pseudo-R² values, and the accuracy results from the cross-tabulation method are shown in Table 15.

Table 15 Explanatory variables associated with the damage level based on ordinal regression with the complementary log–log link for specific function of building

6 Discussion and conclusion

6.1 Discussion

In line with previous studies (Koshimura et al. 2009b; Matsutomi and Harada 2010; Reese et al. 2007, 2011; Ruangrassamee et al. 2006; Shuto 1993; Suppasri et al. 2012a, c, 2013; Valencia et al. 2011), our model includes and ascertains the inundation depth as one of the significant explanatory variables, together with the structural material (reinforced concrete and steel). The function of buildings (shared accommodation, commercial facility, and transportation/storage facility) is also found to be of importance.

Although the number of floors is not found to be a significant explanatory variable when considering the entire dataset, it is found to be significant for wooden and steel buildings (when the data are categorized by structural material) and for shared accommodation and transportation/storage facilities (when the data are categorized by building function) (see Sect. 5.6).

The fact that the number of floors is significant in relation to the damage state only for steel and wood buildings is likely to be explained by the difference in wall resistance to tsunami loads. Referring to Table 3, the description of damage (particularly for high damage levels) is largely based on the amount of damage to walls, in proportion to the size of the structure (e.g., “more than half of wall density” for level 5). In a reinforced concrete or masonry building, the walls are made of reinforced concrete or brick, whereas the walls of wood and steel buildings are typically made of weak materials such as plywood. In addition, wood and steel buildings typically have fewer than three stories, whereas the range of heights for RC buildings is much broader (up to 14 stories) (see Table 16). This means that for a given inundation depth, the walls of a reinforced concrete or masonry building are likely to resist the hydrostatic and hydrodynamic wave loads well, regardless of the number of floors, so the damage level will not appear to depend strongly on this variable. On the other hand, under tsunami loading, the walls of wooden and steel buildings will fail very easily, causing proportionally more damage as the flow depth increases and reaches higher floors.

Table 16 Distribution of number of floors of the building in each structural material

Similarly, the significance of the number of floors for shared accommodation and transportation/storage facilities in relation to their damage state is likely to be a consequence of their dominant structural material. Indeed, 87 % of shared accommodations and 84 % of transportation/storage facilities are made of wood or steel (shared accommodation: wood 82 % and steel 5 %; transportation/storage facilities: wood 31 % and steel 53 %), which would make the walls of such structures more vulnerable to tsunami forces. By comparison, the corresponding shares are only 50 % (wood 31 % and steel 19 %) for public facilities and 59 % (wood 30 % and steel 29 %) for agricultural facilities (see Fig. 3). It should be noted, however, that residential houses, although primarily made of wood (95 %), did not show a statistically significant influence of the number of floors. This is probably due to the extremely large count of 2-story buildings for this function (4 times the number of single-story houses, with a negligible number of buildings higher than 3 stories), in comparison with shared accommodation and transportation/storage facilities, which display a greater spread across the range of heights (see Fig. 4). In other words, a variable that is virtually constant (effectively taking only the value number of floors = 2) will not appear as significant, whereas a greater spread allows the effect of this variable to become more apparent, which is the case for these specific building functions.

Fig. 3 Structural materials for each function of building

Fig. 4 Histograms for each function of building

The cross-tabulation results highlight an interesting issue with the classification standard for buildings damaged by the tsunami. According to the present findings, the model estimated 60.8 % of the actual damage-level-4 buildings as damage-level-5 buildings and 34 % as damage-level-3 buildings; together, these account for almost all of the damage level 4 observations. A closer examination of the definitions in Table 3 reveals that damage levels 4 and 5 are likely to share many characteristics: Damage level 4: “heavy damages to several walls and some columns”; Damage level 5: “destructive damage to walls (more than 50 % of wall density) and several columns (bend or destroyed).” Similarly, “Possible to be use after a complete reparation and retrofitting” (damage level 4) can easily be read as “Possible to be use after major reparations” (damage level 3). Therefore, it is likely that survey teams misclassified many damage-level-4 buildings as having reached damage level 5, or underestimated their damage as level 3. In light of these observations, it is suggested that the damage level classification may need to be reconsidered to avoid potential judgment errors in future surveys. These levels may need to be combined, redefined, or described in more detail to highlight their differences.

6.2 Conclusions

This study presented an analysis of the detailed damage data for the buildings impacted by the 2011 tsunami in Ishinomaki, applying ordinal regression to generate a model relating all available predictor variables to the damage level. The accuracy of the results was evaluated by cross-tabulation. This is the first attempt to apply this statistical approach to buildings damaged by tsunami in a way that combines all significant parameters in one equation. Inundation depth, function of building (shared accommodation, commercial facility, and transportation/storage facility), and structural material (reinforced concrete and steel) have been found to be significant explanatory variables that can influence the damage level of the buildings.

In addition, as mentioned (see Sect. 6.1), the significance of the number of floors is likely to be explained by the difference in wall resistance to tsunami loads. When the data are categorized by structural material, the number of floors is found to be another significant variable for wooden and steel buildings, whose walls are less resistant to tsunami loads than those of reinforced concrete and masonry buildings. Meanwhile, when the data are categorized by building function, the number of floors is also significant for shared accommodation and transportation/storage facilities, which are mostly made of wood and steel.

The results of this study can contribute to both academic research and industrial or governmental practice. In the context of tsunami research, a new approach has been applied to identify and rank the variables that influence the observed damage, and a new model for tsunami damage prediction is proposed, based on the ordinal regression methodology and the extensive database from the 2011 Japan tsunami. In the field, the prediction model can be applied to predict the damage level when the input variables are known, and its outputs can be compared with state-of-the-art predictions. The government, urban and disaster planners, engineers, architects, insurance companies, and construction businesses may also take such results into account in their decision-making processes.

However, it is important to recognize the limitations of these results in order to understand their applicability and highlight avenues for improvement. First, this study used only the available data, so the predictive capability of the model for future events and different datasets needs to be evaluated and improved through further analysis. In addition, the observed damage might be influenced by other variables (such as tsunami flow velocity and distance from the shoreline) or other external variables (e.g., floating debris, barriers, and environment). If such data become available, it will be possible to include those variables in future analyses and assess their importance, as well as to improve the accuracy of the estimations. Also, even though we chose the reference categories by following the suggestion of Katz (2011), there may be limitations inherent in these choices (i.e., wood and residential house), since any variability within the reference categories was not captured by the model. Finally, despite the large number of data points, which is considered sufficient to perform the analysis, all buildings were surveyed in one city. Some characteristics of the study area may not generalize to other areas, and therefore it is necessary to test this approach on other affected areas and compare the results.