1 Introduction

Earthquakes are major natural disasters that result in heavy casualties and property losses (Alexander 1993). Throughout history, earthquakes have been responsible for a large number of deaths and injuries (Corinne et al. 2000). According to international data, earthquakes remain the least predictable and most lethal of natural disasters (Guha-Sapir and van Panhuis 2004). The number and impact of disasters seems to be increasing in the last decades, and their consequences have to be managed in the best possible way (Ortuño 2013).

China is located between two of the world’s largest seismic zones: the circum-Pacific seismic belt and the Eurasian seismic zone. Under the effect of the extrusion of the Pacific, Indian, and Philippine Sea plates, earthquakes frequently occur near the rupture zone (Tian and Yao 2015). According to Munich reinsurance (2013) company statistics, in the past 20 years, natural hazards had resulted in 13 million deaths and $404 billion lost. Recent earthquakes, including the one in Wenchuan in 2008 ($136 billion property loss, 69,227 deaths), the one in Yushu Qinghai in 2009 ($48.3 million property loss, 2698 deaths), the one in Yingjiang, Yunnan in 2011 ($0.28 billion, 25 deaths), and the one in Yaan in 2013 ($28.23 billion 196 deaths), resulted in loss of lives and property in China. Analysis has shown that earthquakes have a greater impact in China. The costs of damage, the loss of productivity and life, are difficult to estimate since they are compounded by the rapid economic growth that is occurring (Song 2003). Many examples illustrate the undesirable consequences of strong earthquakes, above all in terms of the loss of human life, despite the difficulty in obtaining accurate death toll figures. In the twentieth century, half of the deaths caused by earthquakes were in China, and half of the deaths caused by natural disasters in China since 1950 were due to earthquakes.

The estimation of seismic mortality has drawn much attention from the scientific community because of the severe loss of life in earthquake disasters. Estimation of possible losses, such as seismic mortality, is an important aspect of risk assessment. Since the scales of earthquakes can be large and unexpected, efficient and precise death toll prediction becomes extremely important. When an earthquake strikes, a proper estimation of possible mortality is necessary to understand the magnitude of expected seismic losses and to facilitate decision making in relief operations (Wu et al. 2015).

Over the past 20 years, there have been important advances in seismic science and engineering relating to the probabilistic estimation of seismic hazards and early warning, but there are uncertainties regarding source initiation, rupture phenomena, and the accuracy of the timing and magnitude of earthquake occurrence that seem difficult to overcome (Kossobokov 2013; Sheu and Cheng 2010). By searching databases, only several publications were found dealing with the research on the dynamic mortality of earthquakes. To effectively estimate mortality in the event of an earthquake, a number of models at local or regional levels have been established. Several papers have put focus on exploring the death prediction based on the historical database (Fu and Chen 2009; Guo and Zhou 2011; Wu 2012). Chan et al. (2003) found that the immediate effects of the Taiwan earthquake included a higher proportion of female and elderly seismic deaths and an association between seismic death rates and earthquake damages in the disaster area. However, these models set basic factors which affect the consequences of earthquakes as parameters to build models without considering the population density as a key factor, especially for China. In China, the population density varies greatly from region to region. It should be noted that the severity of an earthquake in some affected areas may differ greatly from other areas. And other papers are based on the vulnerability of various categories of structure and facilities in the region concerned (Okada 1992; Coburn et al. 1992). However, these models require a detailed inventory database of structures and facilities, which is not always readily available in many regions of the world (Jaiswal et al. 2011). In China, the gathering of relevant complex structural parameters is not easy for two reasons (Wu et al. 2015). (1) The buildings in urban and rural areas differ greatly in structure and materials due to the uneven socioeconomic development of different regions in China. (2) Data on buildings cannot be updated in a time-frame suitable for modeling since the buildings are reconstructed frequently with the rapid economic growth in China.

This literature review as mentioned above reveals that, most current research has a general quantitative idea on mortality prediction, but not much emphasized the population density which is not suitable for the practical cases in China. Differences in the fitting methods used and the accuracy achieved pose a great challenge (Wu et al. 2015). And among the limited quantitative studies, some are in need of improvement or development. The lack of scientific application of quantitative mathematical models is a major problem because it hampers the development and testing of mathematical models that could become decision support tools to expedite response (Holguín-Veras and Jaller 2012). The overall goal of this paper is to help fill this gap with the analyses of prediction on emergency materials.

This present paper employs average population density to propose a forecasting methodology aiming at carrying out the mortality prediction timely and accurately to facilitate humanitarian relief logistics for different responding parties. And this research aims to answer the following research questions: (1) What will influence the death rates among Chinese earthquakes? (2) How to perform the deaths prediction of Chinese earthquakes more precisely and scientifically? First, the paper selects most similar cases as the target case in a given case set by case-based reasoning. Then, average population density is proposed to build the models to calculate the final death rate. Finally, 18 large-scale earthquake cases occurred throughout China are employed to verify the validation of the forecasting method. And a specific numerical case is given in detail to describe the proposed method.

This paper is organized as follows. In Sect. 1, the background and relevant literature is introduced. Section 2 introduces the methodological framework of the proposed approach, including the final deaths prediction and related mathematical models. Section 3 depicts a numerical study and the corresponding numerical results aiming at a real earthquake case to gain important insights. Finally, concluding remarks and directions for future research are made in Sect. 4. The general research model is shown in Fig. 1, and more detailed introduction will be elaborated in Sect. 2.

Fig. 1
figure 1

Research Model

2 Experimental methods and numerical model

One of challenges to efficiently execute disaster relief operations is to obtain sufficient and accurate information from the disaster-affected area. Therefore, it becomes extreme difficult to correctly estimate useful information to facilitate disaster response (Lin et al. 2001).

This section aims at forecasting the final death toll of the target case associated with the selected cases which belong to the same category using three computational procedures: (1) selecting the most similar cases as the target case in a given case set by cluster analysis, (2) proposing average population density to build the models for calculating the final death rate, and (3) verifying the validity of the method by forecasting the final number of deaths in 18 earthquake cases, and forecasting the final death toll of the 18th earthquake case as a numerical case.

2.1 Methods

In this section, we apply case-based reasoning (CBR) to select the most similar cases as the target case by retrieving previously solved problems and their solutions from a knowledge source of cases (see Fig. 1). The CBR’s retrieval mechanism identifies similar problems from the past, in the expectation that the previous solutions to the problem will be useful for new problems because they will have similar solutions. Case-based reasoning is a problem solving technique that is attracting increased attention (Clancey 1985). Case-based reasoning systems retrieve and reuse solutions for previously solved problems that have been encountered and remembered as cases. In some domains, particularly where the problem solving is a classification task, the retrieved solution can be reused directly. Respective of this, it seems scientific and practical to use the previous experience in the base case for reference after the earthquake took place without any complete and actual data. The proposed methodology was applied on the 18 earthquake cases to predict the death rates and obtained the mean absolute percent error (MAPE) which all can be acceptable. Then a specific earthquake case (the Wenchuan earthquake, China) was presented as a numerical case explained in details in the paper. Therefore, before using the Wenchuan earthquake case, the proposed methodology has been validated by all the decades of data (see Table 3).

The main data resource of the present study regarding deaths in Chinese earthquakes came from the National Geophysical Data Center and the National Earthquake Data Center of China. We collected 18 Chinese earthquakes from 1948 to 2014, and then listed them including year, month, date, time, latitude, longitude, location, and deaths. Details of the collected earthquakes are given in Table 1. This paper proposes a forecasting methodology aiming to predict the final death toll based on the historic Chinese earthquake cases recorded in the database by CBR (case-based reasoning). The National Geophysical Data Center (NGDC) has been working closely with contributors of scientific data to prepare documented, reliable datasets, and its data holdings currently contain more than 300 digital and analog databases, some of which are very large. As technology advances, so does the search for more efficient ways of preserving these data. The NGDC database is regularly inspected for statistical analysis of fatalities and injuries. The data contain an extensive amount of information on natural disasters.

Table 1 Case set of Chinese earthquakes from 1948 to 2014

Ground-shake parameters mainly include the seismic magnitude and intensity, distance from the epicenter, and focal depth. Magnitude and intensity are measures of an earthquake’s strength which are always taken as main parameters because it has been demonstrated to have a strong association with the death rate (Liu et al. 2011). The connection between epicenter and mortality has a clear relationship. Mortality rate increases with proximity to the epicenter (Liang 2000). The severity of an earthquake in most affected areas is not as high as that in the epicentral area, and damage is therefore less (Wu et al. 2015). Focal depth is another important factor affecting the impact of earthquakes. However, it should be noted that the severity of an earthquake in most affected areas is various since the population density varies greatly from region to region. The paper thus considers such differences and retraces the death rate according to the distribution of population density, taking the “average population density” as a key parameter which can be defined as the ratio of the population density of related quake-affected areas to the number of areas. The equation is

$$\bar{p} = \frac{{p_{AA - 1} + p_{AA - 2} + \cdots + p_{AA - q} }}{q},$$
(1)

where \(\bar{p}\) is the average population density of the whole affected area, and \(p_{AA - q}\) is the population density of the relevant affected area \(q.\)

The paper suggests that earthquakes which have similar properties including Richter Scale, distance from the epicenter, focal depth, and population density have similar death rates. When earthquake strikes, if we can approximately know the average population density \((\bar{p}_{i} )\) and the final death toll \((d_{i} )\) of the known earthquake which is selected by similarities and the average population density of the target (unknown) earthquake \((P_{T} )\), the total death toll \((d_{T} )\) can be estimated as follows:

$$\frac{{d_{i} }}{{\bar{p}_{i} \cdot {\rm \uppi} \cdot R_{i}^{2} }} = \frac{{d_{T} }}{{\bar{p}_{T} \cdot {\rm \uppi} \cdot R_{T}^{2} }}.$$
(2)

2.2 Numerical model

In this section, the property values are month, time, longitude, latitude, magnitude, and population density when the earthquake occurs, where the model parameters are carefully set by information and suggestions from the official statistics reports and other researchers. Original data need to be normalized in order to allow the application of general clustering models due to the great disparity in dimension and magnitude of properties. This section aims to forecast the final mortality by selecting the most similar case which is considered to have similar death rate as the target case. By proposing the average population density to calculate the final mortality estimation required, the mathematical formulation of the problem, the specific parameters, and assumptions are given below:

  • \(T\) a specific day apart from the earthquake

  • \(j\) a specific property value of the earthquake

  • \(i\) a specific case in the case set

  • \(m\) number of property values of a specific earthquake

  • \(n\) number of selected cases

  • \(N(T)\) the deaths of the earthquake-affected area at time \(T\)

  • \(\bar{p}_{i}\) average population density of case \(i\)

  • \(\bar{p}_{T}\) average population density of the target case

  • \(q\) number of affected areas of a certain earthquake

  • \(d_{i}\) the final deaths of case \(i\) in the case set

  • \(d_{T}\) the final deaths of target case \(T\)

  • Step 1. Cluster analysis based on CBR

    1. I.

      Formation of a data matrix

      Suppose that there are n earthquake cases in a given case set which is proposed as \(C = \left\{ {C_{1} ,C_{2} , \cdots ,C_{n} } \right\}\); \(X_{ij}\) and \(X_{tj}\) denote the jth property value of the \(i{\text{th}}\) case and the \(j{\text{th}}\) property value of the target case, respectively. We can then have a matrix of the property values given by

      $$X = \left| {\begin{array}{*{20}l} {X_{11} } \hfill & {X_{12} } \hfill & \cdots \hfill & {X_{1m} } \hfill \\ {X_{21} } \hfill & {X_{22} } \hfill & \cdots \hfill & {X_{2m} } \hfill \\ \vdots \hfill & \vdots \hfill & \vdots \hfill & \vdots \hfill \\ {X_{n1} } \hfill & {X_{n2} } \hfill & \cdots \hfill & {X_{nm} } \hfill \\ \end{array} } \right|.$$
      (3)
    2. II.

      Normalization of the property indexes

      One of the most common reasons people perform a transformation is in an attempt to give a variable a normal distribution (DeCoster 2001). For each property \(j\), the normalized scaled value \(X^*_{i}\) has been calculated by means of

      $$X_{i}^{*} = \left\{ {\begin{array}{*{20}l} 0 \hfill; {X_{i} \le X_{\hbox{min} i} } \hfill \\ {\frac{{X_{i} - X_{\hbox{min} i} }}{{X_{\hbox{max} i} - X_{\hbox{min} i} }}} \hfill; {X_{\hbox{min} i} < X_{i} < X_{\hbox{max} i} } \hfill \\ 1 \hfill ; {X_{\hbox{max} i} \ge X_{i} } \hfill \\ \end{array} \quad {\text{where}}\quad \left\{ {\begin{array}{*{20}c} {X_{\hbox{max} i} = \hbox{max} \{ X_{1} ,X_{2} , \ldots ,X_{n} \} } \\ {X_{\hbox{mini} } = \hbox{min} \{ X_{1} ,X_{2}, \ldots ,X_{n} \} } \\ \end{array} } \right\}.} \right.$$
      (4)
    3. III.

      Calculation of the Euclidean distance between different cases and target case is

      $$d_{(i,t)} = \sqrt {\left| {x_{i1} - x_{t1} } \right|^{2} + \left| {x_{i2} - x_{t2} } \right|^{2} + \cdots + \left| {x_{nm} - x_{tm} } \right|^{2} } = \sqrt {\sum\limits_{j = 1}^{m} {(x_{ij}^{*} - x_{tj}^{*} )^{2} } } .$$
      (5)

      Equation (5) donates the Euclidean distance between the base cases and the target case. The most similar case with the target case can be selected according to shortest Euclidean distance.

  • Step 2. Calculate the affected population

    This paper suggests that the selected cases having a similar mortality rate (deaths/total population of the affected area) are the target case based on their similar properties. Therefore, as long as the total population of the affected area of the target case is obtained, calculation of the final deaths of the target earthquake is possible. However, how large the total population of affected area is, depends on the size of the earthquake and the population density within that affected area. Specific steps are shown as follows:

    1. I.

      Estimating the radius of the earthquake based on historic information and calculating earthquake sizes.

      Figure 2 illustrates the new concept proposed by the study which is the average population density. Having established the locations where fatalities are reported for each significant earthquake, circles are drawn with a radius equal to the maximum distance from the epicenter to the furthest point which has deaths. According to the National Bureau of Statistics of the People’s Republic of China, the estimated radius \((R_{i} ,R_{T} )\) of the affected area of the target earthquake and the selected earthquakes can be obtained. Then, earthquake sizes can be calculated (see Eqs. 1 and 2).

      Fig. 2
      figure 2

      Illustration of average population density

    2. II.

      Calculating the average population density of the affected areas.

      Since affected areas of earthquakes usually consist of different adjacent regions, it is necessary to estimate the average population density. Based on the aforementioned information, average population density is proposed and the formulas are put forth based on the equal mortality rate between the selected earthquake and the target earthquake as shown in Eqs. (1) and (2). From these data, it is also possible to calculate the population densities of other cases.

    3. III.

      Calculating the final deaths of affected area

      According to the size, the deaths and the average population density of the seismic area, and the size and average population density of the affected area, the total quake-affected population can be calculated by Eq. (2).

3 Results

We collected Chinese earthquake data from 1948 to 2014 as a case set. Details of the property characteristics are provided in Table 1. The target case is “Wenchuan earthquake” which took place on May 12, 2008 in China (see Fig. 2).

This section provides a numerical study for three purposes: (1) to prove the similar death rate of each category by case-based reasoning based on key factors affecting the impacts of earthquakes, and to verify the validity of the predicted final deaths compared to the actual data as shown in Table 2, (2) to demonstrate the quantitative analytical results of the proposed forecasting models on mortality prediction of the target case (“Wenchuan earthquake, China,” May 12 2008), and (3) to study how to accurately forecast the final death toll with the proposed method by comparing the fitting results as shown in Table 2.

Table 2 Final deaths forecasts based on case-based reasoning

By searching for the list of Chinese earthquakes from the database of National Geophysical Data Center and China Earthquake Data Center, 18 cases returned as shown in Table 1. There are some key factors affecting the impact of earthquakes, such as distance from the epicenter, the Richter magnitude, the focal depth, time of day and time of year, and population density. The goal of this section is to estimate the final death rate and deaths of the target earthquake (“Wenchuan earthquake”) based on key factors as mentioned above when the crisis occurred.

On Monday May 12, 2008, an earthquake with a magnitude of 7.9 occurred in northwestern Sichuan Province of China. It was the most devastating earthquake in China in more than three decades. At least 69,227 people were killed, 374,177 injured, and 18,392 missing and presumed dead in the Chengdu-Lixian-Guangyuan area. Specific parameters are shown in Table 2.

According to these parameters, cluster analysis of the case set is carried out and is presented in Fig. 3. The cluster results show that the earthquake cases collected can be categorized into five basic types according to the Euclidean distance of four key factors, where each category which includes many cases has a similar or same death rate due to the similarities of properties and impacts. The next step aims to test the model’s performance particularly in the dynamic final mortality prediction. Based on the similar death rate of each category mentioned above, this study uses the proposed Eqs. (1) and (2) to estimate the final deaths associated with each affected area, and then compares the forecasting results to the historical database.

Fig. 3
figure 3

Cluster analysis diagram by Euclidean distance

This section aims at testing the model’s performance particularly in dynamic predicted deaths based on the CBR and proposed average population density. To assess the model’s validity, the results are also examined using the measures of mean absolute percentage error (MAPE). Table 3 presents the comparison results and mean absolute percentage error with respect to \(d(i)\).

Table 3 Parameters of target case “Wenchuan earthquake” (Data are from USGS)

4 Conclusions and future work

4.1 Conclusions

This paper forecasts the final death toll by suggesting that the same category of earthquakes has the same mortality rates due to the similar attributes, which puts forward new insights for future research on mortality forecasting. To validate the forecasting results, data from 18 large-scale earthquakes occurred in China are used. And also a particular case of Wenchuan earthquake, China was employed to present the computed detail by calculating the final death toll. The strength of this paper is that it provides scientific methods with overall forecast errors lower than 20 %, and opens the door for conducting final death forecasts with a qualitative and quantitative approach.

These findings suggest the major contributions of this paper summarized as follows: First, empirical models that use historical death data are quite important for providing the path to analyze the death tolls for earthquakes since the large-scale earthquakes occurred in China can provide the reference and direction for the future disasters. Second, earthquake cases which have similar parameters such as seismic magnitude and intensity, distance from the epicenter, focal depth, and population density have similar mortality rate. Third, the research highlighted the importance of population density which is considered as a key parameter, particularly in China where population density varies greatly from region to region.

Overall, we suggest the following: First, the database of historical earthquakes in China should be gradually and systematically constructed which would be beneficial for the extensive study on future earthquakes. In reality, the data are not always available and open in China. Thus, to borrow the experience from the past, we should build a systematically and complete database for further research. For better prediction and response, historical data, such as the accurate death tolls, casualties, and missing people could be provided by the government and other institutions to help achieve an open, complete, and massive earthquake database. Second, affected area with similar parameters such as seismic magnitude and intensity, distance from the epicenter, focal depth, and population density should be classified into groups. In China, some places are earthquake-prone area, which have some characteristics in common, such as Sichuan Province and Yunnan Province; thus, the policy makers in the same group could cooperate together to work out a feasible response plan and borrow experience from each other. Third, much focus should be put on the high populated affected area in terms of disaster preparedness and response of local authorities. The death toll from earthquakes highly depends on the population density, especially in a large-population country like China. The government should always consider that the role of population density plays in terms of vulnerability, mortality rate, and loss caused by the earthquake disasters. Therefore, aiming at the high population density regions, the local authorities should well prepare for the unknown disasters including training and education for the residents, necessary emergency infrastructure for victims, and effective emergency and response plan.

4.2 Extension of the work

Further extensive studies are needed and, thus, some recommendations for future research are given as follows. First, this research examines the final death tolls of Chinese earthquakes. Future studies can extend the study by examining earthquakes that are not located in China to determine whether our findings can be generalized into different contexts. Second, this paper estimates the seismic mortality of Chinese earthquakes without taking into account the casualty rate since the accurate data of injured are not easy to obtain. It would be of great interests to focus on injury prediction in the future work.