1 Introduction

On December 26, 2004, a 9.3-magnitude earthquake and subsequent tsunamis in the Indian Ocean stimulated interest in disaster relief and demonstrated the susceptibility of coastal areas with dispersed populations, which required more efficient humanitarian relief efforts. In the south of Thailand, one of the most affected areas, the delivery of humanitarian aid was stated as a major problem in the aftermath of the tsunami (Banomyong et al. 2009). A key lesson learnt from this 2004 Thailand disaster was that the tsunami-devastated urban areas, with their unplanned evacuation points, and perhaps most importantly, with the failure to estimate the affected population after the onset of the disaster, led to unknown demand. In turn, this led to a deficiency in optimised-routing decisions, resulting in insufficient, misdirected, and delayed relief logistics operations. This poor preparedness in disaster management stimulated both academics and practitioners to improve the performance in the case of a future tsunami occurrence.

As 80 per cent of disaster relief involves logistics activities, humanitarian logistics must be acknowledged as a crucial factor in disaster relief operations (Kovács and Spens 2007; Van Wassenhove 2006). In the humanitarian field, allocating supply items to the people affected by disasters can be a matter of life and death, and it is significantly different from those served to customers in the business environment. Therefore, basic survival needs must be systematically delivered as quickly as possible once a disaster occurs. As this can be a large and urgent need, applying optimisation techniques for vehicle routing to facilitate humanitarian logistics is critical for reaching survivors. Hence, due to the criticism of the tsunami response in Thailand, there is a compelling need to improve humanitarian relief logistics in tsunami-prone areas, and the best way to achieve this in the future is to be better prepared for relief routing and scheduling.

However, these humanitarian logistics involve high uncertainties in many aspects such as the locations and affected numbers. Decision makers, therefore, face some specific challenges, especially in obtaining victim figures, which are directly proportional to demand in the delivery stage. Many scholars agree that good knowledge of demographic data can contribute to good planning (e.g. Hirschman 1981; Keyfitz 1993; Koedam 2012; Guha-Sapir & Hoyois 2015). Nevertheless, disasters have no regard for the administrative or political boundaries over which official population data are collected (Wood 1994). As a result, assessing the population in affected areas is difficult, as population data are mostly not tabulated for small areas such as census blocks, particularly in developing countries. This leads to difficulty in approximating the affected population, on which humanitarian logistics heavily depend. However, population data alone are not sufficient to inform effective humanitarian responses: geographical data are also needed, to give context and to assist visualisation of the disaster situation (National Research Council 2007). Thus, the integration of population estimation methods into Geographical Information Systems (GIS), along with flood hazard map data, can be developed into a useful tool for guiding disaster relief activities.

With regard to tsunami disasters, there are numerous organisations researching and developing approaches using GIS to model tsunami inundation worldwide. However, it appears in the scientific literature that there is a limited amount of research on applying GIS to estimate the affected numbers in tsunami-susceptible areas (Takashima et al. 2005; Doocy et al. 2007; Khornarugin et al. 2010; Geiss et al. 2017; Patel et al. 2017). This paper therefore aims to develop a GIS-based application to model a tsunami prone-area using ArcGIS software. Areal interpolation techniques are applied to produce a range of affected numbers in each evacuation shelter location within areas affected by tsunami inundation.

A major advantage of the tool presented here in this paper is that specialist GIS skills are not needed to get ArcGIS to carry out the complex processing used by the application. The ArcGIS graphical user interface (GUI) is designed for very intuitive use, with the user only inputting a set of data and parameters. A Python implementation of GIS analytical operations is used to find solutions through the data entries. The results in each target zone show the coordinates of each location for gathering evacuees, with a range of the minimum and maximum numbers affected by the tsunami inundation. This innovative tool straightforwardly produces not only numerical solutions for the decision maker, but also various alternative visualisation maps. A case study in Patong, a town of Phuket, effectively illustrates the application of a GIS-based approach. The outcomes can be used as a key decision-making factor in planning and managing humanitarian relief logistics in both the preparedness and response phases.

The rest of this paper is organised as follows: Sect. 2 provides some background to population estimation techniques. Section 3 presents the study area and dataset used in this work. Section 4 describes the methodology of estimating the affected population in the study area. Section 5 presents the results of computational study for a set of proposed scenarios. Section 5 is a discussion of the main findings with recommendations for further research, followed by the conclusion. This research has been adapted and updated from the PhD doctoral thesis of one of the authors (Jitt-Aer 2018).

2 Background

More efforts to improve disaster relief are required from humanitarian organisations following the major disasters that have occurred across the world, e.g. the earthquakes and associated tsunamis affecting the Indian Ocean in 2004 and Japan in 2011, as well as the Haiti earthquake in 2010 (Leiras et al. 2014). As a result, risk and vulnerability models for natural disasters have received much attention in the humanitarian sector. When a disaster occurs, estimations of the affected number of people are needed for the humanitarian response, especially the relief logistics (National Research Council 2007; Jordan et al. 2012).

Population estimation methods fall into two groups: areal interpolation and statistical modelling. Areal interpolation is the transformation of known values in a set of source polygons to a set of target polygons in which the values of the variables are estimated (Hawley and Moellering 2005). Specifically, this approach describes the interpolating of census population data in the source zone to obtain a population estimation for the target zone. Wu et al. (2005) separated areal interpolation methods into two categories: areal interpolation with and without ancillary information. Areal interpolation methods without ancillary information aggregate population data from a source zone for a population estimation for a target zone without any assisting data sources. Moreover, areal interpolation without ancillary information is further divided into point-based and area-based techniques (Lam 1983). Point-based interpolation refers to a process of estimating unknown values that are distributed around a centroid point with a known value. In the population context, a centroid point represents the concentration of a known population size of each source zone in which the population is uniformly distributed. In contrast, an area-based approach uses the entire volume of the source zone instead of a centroid point. A source zone and a target zone are overlaid to obtain a proportion that is used as a weight to estimate the population in the target zone. However, an area-weighting method is stated as an over-simplified assumption that the population in each source zone is uniformly distributed, which is rarely the case (Kim and Yao 2010). This weakness was improved by pycnophylactic interpolation, which was developed by Tobler (1979). A smooth density algorithm is used in Tobler's (1979) method to blend the boundaries between the zones, while having volume preservation for each zone (Kim and Yao 2010; Wu et al. 2005). The disadvantage of the pycnophylactic interpolation method is that the population distribution can be distorted due to a lack of assisting information (Kim and Yao 2010). Areal interpolation with ancillary information, known as the dasymetric-mapping method, refers to a population estimation for which census population data are combined with other population-relevant variables, such as buildings, land use, and transportation networks that can be used to suggest the most-likely population distribution (Kim and Yao 2010). With respect to a statistical modelling approach, in contrast to areal interpolation methods, this approach employs statistical methods for estimating the total population size in an area, to study the relationship between population and socio-economic variables, such as urban areas, land use, and dwelling units (Wu et al. 2005). For example, Lo and Wrech (1977) and Lo (2002) used this approach to study the relationship between population and urban areas to estimate the population size of many cities in the world. Lo and Chan (1980) and Lo (1989) estimated the total population in cities by observing dwelling units from aereal photographs. However, all studies using this approach are less accurate than those using areal interpolation techniques, particularly in small-area population estimation (Wu et al. 2005).

As a result, areal interpolation methods have been widely used and have received more attention in population-estimation research. For instance, Tobler (1979), Martin (1996) and Rase (2001) studied and applied spatial interpolation without ancillary information techniques to obtain aggregate counts of population, as well as population densities, for many areas across the world. However, this approach encounters some significant problems, and the worst scenario is that the population estimation for the target zones does not give a valid overall sum of the source units (Wu et al. 2005). For research using the dasymetric-mapping method, Holt et al. (2004), Mennis (2003), Eicher and Brewer (2001) and Yuan et al. (1997) classified land use by transferring census population data in source units for redistributed populations in the target zones based on the types of land use. Due to the support of other data sources for interpolation, Wu et al. (2005) insisted that the methods using ancillary information usually produce more accurate results than those using only census population data. For example, Schmid Neset et al. (2008) used transportation networks as the primary indicators of population, because roads and streets play a vital role in human settlement. More specifically, people actually tend to live and travel around street networks through their environments rather than in random spaces. Reibel and Bufalino (2005) examined the street-weighted interpolation, which is the dasymetric-mapping, and the area-based technique for population estimation in the Los Angeles area. The street-weighted method shows better performance in terms of error reduction and ease of use compared to the area-weighted approach. However, Kim and Yao (2010) argued that the accuracy of population estimation can be improved by integrating the dasymetric-mapping method with area-based interpolation. In this sense, it would be more meaningful to develop a combination of interpolation techniques as suggestive of the estimated population in a specific area.

There are relatively few studies on population estimation with regard to disaster management. GIS and remote sensing technologies have been increasingly used for assessing vulnerability models, as well as estimating the affected numbers for many types of natural hazards. For example, Albert et al. (2012) studied a wide range of spatial techniques for estimating the population affected by disasters and came up with the conclusion that the best choice was the Population Explorer. This is a web-based application that presents the population data for the world with easy-to-use mapping and visualising tools. The Population Explorer has been used as a guide for initiating aid deliveries by using LandScan global population distribution data (Dobson et al. 2000) to estimate the local population of disaster-affected areas for users (Jordan et al. 2012). Although the Population Explorer covers global population data with 1-km resolution, it has some shortcomings. For example, using Population Explorer may be costly, as it is for commercial use only; furthermore, the data are updated annually and thus within-year population changes might not be taken into account. Recently, Stevens et al. (2015) have produced the WorldPop annual population datasets, with better (100-m) resolution, although still with only annual updates.

Balk et al. (2005) studied the estimation of coastal populations exposed to the 2004 tsunami by creating buffers of 1 and 2 km from the coast lines as vulnerability areas and using data from the Global Rural Urban Mapping Project (GRUMP: Tobler et al. 1995). Furthermore, Gorokhovich and Doocy (2012) studied geographic vulnerability models for Typhoon Ketsana by using GIS coupled with pre-disaster population data, to estimate the affected population. Cyclone susceptible areas were visualised by combining data from areas prone to storm surge flooding. The estimation of the affected population was produced by overlaying the maps of flood hazard zones with GRUMP population data. The key variables in their models encompass terrain features, population density, and storm surge extent. However, their models produced estimates of the affected population in administrative boundaries in terms of provincial areas, and therefore, subareas at district or community level were not considered.

In related studies, Ahola et al. (2007) used a spatio-temporal population model to support disaster damage assessment and risk analysis for decision makers. Doocy et al. (2007) assessed tsunami mortality using GIS-based vulnerability models and demographic methods from surveys of tsunami-displaced populations; while Ranjbar et al. (2017) also used a GIS-based approach for estimating earthquake casualties, based on rapid mapping of damaged buildings. Zeng et al. (2012) mapped vulnerability to natural hazards at county-scale using detailed satellite imagery, and Aubrecht et al. (2013) used multi-level geospatial modelling of human exposure patterns and vulnerability indicators.

The European Space Agency has been providing freely-available satellite imagery with 10 m-pixels since 2014, via the Sentinel-2 sensor (visible and infra-red wavelengths), and since 2015 with the Sentinel-1 radar sensor: both datasets are now widely used for mapping and monitoring urban land cover (e.g. Lefebvre et al. 2016; Pesaresi et al. 2016; Koshimura et al. 2017). The Sentinel-2 imagery can also be used to map nearshore bathymetry and coastlines with a tsunami wave run-up hazard (Teeuw & Leidig 2019).

The development of cloud-based computing and search engines, notably Google Earth Engine, which can carry out time series analysis on some 40 years of archived Landsat imagery (30 m pixels), has enabled new approaches to population mapping, based on mapping changes in urban land cover (Patel et al. 2014; Goldblatt et al. 2016; Geiss et al. 2017; Schug et al. 2021). These Google Earth Engine applications with archive satellite imagery have greatly improved the detailed mapping of urban population distribution. However, with repeat visits of Sentinel-1, Sentinel-2 and Landsat taking between 6 and 16 days, there are practical limitations for applications aiming to inform rapid post-disaster humanitarian logistics, where information on the distribution of affected populations is needs within a day or two of the disaster event. Fortunately, since 2018, PlanetScope satellites have been able to provide daily imagery (visible and near infra-red, with 3 m pixels: https://www.planet.com/products/planet-imagery/), which enables rapid post-disaster damage mapping at a district scale of operations, e.g. for detecting landslide-disrupted roads or washed-away bridges.

Another recent development has been the widespread use of GPS-enabled location-aware mobile devices and social media. Many types of geospatial big data have consequently been produced, such as mobile phone use locations and points-of-interest (POI: e.g. street names, social venues and their locations), which contain information that can be used to estimate population distributions (Guha-Sapir & Hoyois 2015; Jiang et al. 2015; Hu et al. 2016; Zhao et al. 2019). However, in the aftermath of a major disaster, with probable severe disruption of mobile phone networks in the impacted areas, POI-based population distribution estimates are unlikely to be available for many days.

In this paper we present a GIS-based methodology to model tsunami flood risk in a test area, using dasymetric mapping for rapid interpolation and estimation of the affected population distribution. The novelty of this paper lies in the development and integration of a GIS and a population estimation algorithm. Furthermore, the presented methodology is able to provide estimates of affected populations over a range of key parameters, including the Tsunami surge height. This leads to estimates of minimum and maximum affected populations. These novel features could provide help in facilitating humanitarian relief logistics by accurately determining the number of disaster-impacted people in a given district.

3 Study area and dataset

The 2004 tsunami was triggered by a major earthquake near the Sumatra coast of Indonesia. The tsunami hit adjacent coastal areas, destroying villages and infrastructure, and resulting in a large number of injuries and deaths. Six coastal provinces in Thailand were severely affected by the tsunami: Phuket, Phang Nga, Krabi, Ranong, Trang and Satun.

Phuket is a relatively small provincial island, but it is the most densely populated of the provinces affected by the tsunami. Phuket is one of the largest city economies in Thailand; it is divided into three districts, which are subdivided into 17 sub-districts and 103 villages. Patong, a sub-district on the west coast of Phuket, was selected as the study area because it is a town in which a majority of commercial buildings and offices, residential units, and other social and service facilities are located near the shoreline. This area has a range of mountains in the east, trending north/south, as shown in Fig. 1. Patong covers an area of about 19 square kilometres, with a population of 20,628 (Thai Department of Provincial Administration, DOPA, updated in 2016). Patong lies in a tsunami-prone setting, facing the Indian Ocean, which was one of the areas most severely impacted by the tsunami. Figure 1 illustrates the Patong sub-district and a sample of the datasets used in this study.

Fig. 1
figure 1

Topography of the Phuket study area, Thailand

The datasets used in this case study include digital elevation and administrative boundaries, street footprints, demographic data and evacuation shelter locational data. The first three datasets are in the GIS file formats. The digital elevation data in this case study can be downloaded at no cost from the USGS Earth Explorer portal (https://earthexplorer.usgs.gov/) in the form of a digital elevation model (DEM), which consists of a matrix of elevation values over the Earth’s land surface. This study used a product of the Shuttle Radar Topography Mission (SRTM; https://yceo.yale.edu/srtm-shuttle-radar-topography-mission): a DEM with 30 m × 30 m pixels.

The administrative boundaries and street footprints used in this study are in the format of vector polygons and a vector lines, respectively. These GIS shapefiles can be directly downloaded for free from the OpenStreetMap website, which provides the detailed country-level shapefiles for GIS (https://thaimap.osm-tools.org/ or https://data.maptiler.com/downloads/asia/thailand/). The accuracy of the vector road map used in this study was visually verified by overlaying the road map on colour aerial photographs of the study area, which indicates acceptable accuracy (Fig. 2).

Fig. 2
figure 2

Overlay of the road map layer on a local aerial photograph

Demographic data coupled with a digital map for small-area analysis, such as a census block, have not been developed in Thailand. Thus, the most detailed population data available for digital maps are limited to the administrative boundaries of sub-districts. The population data for 2016 are taken from the Department of Provincial Administration (DOPA), Ministry of Interior, Thailand.

The data for the evacuation shelter locations are obtained from the Department of Disaster Prevention and Mitigation (DDPM), Ministry of Interior, Thailand. As a result of the Asian tsunami in 2004, Thai authorities have developed tsunami warning systems as well as evacuation plans including evacuation routes and temporary shelters for evacuees. The evacuation shelter locations are prearranged, utilising local government buildings/areas, such as schools, temples and mosques, and other public spaces which are expected to be safe from tsunami flood inundation. To facilitate geospatial analysis, the coordinates of the evacuation shelters have to be converted into the vector point format of ArcGIS.

All the data used in this case study are assessed using ArcGIS 10.3 software. Thus, the ArcGIS platform is also an essential component for the data preparation as well as further spatial analysis and geoprocessing operations. However, for such processes, an implementation of Python scripting in ArcGIS enables users to obtain results via the Graphic User Interface (GUI) without the need for analytical-operation skills in the GIS environment.

4 Methodology

This method involves estimating the population exposed to a tsunami impact, and therefore, the affected people in this context must first be identified. The description of the affected people would be best defined by the EM-DAT database cited in Menoni and Margottini (2011) as the “people requiring immediate assistance during a period of emergency, i.e. requiring basic survival needs such as food, water, sanitation and medical assistance, including those displaced or evacuated”. This paper thus uses this term for the affected people in all sections. However, the term here is divided into two categories: directly and indirectly affected. The directly affected population includes the people who live within the path of the flooding and receive the direct impact of the tsunami, such as injuries and the destruction of their properties and facilities, and who have essential needs. While indirectly affected people include those living outside the flooding path, the disaster may impact their medical and social services, or any necessary resource for living. Therefore, in this paper, an estimation of the population affected in each area is given as a range of the minimum and maximum affected population. Direct flooding-affected refers to the minimum number, whereas the maximum is the sum of the directly and indirectly affected population.

Figure 3 shows the overall process of estimating the affected population which consists of two main parts; data preparation and the tsunami hazard assessment and estimates of the affected population. This section is twofold. First, the areal interpolation techniques used for population estimation are described and illustrated by a given example. The second part explains the analytical operations for estimating the population affected by tsunami flood inundation in the GIS environment.

Fig. 3
figure 3

The overall processes of estimating the affected population

4.1 Areal interpolation techniques

The simple area-based method was developed in the early age of interpolation studies. This approach is based only on the overlap of the vector polygons of the source and target zones. The method has the advantage that it can be simply implemented in ArcGIS, and no assisting data are required in its process. The function was formulated by Fisher and Langford (1995) and is described as follows:

$$ P_{{\text{t}}} = \mathop \sum \limits_{s = 1}^{S} \frac{{A_{{{\text{ts}}}} P_{{\text{s}}} }}{{A_{{\text{s}}} }} = \mathop \sum \limits_{s = 1}^{S} A_{{{\text{ts}}}} d_{{\text{s}}} $$
(1)

where \(P_{{\text{t}}}\) is the estimated population of target zone t; \(A_{{{\text{ts}}}}\) is the area of overlap between target zone t and source zone s; \(P_{{\text{s}}}\) is the population of source zone s; \(A_{{\text{s}}}\) is the area of source zone s; and \(d_{{\text{s}}}\) is the average population density of source zone s. This simple areal weighting, however, does not imply the actual population spatially distributed in a geographic area. Thus, the use of a rigid concept of the area-based technique may not be sufficient for obtaining good estimations. Renovating and combining its model to other interpolation methods would improve the estimation accuracy, particularly for a small area.

We therefore propose a multi-stage areal interpolation for population estimation using GIS. As the area-based method is criticised in the literature as an oversimplified assumption, an area-based method is applied from a different view in the initial stage. The dasymetric-mapping method using street footprint GIS data is therefore used in the next stages to estimate the disaster-affected population for a smaller area. By doing this, each interpolation stage requires digital maps in the GIS environment. Within ArcGIS, every dataset has a coordinate system and thus all vector GIS data must be in the same coordinate system and map projection, in order to perform integrated analytical operations, such as overlaying vector layers.

The first stage of interpolation requires three GIS geospatial data layers: source-zone boundaries, target-zone boundaries, and satellite Earth Observation (EO) imagery. The purpose of this stage is to estimate the population by subdividing the source zone into smaller spatial units, which is a target zone that has better consistency with the population distribution. In this regard, the population data in the source zone are combined with its remote sensing data to obtain a feasible area that infers the most-likely population distribution, which is the target zone. In other words, the population data at the source zone level are spatially reallocated to predict the population of the target zone based on its geographic characteristics. Thus, the weight in this step is not the proportion of the target and source zone areas, as shown in Eq. (1). Instead, the ratio of the population figures in the target zone and the source zone acts as a weighting factor. This population-weighted ratio is estimated based on the remote sensing and population data held by decision makers. Also, the decision makers’ subjective view could be combined with those datasets to estimate the ratio. As a result, a new equation of areal interpolation in the initial stage can be formulaically described as:

$$ P_{{\text{t}}} = \mathop \sum \limits_{s = 1}^{S} P_{{\text{s}}} r_{{{\text{ts}}}} $$
(2)

where \(P_{{\text{t}}}\) is the estimated population of target zone t; \(P_{{\text{s}}}\) is the population of source zone s; and \(r_{{{\text{ts}}}}\) is the ratio of population between target zone t and source zone s.

Figure 4 shows the study area as a source zone that covers an area of 19.18 km2 with a total population of 20,628. The target zone occupies an area of 7.34 km2. An example in Fig. 4a is the traditional area-based interpolation, which aggregates population data from the source zone to the target zone by computing a weight for the overlap zone between the source and target zones. The intersection zone occupies 38% of the source zone. Thus, the estimated population in the target zone can be calculated by substituting \(A_{{{\text{ts}}}} = 7.34 {\text{km}}^{2} ,\;A_{{\text{s}}} = 19.18 {\text{km}}^{2} , \;{\text{and}}\;P_{{\text{s}}} = 20,628\) in Eq. (1):

$$ P_{{\text{t}}} = \mathop \sum \limits_{s = 1}^{1} \frac{{A_{{{\text{ts}}}} P_{{\text{s}}} }}{{A_{{\text{s}}} }} = \left( {\frac{7.34}{{19.18}} \times 20,628} \right) = 0.38 \times 20,628 = 7,894 $$
(3)
Fig. 4
figure 4

A comparison of the new perspective and traditional area-based interpolation

In contrast, Fig. 4b gives a different perspective of cartographic areal interpolation using the population-weighted technique coupled with remote sensing images to draw the area in which the dwelling units are mostly distributed. According to the remote sensing and population data, the ratio is suggested by a decision maker as 90% \(\left( {r_{{{\text{ts}}}} = 0.9} \right)\), for example. This means that another 10% of the population is scattered in an area outside the target zone, which perhaps is a non-residential area. Therefore, the population in the target zone can be estimated by substituting \(r_{{{\text{ts}}}} = 0.9, \;{\text{and}}\;P_{{\text{s}}} = 20,628 \) in Eq. (2):

$$ P_{{\text{t}}} = \mathop \sum \limits_{s = 1}^{1} P_{{\text{s}}} r_{{{\text{ts}}}} = \left( {20,628 \times 0.9} \right) = 18,565 $$
(4)

The estimation solutions provided by the two methods are significantly different. The estimation in the target area shows a population of 7,894 using the traditional area-weighted technique. On the other hand, the population-weighted method with the assisting data provides an increased population estimate of 18,565 in the target zone. Thus, it is essential to evaluate these results in terms of accuracy, as implementing accurate population estimates in disaster relief can contribute to efficient and effective humanitarian logistics.

In order to evaluate the population estimation provided by the above methods, collecting population counts in the area may consume too much time and cost. Thus, aerial photographs can be used to superimpose the study area for a visual presentation in order to assess the estimate results without any cost. Figure 5 illustrates the overlaying of the study area and its remote sensing image. As can be seen in the aerial photograph, most residential units are located in the target area, whereas the outer area occupies a range of mountains with very few dwellings. This can imply that taking the geographic characteristics of an area into area-based interpolation to identify population weighting can provide a more realistic solution for population distribution.

Fig. 5
figure 5

Population-estimation validation using remote sensing imagery

The second stage of areal interpolation involves using the street-weighted technique. In this stage, three GIS vectors are needed for analytical operations: the source-zone boundaries, the target-zone boundaries, and the street vectors. When overlaying, the street weight for each intersection area is calculated as the ratio of the street length in the overlap zone to the street length in the source zone. The proportion estimate for a given target zone can be calculated according to the following function (Reibel and Bufalino 2005):

$$ W_{{{\text{st}}}} = \frac{{\sum L_{{{\text{st}}}} }}{{\sum L_{{\text{s}}} }} $$
(5)

where \(W_{{{\text{st}}}}\) is the weight for a given intersectional-area fraction defined by its unique pair of source and target zones; \(L_{{{\text{st}}}}\) is the length of the street vector in that intersection zone; and \(L_{{\text{s}}}\) is the length of the street vector in the source zone.

In order to obtain the population estimation in the target zone, the weight for the intersection zone is multiplied by the population counts in the source zone. This results in an estimate of the target-zone population aggregated from the source zone based on street weighting. The concept of street-weighted interpolation is also applied in the next steps to improve the estimate by subdividing the zone into a smaller area.

In this case study, the target zone defined in the first stage will become the source zone when moving into the second stage. The new source zone is then divided into a number of smaller areas depending on a number of evacuation nodes. The neighbourhood of each node, in turn, transforms into a target zone. Figure 6 shows the GIS vector data including the source zone and its subdivided target zones in accordance with the evacuation locations.

Fig. 6
figure 6

The GIS vector data for the street-weighted approach

To illustrate the street-weighted interpolation technique in the second stage, the length of each street vector in the source zone and in each target zone first has to be summed. The summation of the street length in each zone is simply obtained from the attribute data table of the street vector in ArcGIS. Subsequently, each intersection-zone fragment is defined by its source zone and identical target zone pair. For example (see Fig. 6), the intersectional-zone fraction of the length of the street vector in the node 3 neighbourhood and the length of the street vector in the source zone can be calculated by substituting \(L_{{\text{s}}} = 75,982,\; {\text{and}}\;L_{{{\text{st}}}} = 20,135\) in Eq. (5):

$$ W_{{{\text{st}}}} = \frac{20,135}{{75,982}} = 0.265 $$
(6)

Thus, the estimate of the population in this node neighbourhood is the outcome of multiplying \(W_{{{\text{st}}}}\) in Eq. (6) by \(P_{{\text{t}}}\) in Eq. (4):

$$ P_{{\text{t}}} = 0.265 \times 18,565 = 4,919 $$
(7)

The population in an evacuation-node neighbourhood has been estimated using the street-weighted interpolation technique in the second stage. In the next stage, the flooded areas that are generated from the tsunami hazard model with a parameter of a 10-m tsunami surge are used for superimposition over each evacuation-node neighbourhood to find the intersectional-zone between the flooded area and its evacuation neighbourhood. As a result, an area flooded by a tsunami surge in each evacuation zone is drawn within ArcGIS. The flooded area becomes a target zone for its identical evacuation area which has inversely changed to the source zone in the next stage of the street-weighted interpolation.

The last stage is utilised to estimate the population affected by the tsunami flood. This final step also uses the street-weighted interpolation method for the population estimation. The processes of interpolating are again operated as in the second stage using Eq. (5). The ratio of the total length of the street vector in the intersection zone to the total length of the street vector in the source zone is also calculated as a weight factor. The different context is that the source zone is an evacuation-node neighbourhood, while the target zone is an inundated area. Figure 7 shows the overall view of the tsunami flood in the study area and demonstrates the node 3 neighbourhood for guidance in estimating the population that lives in the flooded area. By substituting \(L_{{\text{s}}} = 20,135 \;{\text{and}}\;L_{{{\text{st}}}} = 13,661\) in Eq. (5), the number of people directly affected by the tsunami flooding in the area of evacuation node 3 is estimated in the following steps:

$$ {\text{W}}_{{{\text{st}}}} { = }\frac{{13,661}}{{20,135}}{ = 0}{\text{.68}} $$
(8)
Fig. 7
figure 7

An example of estimating the population affected by tsunami flooding in an evacuation-node neighbourhood using the street-weighted interpolation technique

Hence, the estimate of the population living in the flooded area of location 3 is the result of multiplying \(W_{{{\text{st}}}}\) in Eq. (8) by \(P_{{\text{t}}}\) in Eq. (7):

$$ P_{{\text{t}}} = 0.68 \times 4,919 = 3,344 $$
(9)

The estimate of the population in the target area, \(P_{{\text{t}}}\) in Eq. (9) infers the number of people who are dwelling in a node 3 neighbourhood and directly affected by the 10-m tsunami wave for this example. In this paper, \(P_{{\text{t}}}\) in Eq. (9) is defined as the minimum population affected as these numbers receive the direct impact from the flood. \(P_{{\text{t}}}\) in Eq. (7) refers to the maximum number exposed to the tsunami. In this regard, it is an assumption that people living outside the path of a disaster may experience secondary effects from the incident. The maximum affected, therefore, are the total number of people who live in an evacuation-node neighbourhood, including those directly and indirectly affected.

The final solution for the node 3 example then can be concluded, that is, the estimate of the population affected by the tsunami flood falls into the range of a minimum of 3,344 and a maximum of 4,919 people. This policy therefore may assist a decision maker to avoid either overestimating or underestimating the affected population. This solution gives merely a preliminary estimation of the affected population in the disaster area. Identifying the exact number in the range is nearly impossible in a real situation. However, the decision maker may utilise simulation or mathematical techniques to predict the most likely possible outcome in the range. This depends on a number of decision-making factors, such as the unique characteristics of the area and the historical data of the disaster, such as the location and numbers affected.

4.2 GIS analytical operations

This section describes the integration of a methodology that estimates the population exposed to a tsunami inundation within the GIS environment. ArcGIS was used to build a tsunami risk model of the study area, combining geospatial data on the tsunami hazard, the vulnerable features (in this case, population) and the exposure to the hazard. The model can estimate the population affected by various scenarios of tsunami flooding, with the GIS analytical operations divided into five phases, as shown in Fig. 8.

Fig. 8
figure 8

Phases of the GIS analytical operations

In the first phase of the operations, the spatial analysis of the study area requires population data, digital elevation terrain models, feature data of the administrative boundaries, street footprints, and evacuation location data. All the datasets are acquired from the different sources described in Sect. 2. In the second phase, a set of parameters is required from the users to model a flooding vulnerability area. Moreover, the level of uncertainty in estimating the population affected in the vulnerability risk model depends on decision-making variables, which include (1) the tsunami surge height (hazard), (2) the buffer distance from the flooding path (potential exposure to hazard), (3) the urban boundary where people are grouped densely, (4) the population ratio of (3), and the total population of the study area (vulnerable feature where all coincide). Therefore, the proposed method involves the characteristics of the study area and its modelling of the flood vulnerability region during the preliminary phases of the operations.

In phase 3, the aim is to model the coastal tsunami flooding using a raster elevation dataset with the polygon boundaries of the study area. The geoprocessing tools in ArcGIS 10.3 can identify what parts of the raster or the extent to which the study area would be flooded under a variety of scenarios depending on the input parameters. A tsunami hazard model of the study area is generated as a result of GIS processing in this phase. The characteristics of the model and its feature datasets are repeatedly used in the rest of the operations.

Once the flood inundation model has been built, another two input parameters, buffer distance and urban elevation (showing the amount of exposure to the hazard), are used for the spatial analysis in phase 4. The urban elevation parameter here refers to a specific value of terrain elevation where people are densely distributed and most likely to be affected. The most detailed population data available are limited to the geographic boundary of the study area. This is too coarse to use for the whole area of the town to estimate the affected population. Thus, it would be meaningful to downscale the study area to predict for smaller areas where real communities are mostly settled. This may result in a more detailed view of the distribution of the population within the town. With regard to the buffer distance parameter, a distance buffer from the flood line may give the decision maker the buffer area for predicting the indirectly-affected population. As mentioned above, people living outside the primary affected area may also experience some negative impacts as a result of the tsunami inundation. Therefore, defining an area as being indirectly affected can result in an estimate of the secondary affected population. This is true because when a tsunami occurs, the total number affected usually includes not only those who are in the path of the disaster, but also a number of people living outside the devastated area.

After an additional two polygons are generated in the study area by the given parameters, the two vector polygons are then computed to find the intersection zone. This overlapped area is then used as the source zone for initiating the population estimation in the final phase, in which areal interpolation techniques are employed. This final phase requires the population ratio parameter, which is the population in the source zone that is used as an initial step of the multi-stage interpolation techniques described above. Finally, the last operation produces the numerical results of the affected population. To test the approach, the GIS-based application for the affected population estimation is demonstrated in the next section with a number of tsunami flood scenarios.

5 Experimental results of the affected population estimation: Phuket case studies

This proposed approach offers various advantages by using areal interpolation techniques to estimate the affected population by using GIS. Not only are numerical terms produced as a result of executing the application, but also graphical solutions are provided as an alternative media in the GIS environment. Decision makers are able to use both types of solutions simultaneously in order to increase the efficiency of delivering aid to evacuation locations. This section illustrates the proposed GIS-based application and tests the solutions with different flood scenarios using an Inter ® Core i5-3360 M CPU computer with a memory of 6.00 Gb on a Windows 7 operating system.

5.1 A GIS-based application for estimating the affected population

The application starts with the extraction of the GIS data from the geographic database (GDB) to construct a tsunami risk model. Figure 9 shows the GUI of the application in which a digital elevation model of the study area and a set of parameters are entered.

Fig. 9
figure 9

GUI and data and parameters input of the application

A flooding scenario is generated in the form of a map with its attribute data in the ArcGIS system, depending on the input parameters. The attribute data table contains all the geospatial data in the flood zone map. A set of data in the table is used for querying, analysing, and integrating with other parameters to estimate the population in the affected area. A Python scripting language is used to execute geo-processing tools to perform spatial analysis and data management in the ArcGIS platform. As a result, the application directly yields the numerical solution for the decision maker on the screen. The solution is also connected to ArcGIS, enabling various visualisations of the extent of the tsunami inundation.

5.2 Experimental results

To examine the application solutions, several flood scenarios using different sets of input parameters are conducted so that the solutions can be compared and then analysed. Prior to comparing the results, the worst-case scenario, with the 10-m surge reported in 2004, is used as the threshold problem and is solved first with the following input parameters: tsunami surge height (s) = 10, buffer distance (b) = 0.5, urban elevation (u) = 50, and population ratio (p) = 95%. The numerical solutions of the threshold scenario are shown in Table 1.

Table 1 Numerical results for the threshold scenario

As the solutions are linked to ArcGIS, the identical results in terms of visualisation are provided with their attribute data so that they can be analysed further by the decision maker.

The following scenarios are then proposed by merely altering one parameter, while the rest of the parameters continue to have the same values. This experiment is divided into three groups of scenarios, each of which trials three different values for the identical parameter: (1) surge height, (2) buffer distance, and (3) urban elevation and the population ratio. It is a fact that urban elevation is critically related to the population ratio, as the level of urban elevation always affects a fraction of the population, and therefore, both parameters are characterised in the same group. Table 2 shows each scenario group that corresponds to its parameter test, while the solutions for each scenario group are illustrated in Tables 3, 4 and 5.

Table 2 Parameter test in each flooding-scenario group
Table 3 Numerical results of scenarios 1–3
Table 4 Numerical results for scenarios 4–6
Table 5 Numerical results of scenarios 7–9

The first scenario group outputs the estimates of the population affected by a set of different surge-height parameters. Scenarios 1, 2, and 3 examine the results when using the parameters of s = 9, s = 8, and s = 7, respectively, while other parameter values remain the same as specified in the threshold scenario. A range of minimum and maximum populations is produced as the estimation for each evacuation location in the study area. The notable observation when changing the surge-height value is that the minimum number affected is directly proportional to the tsunami surge height. Specifically, when the surge height is reduced to 9, 8, and 7 m, the number of people directly affected by the surge in evacuation node 3, for example, decreases to 2,854, 1,501, and 864 people, respectively (see Table 3).

With respect to the maximum affected, the numbers very slightly fluctuate in most evacuation locations compared to the threshold scenario. However, the maximum numbers in some areas could be significantly decreased, especially when the tsunami surge is lower. This depends on the characteristics of an area. For instance, node 2 in scenario 1 (s = 9) shows a minor decrease in the maximum affected population of 4511 people compared to the threshold (s = 10) of 4705 people. Nevertheless, there is a significant reduction in the maximum numbers for node 2 when the surge height drops to 8 or 7 m; the numbers are 2044 and 1680 people, respectively.

Scenarios 4, 5, and 6 assess the solutions with respect to buffer distance. This parameter of distance away from the flooding line is assessed using the values of b = 0.3, b = 0.2, and b = 0.1 correspondingly. The other parameters are identical as in the threshold problem. As a result, when comparing solutions, a decreased buffer distance leads to the reduction in the maximum affected number in every evacuation node. In contrast, the minimum affected remains the same number between two matching nodes. Thus, altering the buffer distance amount will affect the maximum numbers only (see Table 4). This implies that the value of the distance away from the flood line is directly proportional to the total estimate of the affected population. Figures 10 and 11 show the minimum and maximum height affected populations for different surge heights for nodes 1 to 6.

Fig. 10
figure 10

Minimum and maximum height affected populations for different surge heights for nodes 1, 2 and 3

Fig. 11
figure 11

Minimum and maximum height affected populations for different surge heights for nodes 4, 5 and 6

With regard to the urban elevation and the population ratio, these two factors are the area in which people are densely distributed and most likely to be affected by tsunami inundation. Thus, when the amount of urban elevation is adjusted, the population figures have to be changed consistently with the amended area. For example, an urban elevation of 50 m above sea level is considered to be an area where 95 per cent of the total population is settled, and this populated region is vulnerable to tsunami flooding. Thus, the population ratio is interpreted as 95 per cent. In this experiment, flood scenarios 7, 8, and 9 then consider urban elevation followed by its population fragment to examine the affected population estimation, giving parameters of u = 40 with p = 80, u = 30 with p = 70, and u = 20 with p = 60, respectively. There is no change in the other parameters in order to properly control the inspection. The results show that these two parameters affect both the minimum and maximum numbers when compared to the threshold scenario (see Table 5). The reduction in the urban elevation value will downscale the study area resulting in a decreased population. By doing this, each neighbourhood in the evacuation node may cover a smaller area with a smaller number of residents. As a result, there is a reduction in the estimation of both the minimum and maximum affected.

To examine the extent to which parameters affect the solution, there is a control of the variables in each flood scenario. Accordingly, more scenarios can be conducted when each input variable is adjusted. In a real situation, most parameters can be periodically obtained in the disaster preparedness phase. However, the tsunami surge height is revealed by the Deep Ocean Assessment and Reporting of Tsunami System (DART) only once an earthquake stimulates a series of tsunami waves. This allows the emergency management centre to estimate the affected population using this GIS-based application. The estimates can be used as a vital decision-making factor in planning and managing relief deliveries for the affected population in a timely manner.

6 Discussion

An inefficient logistical response to the 2004 India Ocean tsunami in Thailand was partly the result of a failure to estimate the spatially-distributed population affected by the tsunami inundation. This study therefore developed an innovative tool for emergency management that integrates techniques for estimating the affected population and utilises the capabilities of GIS for geospatial analysis, mapping and visualisation.

Early approaches to learning from the 2004 Indian Ocean tsunami focused on modelling the dynamics of the tsunami (e.g. Takashima et al. 2005). More recent tsunami impact modelling has also considered the distribution of populations exposed to a given tsunami (e.g. Koshimura et al. 2017). The innovative aspect of the methodology presented here is the estimation of a range of the minimum and maximum population affected by tsunami inundation for an urban test area. The GIS-based application can be used in the disaster preparedness phase to examine tsunami flood scenarios. The tool can also be used in the response phase, guiding delivery routing for humanitarian relief, which critically depends on the number of affected people in each administrative area.

The use of the application begins by entering the data and decision-making parameters. A tsunami risk model is then generated in the GIS environment. This model provides a geographical format with its attribute data in ArcGIS, which is used as the primary information for the multi-stage interpolation processes. Regarding the population estimation technique, the first stage is the population-weighted interpolation with assisting information from satellite remote sensing. The weighted-street methods are used in the subsequent stages for the estimation and analysis of small areas. The latter could be enormously advantageous in cases when the availability of population data is limited.

The methodology was piloted in an urban area of Phuket to demonstrate its operability for assessing affected areas when a tsunami occurs. The integration of GIS and interpolation techniques rapidly produced estimates of the affected population in each evacuation shelter location. In the experiments, the average of the CPU time is approximately 257 s to obtain the solutions for a tsunami flood scenario.

The population estimates provide useful information, not only in the form of direct numerical results, but also as GIS-generated maps that can assist decision makers in their visualisation of locations with the greatest need for assistance. The application is expected to be used as a rapid source of information, providing emergency managers with the ability to plan and manage vehicle routing for humanitarian relief in the aftermath of a tsunami.

It is recommended that mathematical methodologies are used to obtain specific estimation with a given probability, instead of a range of estimation. The proposed approach currently does not take into account non-resident populations, such as migrant workers and tourists. By including such dynamic populations in the estimation model, the accuracy of the estimation solutions would be significantly improved. Also, any vehicle routing algorithms used in conjunction with this application need to be able to incorporate up-to-date information on any infrastructure damage to ensure no suggested routes are inaccessible as a result of the event.

With regard to limitations of this research, a major shortcoming is that this methodology has been developed using relatively expensive proprietary geospatial software: ArcGIS. That creates a cost barrier which might limit the uptake of this methodology in low-income countries and other locations where funding is limited, such as in many government agencies or NGOs (Teeuw et al. 2012; Leidig et al. 2016). The main Free Open Source Software (FOSS) for geospatial applications, QGIS, can process the ArcGIS shapefiles around which this methodology is built. Therefore it is recommended that further research be carried out into adapting this methodology for use by QGIS. Future development of this application could also benefit from the sets of more detailed global geospatial data that are now freely available for GIS processing, notably the 12.5 m-pixel PALSAR DEM (https://asf.alaska.edu/data-sets/sar-data-sets/alos-palsar/) for mapping low-elevation coastal zones susceptible to tsunami inundation (Teeuw & Leidig 2019).

7 Conclusion

This study has developed a methodology to determine the number of disaster-impacted people in a given district, by integrating GIS and population estimation algorithms. A multi-stage spatial interpolation was used for estimating the affected populations using ArcGIS software.

A dasymetric mapping methodology has been developed, using a population-weighted technique coupled with GIS and remote sensing data; street-weighted methods were also considered. The results in each target area show the coordinates of each shelter location for evacuees, with the minimum and maximum numbers of people affected by the tsunami inundation. This innovative tool produces numerical solutions for decision makers, as well as a variety of maps that improve visualisation of disaster severity across neighbourhoods. A case study in Patong, a town in Phuket district, illustrates the application of this GIS-based approach. This methodology provides an enhancement on current planning processes as it provides estimates of affected populations against key factors, such as the Tsunami surge height. Further validations and comparisons against future alternative methodologies across a range of geographical case studies arising from different Tsunami affected countries are welcomed as they become available.

This methodology can assist decision makers in planning and managing humanitarian relief logistics during the disaster preparedness and response phases, potentially improving the effectiveness of disaster management. The application could assist local authorities with managing diverse flood scenarios, i.e. not just tsunamis, but also storm surges or major river floods. With global heating and the associated general increase in the frequency and severity of flood disasters globally, the tool presented here for estimating flood-impacted populations could increase the efficiency and effectiveness of humanitarian relief logistics.