1 Introduction

Between 1998 and 2017 more than 4 million people globally were affected by landslides and more than 18,000 people were killed (CRED and UNDRR 2020). However, these impacts are not equally distributed across countries. Low-income countries continue to experience disproportionate impacts from landslides, with fatalities almost four times higher compared to higher income nations (Dowling and Santi 2014). While the cause for this discrepancy is no doubt a complex web of social and economic determinants, limited tools to understand landslide risk in data scarce contexts undermine efforts to reduce these losses.

There has been increasing emphasis on identifying populations at risk to reduce landslide disasters. In recent years, global frameworks, including the Sendai Framework for Disaster Risk Reduction and the United Nations Sustainable Development Goals, have accelerated disaster risk reduction at national levels; however, there remain challenges in implementing disaster risk reduction programmes at a local level (UNDRR 2019). There is a need to develop and apply new geospatial methods that can predict exposure and are accessible to local governments.

Landslide risk modelling can be challenging as it relies on detailed spatial, demographic, and geological data for effective assessments. In low-income countries that have limited access to spatial and demographic resources, quantifying landslide risk at a local scale remains a significant barrier to disaster risk reduction efforts. Gridded population count datasets are increasingly available—these datasets disaggregate national censuses to a spatial unit. Datasets with different modelling techniques and resolutions have resulted in a growing number of approaches (Sapena et al. 2022). However, these datasets continue to be an underutilised tool for risk assessments. There is a need to understand the suitability of these datasets in predicting risk in a locality.

Previous landslide risk modelling has used a wide range of population estimates; however, the suitability of common global datasets for local planning purposes has yet to be fully compared. In this study, we focused on assessing the usability of two common population datasets: WorldPop and the High-Resolution Settlement Layer (HRSL). These datasets were selected based on their accessibility for low- and middle-income countries and their population disaggregation techniques. We conducted a landslide risk assessment in the Municipality of Carigara, located in the central Philippines, to compare risk estimates from these datasets against a comprehensive local survey conducted in the municipality. Under these objectives, we sought to answer the following research question: How suitable are current gridded population count datasets for local landslide risk assessments? Our aim was thus to understand the extent to which gridded population datasets can be used for assessing exposure in local landslide risk studies.

We first review how landslide modelling is conducted in the literature and the need for population datasets. We then describe our methodology, which compared three landslide risk models using WorldPop, HRSL, and local datasets. We then discuss the implications of comparative risk, the suitability of gridded datasets, the usability of the proposed risk methodology, and the limitations of our study. We conclude with the contributions of this research to theory and practice. Our study provides a novel empirical comparison of two of the most widely used gridded population datasets, offering important validation of their use and uncertainty in landslide risk studies. Understanding the extent to which such exposure data accurately represent populations is vital to ensure that appropriate landslide risk reduction measures can be planned and implemented.

2 Background

Disaster risk can be conceptualised as three elements: hazard, exposure, and vulnerability (Dai et al. 2002). The hazard is the function of a naturally occurring process, such as a landslide. Exposure is the element in the path of the hazard, such as populations, and vulnerability is susceptibility to physical, social, economic and political loss (Cardona et al. 2012). Wisner et al. (2012) define vulnerability as the degree of loss or impact of one's social and physical status from dynamic pressures and hazards where the dynamic pressures include deficiencies in social, economic and political structures. Vulnerability is a multidimensional concept, and is dependent on the hazard, location, time, and scale (Murillo-García et al. 2017).

Risk assessments identify the potential consequence of hazardous events (Papathoma-Köhle et al. 2007; Pereira et al. 2017; Azzimonti et al. 2020). There have been a limited number of studies that examine the effects of resolution and sources of data in landslide risk assessments. Landslide susceptibility mapping (LSM) identifies regions most prone to landslides by investigating topographic details such as slope, curvature, and elevation derived from a digital elevation model (DEM). Reichenbach et al.’s (2018) review of landslide susceptibility and zonation literature points to continued gaps stemming from inconsistent modelling methods, scale, and inputs used to determine areas of concern across different geographical regions. Of the nine recommendations, the need for selecting the right mapping unit, the accuracy of geo-environmental information, and the use of an appropriate modelling method were of chief concern. Discrepancy among scholars shows there is continued disagreement on the level of resolution needed for landslide mapping. While there has been expanding coverage and comparison of national landslide databases (Modugno et al. 2022; Gómez et al. 2023), these comparisons are largely limited to susceptibility mapping, and to date there has been little comparative study of the impact of population resolution in landslide risk assessments.

Population data is important for government, planning, and risk prevention purposes. While census data conforms to high quality standards, this information by nature does not include more granular spatial information below the lowest administrative boundaries. Therefore, census data often needs to be redistributed spatially to regions through interpolation. A common redistribution method is dasymetric processes, where the basic census unit (BSU) is combined with targeted regions (buildings) and population per pixel to determine the number of people per building area. The BSU is population aggregated per unit area on a spatial map. Population dasymetric distributions allocate population per building area instead of a generic unit area on a map (Garcia et al. 2016). This helps to reach higher precision of population estimation as people are clustered around known settlements. The redistributed population can be shown as population density or number of people per pixel (Corominas et al. 2014). Common alternatives include the use of statistical methods which construct models that weight the distribution of population data into areas or zones as well as hybrid combinations of both above methods.
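The dasymetric idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline of any particular dataset: the helper name `dasymetric_allocate`, the even-spread fallback, and the toy numbers are all hypothetical.

```python
def dasymetric_allocate(unit_population, building_area_per_cell):
    """Spread one census unit's population over its grid cells in
    proportion to the building area found in each cell (hypothetical helper)."""
    if not building_area_per_cell:
        return []
    total_area = sum(building_area_per_cell)
    if total_area == 0:
        # Assumption: with no mapped buildings, fall back to an even spread.
        n = len(building_area_per_cell)
        return [unit_population / n] * n
    return [unit_population * area / total_area for area in building_area_per_cell]


# Toy census unit of 1,000 people over four cells; two cells hold no buildings.
cells = dasymetric_allocate(1000, [0.0, 300.0, 100.0, 0.0])
print(cells)  # [0.0, 750.0, 250.0, 0.0]
```

Cells without buildings receive no population, which is what concentrates the estimate around known settlements.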

Ma et al. (2021) compared LandScan and WorldPop, both common gridded population datasets, to understand accuracies, finding that WorldPop better predicted population densities at the province, city and county scale in China. Bai et al. (2018) also found WorldPop had the highest accuracy in population estimation using statistical analysis in comparison to CnPop (5 km), GPWv3 (5 km), and GRUMPv1 (1 km), which are other common population datasets. However, recent research by Fries et al. (2021) suggested that the HRSL density gave more accurate urban and rural prediction in comparison to LandScan, WorldPop, and GPW. In terms of population accuracy, WorldPop and HRSL continue to emerge as two of the most compelling for use.

Application and comparison of these population datasets for risk assessments is still in a state of relative infancy. Smith et al. (2019) and Bernhofen et al. (2021) compare differences in population exposure from datasets for flood hazards, for which there has been greater attention on the importance of population data sources, but there has been less attention to the impact of population datasets for landslide studies. Lin et al.’s (2023) study of future landslide risk under climate change in China draws attention to the often widely varying estimates of landslide risk derived from different population datasets, calling for further study of the quality and assumptions of population data being used for landslide risk models. Further comparison between distributed population datasets will help to understand how resolution and population distribution methods influence landslide risk assessments. This will aid in identifying whether finer resolution distributed population datasets are necessary for accurate risk assessments for communities, with the potential to guide continued dataset development. Population datasets capable of determining landslide risk accurately have the potential to improve landslide risk mitigation measures and planning.

3 Methods

In this section, we outline our data collection, which included obtaining the two secondary spatial population datasets of interest, collection of household surveys in the studied municipality, and capture of relevant landslide susceptibility maps. Using InaSAFE, a GIS impact analysis tool, we determined population exposure for each of the three population data sources which were then integrated with a household and community vulnerability assessment to determine community landslide risk. Using t tests, we then compared differences in community landslide risk to understand the suitability of the considered population datasets. A summary of our methods is shown in Fig. 1.

Fig. 1 Overview of methodology

Landslides are a significant issue in the Philippines; however, despite significant work to map landslides at a national level, risk assessments at a local level remain sparse. To date, extensive susceptibility mapping has been completed at a national scale. However, these maps are only a first step as they do not consider the vulnerability of communities, nor their capabilities to cope with landslides (Arrogante-Funes et al. 2021). Therefore, to bridge this gap we set out to examine the social vulnerability of these communities and available population datasets that can be used to inform risk assessments. This research focuses on the Municipality of Carigara, located in the province of Leyte in the central Philippines. The province has some of the highest susceptibility to landslides. As an example, in 2006, a landslide on the island in the neighbouring province of Southern Leyte killed 1221 people (Hadmoko and Mauro 2012).

3.1 Data collection

Data used for this study drew from previously established population datasets, a local survey, and landslide susceptibility maps. We selected two secondary spatial population datasets for this study—WorldPop UN Adjusted (WorldPop 2018) and the High-Resolution Settlement Layer (Facebook Connectivity Lab and Center for International Earth Science Information Network 2016). The population dataset represents the population densities as people per square kilometre. The UN adjusted estimates were selected for 2015, the same year that forms the basis for the HRSL. We selected these datasets as they are open access, making them widely available. Furthermore, they have some of the highest coverage of publicly available population datasets. WorldPop is available in all 195 countries globally, while the HRSL currently covers 140 countries.

The selected population datasets use dasymetric mapping techniques to predict the number of people per unit area (grid) by redistributing population using topography, land cover, night-time lights, and other geospatial covariate information (Gaughan et al. 2013). The WorldPop UN Adjusted dataset redistributes national census data that has been adjusted for United Nations Population Division estimates into administrative units and calculates population density per pixel for each unit by using random forest algorithms (Gaughan et al. 2013). The population density per pixel is then disaggregated to 100m spatial grids for the entire nation (Sorichetta et al. 2015). In contrast, the HRSL dataset uses binary-dasymetric mapping techniques that disaggregate national census data to units with settlement (buildings) identified using semi-supervised machine learning of Digital Globe imagery at 30m spatial grids (Palacios-Lopez et al. 2019). Both datasets represent estimations of population distribution over a spatial grid.

To compare these estimations to actual population distributions at a municipal scale, we obtained population data collected through an extensive household survey in the Municipality of Carigara. The local survey was conducted in 2017 and led by the Municipal Disaster Risk Reduction and Management Office (MDRRMO). In collaboration with Barangay Disaster Risk Reduction and Management Committees (BDRRMCs), age, gender, disability status, chronic illnesses, and occupation were compiled into a household database. Using OpenStreetMap and participative community mapping, the MDRRMO also digitised and linked this data to building footprints. Standardised reporting templates were used, and the location and household details were validated through multiple rounds of community consultation. This data represents the actual spatial distribution of population by building, as compared to the estimated datasets that rely on redistribution methods. Examples of the resolution of each of these population datasets are shown as densities in Fig. 2. WorldPop data is shown at 100m resolution, HRSL at 30m resolution, and the local survey data is at the building scale. These were resampled for the final analysis for consistency.

Fig. 2 Comparison of population density datasets

We used pre-existing landslide susceptibility maps developed by Eco et al. (2015), Alejandrino et al. (2016), Rabonza et al. (2016), and Luzon et al. (2016), which utilise a 5-m IFSAR-derived digital terrain model to determine landslide potential in the case location. These are currently the most detailed susceptibility maps available for the region of study. The susceptibility maps include conefall propagation areas. Susceptibility and elevation maps for the province of Leyte, where our case site was located, are shown in Fig. 3 and Fig. 4, respectively. The maps are divided into three classifications: high, moderate, and low. Low susceptibility regions require monitoring, moderate susceptibility regions require slope protection measures, and high susceptibility regions are found on unstable slopes where construction of structures is not permitted.

Fig. 3 Landslide susceptibility map

Fig. 4 Elevation map

3.2 Data analysis

Landslide risk was quantitatively defined by combining landslide susceptibility classes, social vulnerability, and exposure. This allowed us to understand and compare the suitability of population datasets through paired t tests. We used the household as the unit of observation and the barangay as the unit of analysis in this study. Barangays are the lowest administrative division in the Philippines and, in this research, are considered as a community. An index-based approach was used in this study to measure vulnerability and risk. This approach is well suited for studies that conduct comparisons of risk between different locations and susceptibility.

Deductive and inductive approaches are commonly used to estimate vulnerability indices (Beccari 2016). Deductive approaches rely on past literature to select and combine a limited number of indicators together through z-score or linear standardisation, whereas inductive approaches often use indicators that are narrowed through principal component analysis techniques (Yoon 2012). For this research, a deductive approach was adopted as fewer than ten indicators were available for analysis. The risk assessment relied on indicators selected from previous research (Tate 2012). Since our interest was the impact of distribution methods of population at risk, we sought to maintain the same vulnerability scale for each of the three datasets considered. We developed a vulnerability index using a methodology adopted by Eidsvig et al. (2014).

A social vulnerability index at the barangay level was formed through selection of vulnerability indicators, weighting, normalisation, and scoring. The indicators used were: (1) % women in a household, (2) % of household under age 5, (3) % of household over the age of 65, (4) % of household with a chronic illness, (5) % of household with disability and (6) % of people unemployed. These characteristics provide information on how people can cope and resist the impacts of landslides (Frigerio et al. 2016; Murillo-García et al. 2017; Guillard-Gonçalves and Zêzere 2018).

Household indicators were normalised using min-max rescaling methods, as defined in Eq. 1, where \({X}_{\mathrm{HH}}\) is the household indicator, \({X}_{\mathrm{min}}\) is the minimum value recorded for that indicator among all households, and \({X}_{\mathrm{max}}\) is the maximum value recorded for that indicator among all households. This normalises the scores of each indicator to give a value between 0 and 1.

$${\mathrm{Indicator}}_{\mathrm{HH}}= \frac{{X}_{\mathrm{HH}}- {X}_{\mathrm{min}}}{{X}_{\mathrm{max}}- {X}_{\mathrm{min}}}$$
(1)
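As a brief illustration of Eq. 1, the sketch below rescales one indicator across households. The function name `min_max_normalise`, the handling of a constant indicator, and the sample values are assumptions made for the example only.

```python
def min_max_normalise(values):
    """Rescale one indicator across all households to the range [0, 1] (Eq. 1)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant indicator offers no contrast, so map it to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical "% of household over the age of 65" for four households.
share_over_65 = [10.0, 20.0, 30.0, 50.0]
print(min_max_normalise(share_over_65))  # [0.0, 0.25, 0.5, 1.0]
```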

For this research, equal weights were applied to all indicators. Many authors have adopted equal weights in the absence of rigorous statistical justification (Cutter et al. 2003; Ignacio et al. 2015). Statistical analysis can be used to weigh each indicator; however, given the complex nature of vulnerability, there is little validated literature to support which indicators are more important. After obtaining the normalised scores and using equal weights, we took an average of all indicators to calculate a vulnerability score per household. Household vulnerability scores were then grouped by barangay and averaged for analysis.
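The equal-weight scoring and barangay averaging can be sketched as follows. The function names and the two example households are hypothetical, and each inner list is assumed to hold the six normalised indicators for one household.

```python
from statistics import mean


def household_vulnerability(normalised_indicators):
    """Equal-weight score: the plain average of the normalised indicators."""
    return mean(normalised_indicators)


def barangay_vulnerability(households):
    """Average of the household scores within one barangay."""
    return mean(household_vulnerability(h) for h in households)


# Two hypothetical households, six normalised indicators each (values invented).
households = [
    [0.5, 0.0, 0.0, 0.25, 0.0, 0.75],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.5],
]
print(barangay_vulnerability(households))
```

With equal weights, swapping in a different weighting scheme later would only require replacing the plain average with a weighted one.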

A typical household had half women members, either one member with a chronic illness or disability, and three unemployed household members. This household would have a vulnerability score of 0.23. The highest household vulnerability observed was 0.66. To classify which barangays were most and least vulnerable, we adopted standard deviations. Scores greater than one standard deviation above the mean were classified as high, while scores less than one standard deviation below the mean were classified as low. Values in between were deemed medium.
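A minimal sketch of the standard deviation classification is given below. The text does not state whether a sample or population standard deviation was used, so the use of `statistics.stdev` (sample) is an assumption, as are the function name and the invented barangay scores.

```python
from statistics import mean, stdev


def classify_by_sd(scores):
    """Label each barangay high, medium, or low relative to mean +/- one
    standard deviation. Uses the sample standard deviation (an assumption)."""
    values = list(scores.values())
    mu, sd = mean(values), stdev(values)
    labels = {}
    for name, score in scores.items():
        if score > mu + sd:
            labels[name] = "high"
        elif score < mu - sd:
            labels[name] = "low"
        else:
            labels[name] = "medium"
    return labels


# Invented vulnerability scores for five placeholder barangays.
example = {"A": 0.10, "B": 0.22, "C": 0.23, "D": 0.24, "E": 0.40}
print(classify_by_sd(example))
```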

To understand how many, and where, households were at risk to landslides we used InaSAFE, a tool in QGIS. We created an impact layer using the collected landslide susceptibility maps by combining the elements at risk. In the context of InaSAFE, the elements at risk were the estimated population (WorldPop and HRSL) and survey dataset spatially linked with OpenStreetMap building footprints. We used OpenStreetMap barangay boundaries to define communities. We then overlaid a landslide susceptibility layer with population and building elements to identify the number of people exposed to low, medium and high landslide susceptibility levels. From the analysis, we obtained the number of people exposed for each population dataset. We used an average household size of 4.48, derived from the local survey data, to determine an expected number of households estimated to be impacted from the WorldPop and HRSL datasets. We used the household as the unit of observation as this better reflected the impacts from landslide events (as opposed to individuals).
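The overlay logic can be illustrated without GIS software. The sketch below is not InaSAFE's API: it assumes the population and susceptibility rasters have already been aligned cell for cell and flattened into lists, and the function names and sample cells are hypothetical. Only the household size of 4.48 comes from the text.

```python
from collections import defaultdict

AVERAGE_HOUSEHOLD_SIZE = 4.48  # derived from the local survey, as stated above


def exposure_by_class(population_cells, susceptibility_cells):
    """Sum the people in each susceptibility class for aligned grid cells."""
    if len(population_cells) != len(susceptibility_cells):
        raise ValueError("grids must be aligned cell for cell")
    exposed = defaultdict(float)
    for people, cls in zip(population_cells, susceptibility_cells):
        if cls is not None:  # None marks cells outside any susceptibility zone
            exposed[cls] += people
    return dict(exposed)


def to_households(exposed_people):
    """Convert exposed people to an expected number of households."""
    return {cls: n / AVERAGE_HOUSEHOLD_SIZE for cls, n in exposed_people.items()}


# Invented cells: people per cell and the susceptibility class of each cell.
population = [12.0, 0.0, 30.5, 8.0, 4.5]
classes = [None, "low", "low", "high", None]
people = exposure_by_class(population, classes)
print(people)  # {'low': 30.5, 'high': 8.0}
print(to_households(people))
```

Whether household counts were rounded is not stated in the text, so the sketch leaves them as fractional expected values.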

Risk was calculated for each barangay by multiplying the average household vulnerability with the number of households exposed to landslides for the WorldPop and HRSL datasets. We classified low, medium, and high susceptibility levels as numerical values of 1, 2 and 3, respectively. For example, if a barangay with a vulnerability score of 0.5 had 10 households exposed to low levels of landslide susceptibility and 20 households exposed in high levels, we would obtain a risk index value of 35 for that community. The risk values were then divided by the total number of households for each barangay.
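The worked example above can be reproduced with a short sketch. The function name `barangay_risk` and the total of 100 households used in the second call are assumptions for illustration; the class scores of 1, 2 and 3 follow the text.

```python
SUSCEPTIBILITY_SCORE = {"low": 1, "medium": 2, "high": 3}


def barangay_risk(vulnerability, households_by_class, total_households=None):
    """Risk index: vulnerability times the susceptibility-weighted count of
    exposed households, optionally divided by the barangay's total households."""
    weighted = sum(SUSCEPTIBILITY_SCORE[cls] * n for cls, n in households_by_class.items())
    risk = vulnerability * weighted
    if total_households is None:
        return risk
    return risk / total_households


# Example from the text: 0.5 * (10 * 1 + 20 * 3) = 35
print(barangay_risk(0.5, {"low": 10, "high": 20}))  # 35.0

# Normalised by a hypothetical barangay of 100 households.
print(barangay_risk(0.5, {"low": 10, "high": 20}, total_households=100))  # 0.35
```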

For the local household dataset, risk was first calculated for each household. This was done by multiplying vulnerability scores with the household exposure level. The risk values of households were then aggregated for each barangay and then divided by the total number of households to normalise risk. This normalisation was done to understand risk relative to community size. Figure 5 shows buildings with designated risk. To compare the community risk results for each population dataset, we conducted a paired t test, analysing differences between estimated and actual population distributions. These tests were selected based on the continuous nature of analysed data. We ran three separate t tests to compare the WorldPop and HRSL datasets against the local survey, as well as against each other. For the two groups in each test, the null hypothesis was that there was no difference between the two population risk values. We used a significance level of 5%.
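For readers unfamiliar with the paired t test, the statistic can be computed with the Python standard library as sketched below. The function name and the four paired risk values are invented for illustration and are not the study's data. The p-value is not computed here; a statistics package such as SciPy (`scipy.stats.ttest_rel`) would typically be used for that.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for a paired t test on two
    equally long sequences of barangay risk values."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Invented risk values for four barangays: gridded estimate vs. local survey.
estimated = [5.0, 7.0, 9.0, 4.0]
survey = [3.0, 6.0, 5.0, 2.0]
t, df = paired_t_statistic(estimated, survey)
print(round(t, 3), df)
```

At a 5% significance level, the null hypothesis of no difference would be rejected when the absolute value of t exceeds the two-sided critical value for the given degrees of freedom (about 3.182 for 3 degrees of freedom).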

Fig. 5 Building landslide risk

4 Results

We will first discuss the differences between the community risk estimates produced from each of the three population datasets and then discuss differences in the counts of people estimated to be at risk. Using paired t tests of barangay risk estimates, we found that the WorldPop distribution (M = 38.62, σ = 69.30) resulted in higher landslide risk estimates when compared to the HRSL distribution (M = 4.66, σ = 13.33) and the local survey (M = 3.22, σ = 5.89). In contrast, we found relatively little difference between the risk estimates for the HRSL distribution (M = 4.66, σ = 13.33) and local survey (M = 3.22, σ = 5.89). Figure 6 below shows a comparison of landslide risk for each barangay.

Fig. 6 Comparison of landslide risk by population datasets

4.1 HRSL exposure

The HRSL dataset performed well in estimating overall risk to landslides at the municipal scale; however, it showed differences across individual communities. We found that the local survey dataset indicated that 2% of Carigara’s population (344 people) were exposed to landslides, in comparison to the HRSL, which estimated 2.7% (391 people). According to the local survey, 229 and 115 people were exposed in low and moderate landslide susceptibility classes respectively, and none were in the high landslide susceptibility areas. The HRSL estimated 125, 209, and 57 people were in low, medium, and high landslide susceptibility classes, respectively. Similar overall exposure numbers were seen between the HRSL and local datasets, suggesting that HRSL distribution replicates the actual distribution of population in Carigara. Table 1 shows a breakdown of exposure estimates for each individual barangay. The HRSL dataset overestimated exposure in 2 of 33 communities and underestimated exposure in 6 of 33 communities, when compared to local survey data.

Table 1 Population exposed to landslides

4.2 WorldPop exposure

We found that WorldPop overestimated landslide exposure at the municipal and community levels, predicting that 21% of Carigara’s population (3,058 people) was exposed. The dataset estimated that 940, 1,704, and 414 people were in low, medium, and high landslide susceptibility classes, respectively. We found that 75% of barangays showed overestimated exposure, when compared to the local survey dataset. A summary of population for individual communities is shown in Table 1.

5 Discussion

In the following sections we will discuss the main findings of this research and their use in practice. The first section discusses the suitability of gridded population count datasets when estimating landslide risk at municipal scales. We explore why the HRSL distribution matches local population data and its implications. The following section discusses the implications of risk mapping in Carigara for land use and contingency planning. Lastly, we discuss the usability of the proposed risk methodology for other municipalities. We conclude this section with limitations of this study and considerations for future research.

5.1 Suitability of population datasets for risk assessments

Differences in disaggregation techniques affect risk estimates, and we found WorldPop showed different risk estimates compared to the local survey dataset and the HRSL. WorldPop overestimated risk and the number of households exposed to landslides in all communities. The HRSL predicted similar risk estimates to the local survey dataset. To understand why this is the case, it is important to highlight how these datasets use different modelling techniques and at what spatial resolution population estimates are disaggregated.

WorldPop overlays distributed census data into administrative boundary units with areas rich in human activity through the following covariate factors: topographical factors, earth surface material, night satellite imagery, roads, and land cover. Random forest algorithms are used to combine population per unit with the covariate factors to predict population per pixel, which is then disaggregated to 100m spatial grids. Using this distribution technique, the census data is spread across a larger area. This is reflected in the exposure estimates as WorldPop predicts a higher number of people in all susceptibility classes, as shown in Fig. 7. WorldPop relies on covariates that are spread across large regions, such as roads and land cover, to distribute population. We found WorldPop disaggregated population in southern forest regions of Carigara where no settlements are found. In comparison, the HRSL distribution technique disaggregates census data onto building footprints at a 30m spatial grid. Populations are concentrated in smaller areas than in WorldPop, which explains why WorldPop overestimates risk: more people are placed in low, medium, and high susceptibility classes.

Fig. 7 Population exposure comparison

We found that the HRSL predicts risk similarly to the actual population distribution. In our studied case, the HRSL distribution predicted 2.7% of the total population exposed, close to the 2% of the population exposed according to the local survey. This occurred because both datasets relied on buildings to distribute their population. However, when we compared exposure estimates for different susceptibility classes, there were differences. For example, the HRSL overestimated exposure by 57 people in the high susceptibility class, equivalent to 15% of the total exposed population in Carigara, when compared to the local survey. The high susceptibility regions fell in communities with lower density. HRSL data was capable of disaggregating the majority of population to urban and rural settlements.

5.2 Usability of risk methodology

The developed risk methodology uses readily available population datasets, accessible vulnerability indicators, and a straightforward analysis procedure to compute community landslide risk at municipal level. While standard landslide risk methodologies do exist, these are often complex. Our approach draws on globally available datasets to make risk assessment processes more accessible to municipalities in resource constrained settings. The current variability in access to data forces many municipalities to rely heavily on their local surveys for population distributions when estimating landslide risk. The developed methodology aims to shift this trend from using local surveys to using global gridded datasets. Ultimately, however, it will be up to local decision-makers to identify what level of certainty they require in landslide risk assessments. For site specific structural measures, this level of analysis is likely insufficient. Our approach can, however, be useful for contingency planning and more general landslide awareness to prioritise resources to local communities.

Access to population demographic distributions is becoming more available through high resolution gridded datasets. This study relied on a local survey as a benchmark against which to compare them. To translate this study to other municipalities, the existing social vulnerability index can also be applied using traditional, non-geospatial population datasets. Social indicators including age, gender, disability, and employment can be found from disaggregated population and housing censuses at the barangay level. By using this methodology across other landslide-prone municipalities in the Philippines, risk can be compared between locations for national and provincial policy makers to support effective resource and financial distribution between communities.

5.3 Limitations and future considerations

This section discusses the limitations in trying to estimate landslide risk from the data and approach used. It is important to consider the errors produced by down-scaling or up-scaling when estimating risk which introduce epistemic uncertainty into our findings. We believe this to be a condition which will vary in different sites and something that needs to be considered by those ultimately performing local landslide risk assessments. Locations where there are wider gaps between populations and landslides will no doubt have uncertainty, whereas high density and susceptible areas may need to more carefully consider this uncertainty in future analysis. There is also the potential for our results to have been influenced by aleatory uncertainty in the underlying landslide susceptibility maps used—an important factor given the difficulty in predicting landslides. This uncertainty is potentially significant, but the validation of maps with landslide inventories in the province of Leyte and confirmation from our local partners suggest that the maps used offer a realistic picture of landslides in our case site.

The utilised gridded population datasets are static and do not consider temporal changes, including day and night movement of people. Disaster risk is dependent on time and spatial characteristics not captured in this study. Dynamic risk assessments that capture these aspects are necessary for risk reduction policies. To consider these dynamics, future research should seek to develop enhanced high-resolution population datasets that consider these temporal effects. For example, further research into human mobility through mobile location data could be one line of data exploration to help in identifying locations of population over time. Risk estimates with dynamic properties could support near real-time representation of impacts on populations.

For this study we used equal vulnerability weights as we assumed that all social indicators were equally important. To ensure a more robust vulnerability analysis, it is important to understand which indicators are more relevant to landslides. For example, we assumed disabled and unemployed people to be equally vulnerable to landslides. This may not be true, as disabled people may be at higher risk of impact than those who are unemployed in certain communities that have weak social capital. We generalised vulnerability and simplified the underlying conditions that can make one indicator more influential than another. Additional research is needed on the sensitivity of these measures and the relationship between social indicators.

6 Conclusion

This research sought to understand how gridded population datasets influence risk assessments of communities exposed to landslides. We used accessible global datasets, simple risk assessments, and an index-based approach convenient for the municipal scale to examine how different global spatial population datasets compare in estimating landslide risk. We found that the HRSL dataset most closely represented the actual population distribution and landslide risk estimates at barangay level in the selected case site. This was due to disaggregation of population to building footprints and use of a high-resolution spatial grid. In contrast, the WorldPop dataset overestimated risk as it disaggregated population to a wider area. This comparison helps to understand the suitability of these global datasets in predicting risk in a locality.

We found the HRSL dataset represented similar risk to the local survey and was able to identify risk in communities with relatively high precision. This is significant as it shows the suitability of HRSL data when preparing risk assessments. However, HRSL data is not yet available for many low-income countries with high landslide exposure. In this context, the global availability of WorldPop data provides a useful alternative. The availability of high-resolution gridded population datasets is crucial in settings where population data gaps still exist and where time intensive local surveys might otherwise be required. Population data can be expensive and time consuming to gather, and can require resources for planning and management that could be invested in other initiatives. Our findings provide evidence on the potential for new global population gridded datasets to fill gaps in exposure data for local landslide risk assessments.