1 Introduction

Landslides are among the most frequent geological hazards in the world (Skrzypczak et al. 2021). Disasters related to landslides are directly associated with loss of life as well as damage to property, infrastructure, and the environment. Moreover, urbanization and rising population numbers, potentially combined with a lack of urban planning, have led to informal development on hazardous hillslopes of urban agglomerations (Mendes et al. 2018; Kurniawan and Krol 2014). Low-income populations in particular are forced to settle in these hazard-prone areas (Müller et al. 2020; UN-Habitat 2016; UN 2015), which increases landslide risk through man-made changes such as vegetation removal or sewage disposal systems (Mendes et al. 2018; Reichenbach et al. 2014; Glade 2003). According to UN-Habitat, in 2016 four out of ten non-permanent houses in developing countries were built in areas threatened by natural disasters such as landslides. In addition, climate change is increasing the occurrence of damaging landslide events and the risk of experiencing them (UN-Habitat 2016).

As a consequence, the exposure of physical, social, economic, and environmental assets, known as the elements at risk, is increasing (Kurniawan and Krol 2014; Pellicani et al. 2013). In this context, risk reduction measures play a very important role in diminishing the impact of a landslide, especially since the magnitude of a disaster has been directly related to the vulnerability and exposure of the assets at risk (Carvalho de Assis Dias et al. 2018; Birkmann and Welle 2015). Similarly, the Sendai Framework for Disaster Risk Reduction 2015–2030 by the United Nations recommends that disaster risk assessment, prevention, mitigation, and the implementation of response measures be based on an analysis of risk in all its dimensions (i.e., vulnerability, capacity, exposure, hazard characteristics, and environment) (UNISDR 2015).

Landslide risk assessment depends on the complex interplay of the hazard, the elements at risk, and their vulnerability (e.g., Birkmann 2006). A hazard is defined as the probability of a disastrous event happening in a certain period, with particular intensity at a particular location (Unesco 1973). Indicators such as the probability of occurrence, intensity, or duration specify the hazard event (Geiß and Taubenböck 2013). Exposure refers to the elements present in the potentially affected area, such as people, infrastructure, or economic values, while vulnerability relates to the resilience of these elements. This can comprise, for example, the stability of building structures (Geiß et al. 2015), or the potential to recover from the effects of the natural hazard through economic reserves. Concerning the social component of vulnerability, it is generally understood that the vulnerability of economically deprived communities is higher due to more precarious conditions of their assets and livelihoods. Therefore, informal settlements are more impacted by natural disasters like landslides (Hallegatte et al. 2017; Wisner et al. 2003). Since risk is a multidimensional concept and based on their mutual influence, exposure and vulnerability are suggested to be analyzed together (UNISDR 2015). To this end, paying special attention to the social aspect is crucial. The same loss is more severe with increasing social vulnerability, which influences the ability of a community to cope, resist, and recover from a disaster (Carvalho de Assis Dias et al. 2018; World Bank Group 2017; Vranken et al. 2015). Nevertheless, risk studies in urban systems often analyze exposure independently as an individual part of risk assessment. An example is the identification of exposed people and housing through the spatial overlay with hazard susceptibility zones (Geiß et al. 2017).

However, studies focusing on the quantification of area-wide landslide exposure and vulnerability consider the latter mostly from an economic, physical, or environmental perspective, focusing on the monetary loss or severity of damage to buildings, infrastructure, or environment (e.g., Guillard-Gonçalves et al. 2016; Vega & Hidalgo 2016; Vranken et al. 2015; Galli & Guzzetti 2007). The inclusion of social vulnerability on the other hand is limited for landslide risk assessment analyses (Nor Diana et al. 2021; Puissant et al., 2014; Hollenstein 2005). To the knowledge of the authors, there are comparatively few exposure studies that consider the concept of social vulnerability as the socioeconomic level of the exposed assets to distinguish between certain degrees of severity and ability to recover from a disaster (e.g., Wijaya & Hong 2018; Carvalho de Assis Dias et al. 2018). Previous studies approached social vulnerability rather as the number of affected people regardless of their socioeconomic status (e.g., Kurniawan and Krol 2014; Puissant et al. 2014; Pellicani et al. 2013; Papathoma-Köhle et al. 2007), or work on comparatively low spatial resolutions, cover a smaller geographic extent, or are focused on multirisks (e.g., Aksha et al. 2020; Frigerio and De Amicis 2016; Guillard-Gonçalves et al. 2014).

Not only the definition of vulnerability varies in the literature but also the geographic scale for assessments. Landslide risk studies are mostly conducted on a macroscale level, where the study area covers a whole region, generally with the aim to identify and rank the elements at risk to produce risk maps (e.g., Promper et al. 2015; Vranken et al. 2015; Pellicani et al. 2013; Jaiswal et al. 2011). The mesoscale level is used for local planning purposes in cities or municipalities (e.g., Guillard-Gonçalves et al. 2016; Carvalho de Assis Dias et al. 2018). At this level, potential losses and consequences (i.e., physical or economic vulnerability) can already be partially quantified, for example, by clustering assets with similar characteristics (Puissant et al. 2014). Microscale level analyses are performed for local areas, quantifying physical, social, environmental, and economic vulnerabilities and aiming to implement technical and protective measures (e.g., Singh et al. 2019; Holcombe et al. 2012). Therefore, the amount of data required to conduct such studies is proportional to the detail of the analysis scale (Puissant et al. 2014).

Likewise, time plays an important role in risk assessment (UNISDR 2015) and should not be neglected. On the one hand, the temporal probability of landslide occurrences is based on historical records; on the other hand, exposed elements and their vulnerability are subject to temporal changes (Van Westen et al. 2006). Few studies have analyzed landslide exposure including the development of elements at risk (e.g., Promper et al. 2015; Kurniawan and Krol 2014). However, to the knowledge of the authors, no reviewed study has considered the temporal change of vulnerability. In this sense, multitemporal risk assessment requires consistent data over time.

Accordingly, Earth Observation (EO) has become more popular in recent decades as a source of data in the landslide analysis domain. Advances in the space-borne sector make EO techniques increasingly effective for landslide detection, mapping, monitoring, and hazard assessment (e.g., Novellino et al. 2021; Zhong et al. 2020; Casagli et al. 2017). For instance, landslides have been detected and mapped by analyzing land cover changes through vegetation indices (e.g., Behling et al. 2014) or with machine learning techniques (e.g., Ghorbanzadeh et al. 2019). Remotely sensed data are also used to identify the exposed elements at risk over time (e.g., Promper et al. 2015; Kurniawan and Krol 2014). In particular, land use and land cover (LULC) information is commonly applied to classify physical assets such as built-up areas (Rahman and Di 2017), whereas the distribution of the population, i.e., the social assets, can be estimated from official population counts supported by information derived from remotely sensed data (e.g., Sapena et al. 2022; Taubenböck et al. 2011). Likewise, the social component of the assets can be retrieved from space: many studies have demonstrated that economically deprived settlements can be identified from their morphological characteristics in high-resolution satellite data (e.g., Stark et al. 2020; Wurm et al. 2019, 2017).

The population of Medellín, Colombia, is growing fast, and much of this growth is informal. Because land is scarce, more and more people with high social vulnerability live on the risk-prone hillsides of the city. Together with more frequent heavy rainfall induced by climate change, this leads to an increase in landslide events (IDEAM-UNAL 2018). Against this background, we conduct a long-term multitemporal, intra-urban, and mesoscale landslide exposure and social vulnerability assessment for the city of Medellín. This is accomplished by utilizing multitemporal satellite and census data for three time steps (1994, 2006, and 2018), as well as machine learning algorithms and population disaggregation methods. We quantify the physical assets as the urban structures composed of buildings and infrastructure, and the social assets as the population in those areas, distinguishing between formal and informal settlements to approximate social groups and relate them to vulnerability. This analysis enables us to highlight different aspects of landslide risk assessment. We illustrate the evolution of built-up areas and population over time and, on this basis, investigate whether exposure and social vulnerability have increased in absolute and relative terms. The results are intended to support urban planners and risk managers in informed decision-making.

2 Material and methods

This section introduces the general workflow of the study, the study area, the employed datasets, their processing steps, and the methodology of the exposure and social vulnerability analysis. Moreover, a conceptual note explains the assumptions taken in the context of this paper.

We developed the workflow shown in Fig. 1 for the long-term multitemporal, intra-urban, and mesoscale landslide exposure and social vulnerability assessment. First, for the generation of urban masks for three time steps (1994, 2006, and 2018), we perform land cover (LC) classifications based on Landsat mosaics (a); second, we divide the built-up area of the city into two thematic groups: using the informal settlement layer from Kühnl et al. (2021), we classify informal and formal settlement areas to approximate social groups (b); third, population is estimated at the pixel level using disaggregation methods (c) and specified into the population of informal settlements (d). These results (i.e., multitemporal urban masks, informal settlements, and population) are combined with the landslide hazard map for a multitemporal exposure and social vulnerability analysis (e).

Fig. 1

Workflow of the study, composed of land cover classifications based on Landsat data to derive multitemporal urban masks (a), multitemporal informal mask calculations based on an informal settlement layer from Kühnl et al. (2021) (b), population disaggregation methods based on census data separating between formal (c) and informal areas (d), and the exposure and social vulnerability analysis using all pre-processed data and a landslide hazard map (e)

2.1 Study area

Our study area, the city of Medellín, is the second largest city in Colombia. It is the capital of the Department of Antioquia as well as of the Metropolitan region of the Aburrá Valley, a political and administrative unit of ten municipalities with a population of 3.5 million (Fig. 1b; Garcia Ferrari et al. 2018; Hernandez Palacio 2012).

The municipality of Medellín (Fig. 2, white boundary) is composed of urban (Medellín and San Antonio) and rural parts (Fig. 2a, c). The area of interest (AOI) in this study refers to the urban, expansion, and urbanized areas of Medellín (Fig. 2a). Expansion areas are in the process of officially getting added to the administrative urban areas but do not yet fully belong to this planning level (Alcaldía de Medellín 2014a, b). Urbanized areas are characterized by high built-up density consisting mostly of informal urban structures, which have grown into the official administrative rural area of the municipality. By including them in our AOI besides urban and expansion areas, the analysis reflects the built-up conditions beyond administrative boundaries.

Fig. 2

Location of the study area. Urban (= administrative urban areas), expansion (= in the process of officially getting added to the urban area) and urbanized areas (= areas with high built-up density, but officially classified as rural) of the Municipality of Medellín within the Aburrá Valley in Colombia. In the background, the hillshade of a digital elevation model shows the topography (DEM AMVA 2022: Open data Medellín)

Geographically, Medellín extends 14 km from north to south in the Aburrá Valley between two mountain ranges of the Andes in the west and the east, and is crossed by the Medellín River running along the valley. The valley itself has a maximum width of 10 km, and the height difference between its highest and lowest points is about 1 km (Garcia Ferrari et al. 2018; Hernandez Palacio 2012). These characteristics lead to a very steep topography of the valley slopes in the east and west with a significant landslide risk (Claghorn and Werthmann 2015).

Socioeconomically, living conditions worsen with distance from the Medellín River and with elevation on the valley slopes. In particular, the landslide- and flash-flood-prone slopes in the west and east of the city are mainly occupied by informal dwellers. A north–south segregation also exists: neighborhoods with low unemployment rates are located in the southeast of the city, whereas high unemployment dominates in the northeast and northwest (Garcia Ferrari et al. 2018).

2.2 Data

In this study we relied on long-term multitemporal data. First, we used Landsat satellite imagery at a 30-m resolution to derive LC maps, with a special focus on the urban layout. To do so, we created cloud-free Landsat mosaics for three time steps: 1994, 2006, and 2018. We used the Google Earth Engine platform (Gorelick et al. 2017) for building the cloud-free mosaics and downloading the resulting images. Since the period is fairly wide, we used the atmospherically corrected surface reflectance datasets from the Landsat 5 TM (L5), Landsat 7 ETM+ (L7), and Landsat 8 OLI/TIRS (L8) sensors. We filtered cloudy pixels by masking low-quality pixels using the pixel_qa band. Due to the tropical climate in Medellín, the chances of obtaining cloud-free pixels are quite low; we therefore set multi-year periods for the mosaicking to obtain sufficiently good results for each date. Calculating the median, we used imagery from 1989 to 1994 to create the 1994 mosaic, from 2003 to 2006 to create the 2006 mosaic, and from 2013 to 2018 to create the 2018 mosaic. For the LC classification, we selected a subset of spectral bands and calculated additional indices. On top of the visible red, green, and blue (RGB), near-infrared (NIR), and short-wave infrared (SWIR) bands (bands 1, 2, 3, 4, 5, 7 in L5 and L7, and bands 2, 3, 4, 5, 6, 7 in L8), we calculated the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Water Index (NDWI), and the Normalized Difference Built-up Index (NDBI) as well as their 10th, 25th, 50th, 75th, and 90th percentiles. The NDVI gives information about the greenness of vegetation (Rouse et al. 1973), the NDWI indicates water features (positive values) or soil and terrestrial vegetation (negative values; McFeeters 1996), and the NDBI acts as an indicator for built-up areas (Zha et al. 2003). As a result, each image mosaic (see Fig. 1a) has 24 bands (6 spectral bands, 3 indices, and 15 index percentiles). These indices and their percentiles provide additional information for training the LC classification algorithm.
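As an illustration of the three indices, the following minimal sketch computes them for a single pixel using the commonly cited normalized-difference formulations: NDVI from NIR and red, NDWI (McFeeters) from green and NIR, and NDBI (Zha et al.) from SWIR and NIR. The function names, the choice of the first SWIR band, and the reflectance values are assumptions made for this example and are not taken from the processing chain of the study.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(green, red, nir, swir1):
    """Compute NDVI, NDWI, and NDBI for one pixel of surface reflectance.

    Assumption: NDBI uses the first short-wave infrared band (swir1).
    """
    return {
        "ndvi": normalized_difference(nir, red),    # vegetation greenness
        "ndwi": normalized_difference(green, nir),  # water (positive values)
        "ndbi": normalized_difference(swir1, nir),  # built-up indicator
    }


# Hypothetical reflectance values, for illustration only.
vegetated = spectral_indices(green=0.08, red=0.05, nir=0.40, swir1=0.20)
built_up = spectral_indices(green=0.12, red=0.14, nir=0.20, swir1=0.28)
```

With these made-up values, the vegetated pixel has a high NDVI and a negative NDBI, while the built-up pixel has a positive NDBI, which is the contrast the classifier can exploit.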

Secondly, to retrieve information on social vulnerability, we proxied the socioeconomic status based on the morphologic characteristics of the living environment and built-up structures. We used an informal settlement mask based on a scene-based Local Climate Zone (LCZ) classification of Medellín, performed with the use of a very high-resolution satellite image from the year 2019 and urban blocks (Kühnl et al. 2021). The lightweight low-rise class is an urban structural type within the LCZ schema (Stewart & Oke 2012), which shows typical morphological features of informal settlements like high density, small and low-rise buildings, lightweight construction materials, and sparse vegetation. We extracted the lightweight low-rise polygons with their centroid in our AOI (Fig. 3a, 1b). Figure 3b, c shows two examples of the neighborhoods and building types identified as informal. The accuracy of the informal settlement layer has been measured at 86% (Kühnl et al. 2021). For this study, we manually checked and corrected over- and under-classifications to improve the informal settlement mask.

Fig. 3

a Location of formal and informal settlements in Medellín based on the improved lightweight low-rise built-up class. b Google Street View image (© Google Street View 2021) of a characteristic slope covered with informal buildings and c of an exemplary housing structure in an informal settlement. d Landslide hazard map from the POT 2014 (Plan de Ordenamiento Territorial de Medellín). The map categorizes landslide hazard into very low, low, medium, and high. e, f are visual examples of the high hazardous slopes (own source)

Thirdly, to estimate the population at risk over time, we used population data from the official population projections (Proyecciones de Población 1993–2005 a 2015 de Medellín; Alcaldía de Medellín 2015) for the years 1994 and 2006 and the Colombian census (Censo Nacional de Población y Vivienda; DANE 2018) for the year 2018 (Fig. 4). We used these dates since the projections closely follow the former census surveys of 1993 and 2005, so their uncertainty is expected to be lower. Depending on the year, the population data are available in different spatial units (i.e., commune, neighborhood, sector, and section levels; Fig. 4). Data from 2018 are available at several spatial levels; we therefore used the sector level to disaggregate the population to the pixel level and the more detailed section level to validate the result.

Fig. 4

Population density for the years 1994, 2006, and 2018 as well as the different spatial levels of availability. In 2018, the sector level is used to disaggregate the population to the pixel level and the section level to validate the result

Finally, we used an open source landslide hazard map for the whole municipality in vector format provided by the land use plan from Medellín (Plan de Ordenamiento Territorial de Medellín; POT 2014). The hazard map was created by combining all risk-related information available (Fig. 3d). It relies on hazard maps from the POT 2006 and the National University of Colombia from 2009, as well as mass movement inventories by the Administrative Department of Disaster Management (DAGRD), morphodynamic process maps, and all geotechnical and slope stability studies carried out for the municipality of Medellín since 2006. It categorizes landslide hazard into very low, low, medium, and high susceptibility zones (Alcaldía de Medellín 2014c). Hence, it is the best hazard map that is available at the city level, since it includes multi-source data and was created by local experts.

2.3 Conceptual note

Landslide estimation, exposure, and vulnerability of the population is a multifaceted and highly complex problem. This requires highly accurate, diverse, and, in our approach, multitemporal data sets. Since these are not available in the necessary spatial and thematic depth, nor are they consistent, complete, and given over time, we must make some conceptual assumptions based on the available data.

First, we analyze landslide risk using a hazard map from 2014, assuming that the hazard areas remain constant over time. In this study, we consider two risk levels: risk areas, where the landslide hazard in the POT is medium or high, and non-risk areas, where the hazard is low or very low.

Secondly, concerning exposure and the capabilities of remotely sensed data available since the 1990s, we make the following assumptions: pixels classified as urban that are categorized as ‘formal’ in 2018, are per definition also ‘formal’ in previous time steps if they were identified as urban in 1994 and 2006, otherwise, they are non-urban pixels. Similarly, urban pixels categorized as ‘informal’ in 2018 are also informal in previous time steps, when the pixels are urban. Therefore, formal development does not change to informal over time, since this transition is very unlikely. At the same time, we disregard informal settlement upgrade, since we do not have data on informality over time as our informal settlement mask is from 2019.

Thirdly, socioeconomic indicators that would allow us to approach the social sphere in a multitemporal manner are nonexistent, especially at such a high spatial granularity. Thus, we tackle social vulnerability using morphological parameters as a proxy. We assume that people living in informal settlements are more vulnerable because their economic capabilities reduce the chances of recovering after a disaster. The definition of morphologic informality differs slightly from that of informal settlements from a socioeconomic perspective; however, as Kühnl et al. (2021) and Wurm and Taubenböck (2018) show, it is a fairly accurate proxy of precariousness and informality.

2.4 Land cover classification

Our focus for the analysis of exposure to landslide hazards is settlement areas. In this regard, we used Landsat mosaics for the three time steps to classify Medellín into basic land covers (urban, open vegetation, forest, and bare soil). We applied a supervised pixel-based classification method using the Random Forest (RF) machine learning algorithm (Breiman 2001). In preparation, we manually created ground truth sample data. This was done independently for each of the three time steps by visual image interpretation with the help of historical and very high-resolution images from Google Earth©. The sample data were polygons covering areas of homogeneous land cover, which were equally split into spatially independent training and testing polygons (50/50). The image pixels for the training and testing datasets were subsequently randomly selected from the respective polygons. We used the training dataset to build an RF model for each year, which was validated against the testing dataset of the respective year. By using spatially independent pixels in the validation, we avoided spatial correlation between classification and evaluation. Table 1 shows the accuracy metrics of the RF classification for each time step and thematic class. The User's accuracy is the percentage of correctly classified pixels in a class with respect to the total number of pixels classified as that class. The Producer's accuracy is the percentage of correctly classified sample pixels in a class with respect to the total number of sample pixels of that class. The Overall accuracy is the ratio between the correctly classified pixels from all classes and the total number of sample pixels (Congalton 1991).
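The three accuracy metrics defined above can be derived from a confusion matrix, as in the following minimal sketch. The function name, the two-class setup, and the pixel counts are hypothetical and serve only to make the definitions concrete; they are not the values reported in Table 1.

```python
def accuracy_metrics(confusion, classes):
    """Derive overall, producer's, and user's accuracy from a confusion matrix.

    confusion[i][j] is the number of test pixels whose reference class is
    classes[i] and whose classified (mapped) class is classes[j].
    """
    n = len(classes)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    producers, users = {}, {}
    for i, name in enumerate(classes):
        reference_total = sum(confusion[i])                    # row sum
        mapped_total = sum(confusion[r][i] for r in range(n))  # column sum
        producers[name] = confusion[i][i] / reference_total if reference_total else 0.0
        users[name] = confusion[i][i] / mapped_total if mapped_total else 0.0
    return {"overall": correct / total, "producers": producers, "users": users}


# Made-up counts for a two-class example (urban vs. non-urban).
classes = ["urban", "non-urban"]
confusion = [[90, 10],
             [5, 95]]
metrics = accuracy_metrics(confusion, classes)
```

In this example, 90 of the 100 urban reference pixels are mapped as urban (Producer's accuracy 0.90), while 90 of the 95 pixels mapped as urban are truly urban (User's accuracy of about 0.95).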

Table 1 Land cover classification accuracy metrics per year

We used the urban land covers to create the urban masks for 1994, 2006, and 2018. To diminish temporal fluctuation in the data, we applied a change trajectory analysis similar to the one applied by Taubenböck et al. (2012). This approach corrects errors that occur when the classifier is unable to detect urban areas consistently over time. The assumption is that urban pixels cannot change to non-urban pixels. Therefore, several rules were established following the majority rule. For example, if a pixel is urban in 1994 and 2018 but is classified as non-urban in 2006, we change its status to urban in 2006 as well to keep consistency. Similarly, if a pixel is urban in 1994 and 2006, it should also be urban in 2018. However, if a pixel is urban in 1994 but non-urban in 2006 and 2018, the state of 1994 is changed to non-urban.
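The trajectory rules can be sketched for a single pixel as follows. The three rules given as examples in the text are implemented directly; the handling of a pixel that is urban only in 2006 is not stated in the text and is an assumption of this sketch (it is treated as a misclassification by the majority rule). The function name is hypothetical.

```python
def harmonize_trajectory(u1994, u2006, u2018):
    """Return a consistent (1994, 2006, 2018) urban trajectory for one pixel.

    Inputs are booleans (True = classified as urban in that year).
    """
    votes = int(u1994) + int(u2006) + int(u2018)
    if u1994 and votes >= 2:
        # Urban in 1994 and in at least one later year: urban cannot revert,
        # so the pixel is urban in all three time steps.
        return (True, True, True)
    if u1994:
        # Urban only in 1994: the 1994 state is changed to non-urban.
        return (False, False, False)
    if u2006 and not u2018:
        # Assumption (not stated in the text): urban only in 2006 is treated
        # as a classification error and set to non-urban.
        return (False, False, False)
    # Remaining cases are already consistent with urban growth.
    return (False, u2006, u2018)


examples = {
    "gap in 2006": harmonize_trajectory(True, False, True),
    "lost in 2018": harmonize_trajectory(True, True, False),
    "only 1994": harmonize_trajectory(True, False, False),
    "new in 2018": harmonize_trajectory(False, False, True),
}
```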

2.5 Identification of informal settlements

To produce the multitemporal informal settlement masks, we relied on the urban block level vector dataset with informal settlements in 2019 (see Sect. 2.2). We applied the morphological filters dilation and erosion (Soille 2004) to fill the gaps between urban blocks, such as roads, to obtain a continuous surface, and then it was transformed into raster format using the majority rule with a spatial resolution of 30 m. Finally, the informal settlement mask 2019 was spatially overlapped with the multitemporal urban masks, and matching pixels were considered informal settlements. As a result, we had three consistent informal settlement masks for the years 1994, 2006, and 2018.
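The final overlay step can be sketched as a cell-wise logical AND between each urban mask and the rasterized 2019 informal settlement mask. This sketch assumes that both rasters are already aligned on the same 30-m grid and represented as nested lists of booleans; the morphological filtering and rasterization steps are not reproduced here, and all names and grid values are hypothetical.

```python
def informal_masks(urban_masks, informal_2019):
    """Intersect each yearly urban mask with the 2019 informal mask.

    urban_masks: {year: 2-D list of booleans}, informal_2019: 2-D list of
    booleans on the same grid. A pixel is informal in a given year only if
    it is urban in that year and informal in the 2019 mask.
    """
    result = {}
    for year, urban in urban_masks.items():
        result[year] = [
            [u and i for u, i in zip(urban_row, informal_row)]
            for urban_row, informal_row in zip(urban, informal_2019)
        ]
    return result


# Tiny 2 x 3 example grid with made-up values.
urban_masks = {
    1994: [[True, False, False], [True, False, False]],
    2018: [[True, True, False], [True, True, True]],
}
informal_2019 = [[False, True, True], [True, True, False]]
masks = informal_masks(urban_masks, informal_2019)
```

In the example, the pixel in the first row and second column is informal in 2018 but not in 1994, because it was not yet urban in 1994.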

2.6 Disaggregation of population to the pixel level

We estimated the population at the pixel level using a top-down binary dasymetric disaggregation method originally developed by Wright (1936). This method consists of redistributing population counts from larger spatial units into smaller spatial units (Reed et al. 2018; Stevens et al. 2015; Wu et al. 2005).

In this study, we used administrative boundaries as source zones where population counts are known (i.e., communes, neighborhoods, and sectors for 1994, 2006, and 2018, respectively) and urban pixels from the urban masks 1994, 2006, and 2018 as target zones. In the first step, we calculated the population density of each source zone by dividing its population count by its urban area. Then, the population of each target zone (pixel) was estimated by multiplying the area of the pixel by the population density of the source zone in which the pixel is located. For the remaining target zones outside the boundary of any source zone (a few urban pixels in the west and east of Medellín are not covered by the administrative units), we used the population density of the closest urban pixel. Once we had the population at the pixel level for 2018, we extracted the pixels covered by the informal settlement mask from the same year to obtain the informal settlement population at the pixel level. Lastly, we validated the disaggregation method with official population counts at higher spatial detail. We used the population counts at the section level from the year 2018 as validation zones to evaluate the performance of the disaggregation method (Grippa et al. 2019). To this end, the population at the pixel level was summed within the boundaries of the validation zones, and this sum was compared to the population counts of the validation zones to measure the root-mean-square error (RMSE) and the RMSE divided by the mean validation zone population count (%RMSE).

The same process was then applied to the remaining time steps 1994 and 2006 using the respective commune and neighborhood levels.
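The disaggregation and validation steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: every urban pixel has the same area (30 m × 30 m), each pixel is assigned to exactly one source zone and one validation zone, and the nearest-pixel fallback for uncovered pixels is omitted. All identifiers and population counts are hypothetical.

```python
from math import sqrt

PIXEL_AREA_KM2 = 0.0009  # 30 m x 30 m pixel (assumption of equal-area pixels)


def disaggregate(zone_population, pixel_zone):
    """Binary dasymetric disaggregation of zone counts to urban pixels.

    zone_population: {zone_id: population count}
    pixel_zone: {pixel_id: zone_id} for urban pixels only
    """
    pixels_per_zone = {}
    for zone in pixel_zone.values():
        pixels_per_zone[zone] = pixels_per_zone.get(zone, 0) + 1
    pixel_population = {}
    for pixel, zone in pixel_zone.items():
        urban_area = pixels_per_zone[zone] * PIXEL_AREA_KM2
        density = zone_population[zone] / urban_area  # people per km2
        pixel_population[pixel] = density * PIXEL_AREA_KM2
    return pixel_population


def sum_by_zone(pixel_population, pixel_validation_zone):
    """Sum pixel populations within each validation zone."""
    totals = {}
    for pixel, zone in pixel_validation_zone.items():
        totals[zone] = totals.get(zone, 0.0) + pixel_population[pixel]
    return totals


def rmse_metrics(estimated, observed):
    """Return (RMSE, %RMSE) over validation zones."""
    squared = [(estimated[z] - observed[z]) ** 2 for z in observed]
    rmse = sqrt(sum(squared) / len(squared))
    mean_observed = sum(observed.values()) / len(observed)
    return rmse, 100 * rmse / mean_observed


# Made-up example: two source zones, five urban pixels, two validation zones.
zone_population = {"A": 300, "B": 100}
pixel_zone = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B"}
pixel_validation_zone = {"p1": "s1", "p2": "s1", "p3": "s2", "p4": "s2", "p5": "s2"}
observed = {"s1": 220, "s2": 180}

pixel_population = disaggregate(zone_population, pixel_zone)
estimated = sum_by_zone(pixel_population, pixel_validation_zone)
rmse, pct_rmse = rmse_metrics(estimated, observed)
```

Because the method is binary, all urban pixels within one source zone receive the same population, which is why the boundaries of the administrative units can remain visible in the resulting population raster.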

2.7 Quantifying exposure and social vulnerability to landslides

To quantify the development of exposure and social vulnerability in Medellín over time, we spatially overlapped the multitemporal assets and people (urban and informal settlement masks and pixel population, Fig. 1a–d) with the landslide hazard map (Fig. 1e).

To estimate how much urban area and population are at risk, we first intersected the multitemporal urban masks with the hazard map to summarize the total urban area for each hazard level and year. For those pixels crossed by the boundary of a hazard level, only the proportional area of the pixel covered was summarized. Secondly, and similarly, the multitemporal population at the pixel level was intersected with the hazard map and the total population was summarized. Likewise, when a pixel was crossed by the boundary only the proportional population of the pixel was summarized for each hazard level and year, and this was calculated using the population density of the pixel and the proportional area of the pixel covered. Finally, this process was replicated with the multitemporal informal settlement masks as well as the informal population at the pixel level.
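The proportional overlay can be illustrated with a small sketch in which each pixel carries its population and the fraction of its area falling into each hazard level. In practice these fractions would come from a geometric intersection of the pixel grid with the vector hazard map, which is not reproduced here; the data structure, names, and values are assumptions for illustration.

```python
PIXEL_AREA_KM2 = 0.0009  # 30 m x 30 m pixel
RISK_LEVELS = {"medium", "high"}  # risk areas as defined in the conceptual note


def summarize_by_hazard(pixels):
    """Sum urban area and population per hazard level.

    Each pixel is a dict with 'population' and 'fractions', where
    'fractions' maps a hazard level to the share of the pixel area it covers.
    """
    area, population = {}, {}
    for pixel in pixels:
        for level, fraction in pixel["fractions"].items():
            area[level] = area.get(level, 0.0) + fraction * PIXEL_AREA_KM2
            population[level] = population.get(level, 0.0) + fraction * pixel["population"]
    return area, population


def risk_total(by_level):
    """Total over the hazard levels treated as risk areas."""
    return sum(value for level, value in by_level.items() if level in RISK_LEVELS)


# Made-up pixels: the second one is crossed by a hazard-level boundary.
pixels = [
    {"population": 100, "fractions": {"high": 1.0}},
    {"population": 80, "fractions": {"medium": 0.25, "low": 0.75}},
    {"population": 40, "fractions": {"very low": 1.0}},
]
area, population = summarize_by_hazard(pixels)
population_at_risk = risk_total(population)
area_at_risk = risk_total(area)
```

Running the same function on the informal settlement pixels only would give the corresponding figures for the socially vulnerable subset.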

The results of these spatial analyses enabled the calculation of the amount of exposed and socially vulnerable areas and their population based on their spatial localization for the years 1994, 2006, and 2018 by separating them into no-risk and risk areas. Since the landslide hazard map consists of four levels (very low, low, medium, and high; see Sect. 2.2), we calculated the results based on this division but focus on risk (medium and high landslide hazard) and no-risk (very low and low landslide hazard) areas in the interpretation. We also calculated the ratio between formal and informal settlements over time, using both, the area and population. In addition, we monitored the development of urban areas and population concerning their risk to answer the question of whether exposure and social vulnerability have relatively increased over time in Medellín.

3 Results: spatial and statistical development of landslide risk over time

In 1994, we found 4% (2 km2) of the entire settlement areas (51 km2) in Medellín in landslide-prone areas (Table 2: Urban settlements). In the 12 years up to 2006, the settlement areas expanded to 61 km2. In this context, settlement areas have been increasingly built into landslide-prone areas in the east and west of the city (see Fig. 5a): They grew to a spatial share of 7% (4 km2). And until 2018, the city extended to 77 km2, with the share in landslide-prone areas steadily growing to 9% (7 km2). Thus, considering the spatial location of the city growth from 1994 to 2018 (see Fig. 5a, b), it is clear that urbanization in Medellín occurred disproportionately on these exposed slopes, predominantly in the east and west of the city. Landslide risk was increasing.

Table 2 Development of the urban layout within the study area from 1994 to 2018. Urban settlements include both formal and informal settlements. Percentage values are rounded up
Fig. 5

a Urban settlement masks 1994–2018 extracted from the Land Cover (LC) classifications implemented with Landsat data. The urban settlements include formal and informal areas of the city. b Informal urban settlement masks 1994–2018 calculated based on the informal settlement mask 2019 (Kühnl et al. 2021) and the urban settlement masks 1994–2018 based on the LC classifications. c Population estimation 2018 per pixel in the extent of the urban settlement mask 2018 through disaggregation and extrapolation based on the census 2018. Since the estimates are based on official data, their administrative boundaries are still visible to a certain extent. d Spatial localization of at risk areas in Medellín for the time-step 2018. The areas at risk are separated according to their social vulnerability status proxied by formality and informality

Focusing on the informal settlements, which proxy the social group of higher vulnerability, we found that in 1994, 24% (12 km2) of the whole urban layout was classified as informal (Table 2: Informal Settlements). Of these informal areas, 16% (2 km2) were located in risk areas at that time. By 2006, the share of informal areas had grown to 26% (16 km2), and the share of informal areas located in risk areas had increased to 23% (4 km2). Until 2018 the expansion of informal areas continued, although at a slower rate, to around 20 km2 (~ 26% of the urban layout). The share of informal areas at risk remained at around a quarter (26%; 5 km2).

In this sense, most of the settlement areas at risk in 1994 were of informal character (1.98 out of 2.23 km2). This means that 89% of the landslide-prone settlement areas were, based on our proxy, of higher social vulnerability. Similar results were calculated for the years 2006 (3.71 out of 4.19 km2, 89%) and 2018 (5.23 out of 6.54 km2, 80%). However, even though the total settlement areas at risk grew from 1994 until 2018 (from 2.23 to 6.54 km2) and the informal settlements at risk also increased (from 1.98 to 5.23 km2), the relative share of informal settlements at risk became smaller (from 89 to 80%). This indicates that formal settlements with lower social vulnerability were also built in landslide-prone areas over the last 24 years. Figure 5d separates risk areas into formal and informal settlements for 2018, illustrating this finding.
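The shares quoted in this paragraph can be retraced from the reported areas with a few lines of arithmetic. The sketch below uses only the values given in the text; the variable names are hypothetical, and the formal area at risk is simply the difference between total and informal area at risk.

```python
# (informal area at risk, total settlement area at risk) in km2, from the text.
at_risk_km2 = {1994: (1.98, 2.23), 2006: (3.71, 4.19), 2018: (5.23, 6.54)}

# Share of informal settlements among all settlement areas at risk, in percent.
informal_share = {
    year: round(100 * informal / total)
    for year, (informal, total) in at_risk_km2.items()
}

# Formal settlement area at risk, in km2 (total minus informal).
formal_at_risk = {
    year: round(total - informal, 2)
    for year, (informal, total) in at_risk_km2.items()
}
```

The informal share falls from 89% to 80% while the formal area at risk grows from 0.25 km2 to 1.31 km2, which is the arithmetic behind the statement that formal settlements were also built in landslide-prone areas.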

Regarding the population, the validation yielded a %RMSE of 30%, which corresponds to around 695 people. Figure 5c shows the result of the pixel-level population disaggregation for the year 2018. The results for the years 1994 and 2006 can be found in the appendix. Population density is high especially on the northeastern and western slopes, where informal settlements are predominantly located. In contrast, lower densities are found in the heart of the city and along the Medellín River. In 1994, Medellín was inhabited by around 1.7 million people, of whom 6% (109,000 people) were exposed to landslides (Table 3: Urban settlements). By 2006, the population had grown to 2.2 million, with 10% (around 217,000 people) now located in areas prone to landslides. The total number of people living in areas at risk thus doubled in 12 years. Afterward, population growth slowed down, reaching 2.3 million people by 2018. Similarly, the share of people in risk areas grew more slowly than in the period from 1994 to 2006: in 2018, 13% of the population lived in landslide-prone sites (around 290,000 people).
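For readers unfamiliar with the metric, the sketch below shows one common way to compute a %RMSE, namely the root mean square error divided by the mean observed population. This normalization is an assumption, since the exact definition used in the validation is not restated in this section. The function name `percent_rmse` and the sample values are hypothetical and do not come from the study.

```python
import math


def percent_rmse(observed, estimated):
    """RMSE expressed as a percentage of the mean observed value.

    Assumption: normalization by the mean of the reference (census) counts.
    """
    if not observed or len(observed) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((e - o) ** 2 for o, e in zip(observed, estimated)) / len(observed)
    mean_observed = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_observed


# Hypothetical census counts and estimates for three validation units.
census = [2000, 2500, 3000]
estimate = [2300, 2200, 3300]
print(percent_rmse(census, estimate))  # 12.0
```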

Table 3 Development of the population structure within the study area from 1994 to 2018. Urban settlements include both formal and informal settlements. Percentage values are rounded up

Considering the vulnerability of the population through our proxy, we find that in 1994, 35% of the total population lived in informal settlements, and 16% of these residents (around 99,000 people) were simultaneously located in landslide-prone areas (5.6% of the total population). By 2006, the share of people living in informal settlements had grown to 40% of the total population. The number of people living in informal settlements and exposed to landslide risk doubled (to around 194,000 people, or 22% of the informal population), which represents 9% of the total population in 2006. In line with the total settlement and population growth rates between 2006 and 2018, population growth in informal settlements also slowed down until 2018. In 2018, around 41% of the total population lived in informal areas, and 26% of them (around 236,000 people) were located in risk areas at the same time (10.4% of the total population). Therefore, the increase in the share of people at risk was coupled with an increase in the number of socially vulnerable people. However, over the 24 years of monitoring, the relative share of exposed people living in informal settlements decreased (1994: 90%, 2006: 89%, 2018: 82%), despite the increase in absolute figures. This indicates that formal settlements were also developed on landslide-prone slopes.

4 Discussion

Increasingly frequent landslides due to climate change and uncontrolled urbanization are the cause of huge human and economic losses worldwide. Multi-source data from Earth Observation in combination with hazard maps and census data can provide key information for supporting risk management. Objective, accurate, up-to-date, and frequent data can be produced on several scales. However, area-wide multitemporal risk assessment studies that tackle exposure and vulnerability of people and assets at the same time, especially from a social perspective, at a fine spatial resolution are still limited. With this study, we show an approach that helps to close this gap: We performed a long-term multitemporal analysis on the evolution of landslide exposure and social vulnerability in a large, complex, and fast-growing urban area, the city of Medellín, Colombia.

First, because area-wide, openly available satellite data with resolutions in the 1-m range did not emerge until the early 2000s, a long-term study starting in the 1990s requires lower-resolution data. Medium-resolution Landsat images and machine learning algorithms allowed us to produce this long-term LC information with fairly good quality (overall accuracies above 90%) and at a resolution suitable for a citywide analysis, even though certain uncertainties had to be accepted due to the tropical climate conditions. To create cloud-free mosaics, we used the median of several years. This method may, for example, have led to an under-classification of urban pixels in the subsequent LC classification. Different statistics for the mosaicking process could be tested to reduce potential errors. Moreover, Landsat data are not suitable for detailed intra-urban studies. Alternative open satellite imagery such as Sentinel-2 or PlanetScope can overcome this problem, but only in exchange for a more recent and thus shorter monitoring period. Nevertheless, the LC classifications based on Landsat data are reliable and provide a good basis for extracting the urban masks for the three time steps, which proxy the elements that may be exposed to landslide hazards. The results were consistent over time, and minor gaps were filled using the time series. This approach makes it possible to determine urban development over time, both spatially and quantitatively: We quantified that Medellín's total settlement area grew by 50% in the last 24 years. The spatial analysis with the hazard zoning map showed that urban areas exposed to landslide risks (medium and high hazard levels) have tripled in this period. In 1994, 4% of Medellín's urban area was located on exposed slopes, and this value had increased to 9% by 2018. The average growth rate of exposed areas in the city is 0.2 km2/year, indicating that the city is indeed growing toward hazardous areas.
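The overlay of an urban mask with the hazard zoning map, and the average growth rate derived from it, can be illustrated with a small sketch. It assumes both layers share the same 30 m grid (the nominal Landsat multispectral pixel size) and that hazard levels are stored as integer codes, with 2 and 3 standing for medium and high. The codes, the toy rasters, and the function names are hypothetical and only illustrate the principle.

```python
PIXEL_SIZE_M = 30.0      # assumed common grid: nominal Landsat pixel size
EXPOSED_LEVELS = {2, 3}  # hypothetical codes for medium and high hazard


def exposed_area_km2(urban_mask, hazard_levels, pixel_size_m=PIXEL_SIZE_M):
    """Area (km2) of urban pixels that fall into medium or high hazard zones."""
    n_exposed = sum(
        1
        for urban_row, hazard_row in zip(urban_mask, hazard_levels)
        for is_urban, level in zip(urban_row, hazard_row)
        if is_urban and level in EXPOSED_LEVELS
    )
    return n_exposed * pixel_size_m ** 2 / 1e6


def mean_annual_growth(area_start_km2, area_end_km2, years):
    """Average absolute growth per year in km2/year."""
    return (area_end_km2 - area_start_km2) / years


# Toy 3 x 3 rasters: 1 = urban; hazard codes 0-3.
urban = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
hazard = [[0, 2, 3], [1, 3, 0], [2, 2, 1]]
toy_area = exposed_area_km2(urban, hazard)  # three exposed urban pixels

# Exposed areas reported in the text: 2.23 km2 (1994) and 6.54 km2 (2018).
growth = mean_annual_growth(2.23, 6.54, 24)
print(round(growth, 1))  # 0.2
```

With the reported areas, the unrounded rate is about 0.18 km2/year, which rounds to the 0.2 km2/year stated above.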

Secondly, regarding the social component of risk assessment, it is widely understood that a disaster affects people differently depending on who experiences it. The same loss is more severe for low-income people since they have fewer resources to recover. However, data on socioeconomic status are often outdated or even nonexistent, especially at the high spatial resolution applied in this study. We assumed that people living in informal settlements have a lower socioeconomic status than their peers living in formal urban settlements, and thus are more vulnerable. This allowed us to proxy the vulnerability of social groups based on the morphology of the built-up structures. We are aware that this approach can only spatially proxy a social group and does not do justice to the complexity of the social concept in reality. But, as Wurm and Taubenböck (2018) have shown, building structures of this type are a legitimate proxy when other data are not available. Using this conceptualization, we produced informal settlement masks for the three time steps. In the setting of this study, with limited data availability, it was not possible to consider informal settlement upgrades over time; additionally, we assumed that formal urban settlements do not change to informal. We quantified that the ratio between formal and informal settlements was fairly constant over the last 24 years: one quarter of the urban settlements in Medellín are in precarious conditions. Similar numbers are found in the literature. For instance, Echeverri Restrepo and Orsini (2012) identified 25% informal settlements in the city of Medellín, while URBAM (2012) measured 31% in the whole municipality, which is in line with our calculations. We show that informal settlements are growing at a rate similar to the overall growth of the city. 
However, when we measured the share of informal settlements that are exposed to landslide hazards, we found an increase from 16 to 26% over the monitoring period, compared to an increase from 1 to 2% for formal urban areas. This confirms that informal settlements are more exposed to landslide hazards than formal settlements. What is particularly interesting is that around 89% of the exposed areas in the city were informal settlements in 1994, whereas this share was 80% in 2018, indicating that formal settlements are being established on exposed slopes more often than before.

Thirdly, for the assessment of population density, we produced multitemporal pixel-level population maps using urban masks and official population counts. We validated the disaggregation approach using population data on two spatial levels in 2018: We measured an error of 30% in the mean population at the section level, which is lower than in other studies (e.g., Grippa et al. 2019; Stevens et al. 2015). This shows that the approach is able to estimate the population at a very high resolution with high accuracy. This is crucial because landslides are often small-scale, local events, and estimates on an administrative level are therefore spatially unsuitable. The population maps were estimated both for the entire city and specifically for the informal settlements. We found that the population has in general grown immensely since 1994, by 30% in total, which is more than half a million inhabitants. First, we measured a sharp increase in the exposed population between 1994 and 2018, from 6 to 13%. The number of people exposed has more than doubled, with more than 180 thousand additional people living in landslide-prone areas in 2018. This result implies that population growth is taking place in hazardous areas. This is probably due to uncontrolled, informal urbanization and a lack of available land, which forces people into unsuitable, exposed areas. Second, we were able to estimate the share of people living in informal settlements. In 1994, 35% of the city population was living in informal settlements, and this share rose to 41% in 2018, which is around 300 thousand more people living in precarious settlements. This shows that informal urban growth exceeds formal urban growth in Medellín. This 41% is higher than the share of informal settlement areas (26%) because these areas tend to be more densely populated. 
What is more, 6% (98,589) of the total population in Medellín was exposed to landslides and at the same time had a high social vulnerability in 1994, which increased to 10% (235,973) in 2018. In absolute terms, this is an increase of 137 thousand people in 24 years. A similar number was found by URBAM (2012), who stated that around 284,000 people were at risk of landslides and at the same time had a higher social vulnerability in the Aburrá Valley in the year 2010. This is similar to our estimations, considering that we are only working in the city of Medellín and not the entire valley. Focusing on the exposed population, this study documents how unequally risks are distributed among social groups; specifically, people living in informal settlements are more exposed to risks. One out of ten people in Medellín lived in informal settlements and at the same time was exposed to landslide risks in 2018. However, the share of informality in exposed populations decreased from 90% in 1994 to 82% in 2018. This result is in line with the areal analysis, showing that less vulnerable populations have also been settling in exposed areas of the city in the last decades.
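The principle behind a pixel-level population disaggregation, as discussed in this paragraph, can be sketched as follows. The example spreads each census unit's count evenly over the urban pixels inside that unit. This uniform weighting is a simplifying assumption made for illustration; the weighting actually used in the study is not restated here. The unit identifiers, the counts, and the function name `disaggregate` are hypothetical.

```python
from collections import Counter


def disaggregate(unit_ids, urban_mask, census_counts):
    """Spread each unit's census count evenly over its urban pixels.

    unit_ids:      2D grid with the census unit id of every pixel
    urban_mask:    2D grid, 1 = urban settlement, 0 = other
    census_counts: mapping from unit id to population count
    Note: a unit without any urban pixel contributes no population here.
    """
    # Count the urban pixels available in each census unit.
    urban_pixels = Counter(
        uid
        for id_row, urban_row in zip(unit_ids, urban_mask)
        for uid, is_urban in zip(id_row, urban_row)
        if is_urban
    )
    # Assign an equal share to every urban pixel, zero elsewhere.
    return [
        [
            census_counts[uid] / urban_pixels[uid] if is_urban else 0.0
            for uid, is_urban in zip(id_row, urban_row)
        ]
        for id_row, urban_row in zip(unit_ids, urban_mask)
    ]


# Toy example with two hypothetical census units, "A" and "B".
units = [["A", "A", "B"], ["A", "B", "B"]]
urban = [[1, 0, 1], [1, 1, 0]]
population = disaggregate(units, urban, {"A": 100, "B": 60})
print(population)  # [[50.0, 0.0, 30.0], [50.0, 30.0, 0.0]]
```

Summing such a population grid over the pixels that are both urban and located in medium or high hazard zones would then give an exposed population that is independent of administrative boundaries.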

With this analysis, we showed the importance of quantifying populations at a high spatial resolution, as natural hazards like landslides do not follow administrative units and are often very local events. Area-wide analyses beyond official borders are also particularly important, as urban agglomerations, especially in the dynamic cities of the Global South, outgrow these units. Because of the delimitation of jurisdictional spatial entities, exposed dwellers close to the official administrative city are neglected in statistics and thus in risk management approaches. As we could show, these are mainly people living in settlements of informal character that are, at the same time, built in the most exposed areas. On the one hand, our study thus confirms individual results of previous analyses. For example, Garcia et al. (2016), Bhaduri et al. (2007), and Dobson et al. (2000) also see the necessity to quantify population at sufficient resolution outside of administrative units for risk management approaches. Likewise, Müller et al. (2020) and Baker (2012), for example, found evidence that informal settlements are more exposed to natural hazards such as floods or landslides. On the other hand, we combine these results and approaches and go beyond them by mapping the areas and populations most affected by potential landslides over the course of 24 years, detached from administrative boundaries, and at comparatively high resolution.

In this sense, we have seen that the combination of EO data with other data sets has a huge capacity to improve knowledge of the natural hazard risks faced by many urban dwellers. Our results are a reliable citywide estimation of exposed locations, urban structures, people, and social groups. We are aware that these assessments are influenced by the spatial resolution of the urban masks, by misclassifications, and by errors in the population estimation. Given the dynamics of the urbanization process as well as data and methodological issues, the absolute numbers should be treated with caution; however, we expect our relative results to present a realistic and internally consistent picture. With it, we provide an approach for a comprehensive picture of environmental, economic, and social risks as a basis for informed decision-making that leaves no one behind. An extension through interdisciplinary approaches, e.g., with demographers, landslide modelers, and structural engineers, among others, promises great potential for further development. Because it relies on open EO data and techniques, our approach was developed to increase the potential transferability to other geographical regions in the world. This is especially interesting as we showed that the results from the exposure and vulnerability analyses based on area and on population are quite similar and show similar trends. This means that the methods based on urban areas over time could be replicated in areas where no population data are available to obtain good estimations of populations at risk.

5 Conclusion

In this study, we demonstrated how a city's landslide risk can be mapped and quantified over the long term with high accuracy by combining remote sensing data, hazard maps, and census data. We documented how the total population as well as the total urban area increased considerably from 1994 until 2018 in Medellín, Colombia. We identified that every year more settlement areas and people are exposed to landslide hazards. This is especially critical for the social group of higher vulnerability, which accounts for the majority of occupied landslide risk areas. We observed that the total number of socially vulnerable people at risk more than doubled between 1994 and 2018. In recent years, however, the share of the population in formal settlements in hazardous landslide areas has increased as well. To conclude, our analysis shows how inequality can be mapped and measured with these heterogeneous data. It is a way to bring this inequality into the spotlight and provide decision-makers with better information to develop socially responsible policies.