Background

Malaria is a major public health problem across sub-Saharan Africa (SSA), and a great variety of spatial methods are being developed for mapping its transmission risk at different scales, ranging from global [1] to sub-national and local levels [2]. With the rapid pace of urbanization in SSA and the ongoing adaptation of malaria vectors to urban environments, there is a growing need for finer granularity in urban malaria exposure mapping, accounting for the influence of urban complexity and heterogeneity [3,4,5,6,7]. Although urban environments are considered to be less favourable than rural areas to most dominant malaria vectors, malaria transmission is known to occur in several urban and peri-urban sub-Saharan settings, often in the vicinity of breeding sites [8,9,10,11]. In such low transmission intensity settings, spatially explicit methods relying on vector ecology and on the study of the determinants of vector habitat suitability can thus usefully complement methods based on spatial epidemiology for understanding the spatial distribution of malaria transmission risk [12].

Species distribution models (SDMs), also known as ecological niche models (ENMs), use algorithms to make spatial predictions of species distributions based on species location data and a set of spatial abiotic and/or biotic covariates. They cover a variety of methods, including the well-established maximum entropy (MaxEnt) and generalized additive models, as well as more recent machine learning-based models [13, 14]. SDMs are being applied to mosquitoes worldwide, including in Africa, with Kenya and Tanzania being the most covered countries in SSA [15]. However, most existing SDM-based studies involving malaria vectors were conducted in rural settings and/or at a coarse scale [15,16,17,18,19], while very few addressed fine-scale modelling in urban areas [20,21,22,23], where mosquito habitat research has focused more on Aedes aegypti, the vector of dengue, Zika and chikungunya viruses. As routine entomological surveillance is not widespread, there is a dearth of spatial data on malaria vector presence and abundance in urban settings, at both the larval and adult stages, which may partly explain the paucity of studies [24]. Yet, provided that certain methodological requirements are met (e.g., similar ecological conditions, selection of suitable predictor types), the missing data issue could be circumvented by developing spatially transferable SDMs trained on areas for which data are available [25]. Another possible way of addressing this issue is to rely on knowledge-driven deductive approaches that are underpinned by vector ecology knowledge and involve local stakeholders [26].

This study relies on vector ecology knowledge and proposes a geospatial framework for fine-grained mapping of urban malaria exposure, as a combination of hazard and population. Hazardous areas are defined as areas with suitable habitat conditions for the adult vector Anopheles gambiae, and the exposed population is the people living in these areas. While it is not possible to derive all the dimensions of socioeconomic vulnerability from Earth Observation (EO) without ancillary data, urban morphological deprivation is used as a proxy, i.e., the dimension of deprivation that is reflected in morphological/physical characteristics of the urban fabric [27]. An extensive list of habitat suitability criteria is provided for SSA in general, along with specific information relating to Dakar, and the corresponding geospatial layers/proxies that can be derived from very-high resolution (VHR) satellite imagery. Alternative open products that could replace VHR products in applications where cost reduction is a priority are suggested, although their use would involve several limitations. The proposed framework is an effort towards methods that have a potential for sustainable impact. It can be implemented in data-scarce settings with free open-source software (FOSS), and it employs simple geospatial modelling techniques that do not require advanced geostatistical training, which facilitates their interpretation by non-specialists. The baseline can be adapted to local specificities through the active involvement of local stakeholders, experts, and/or decision makers. The workflow is evaluated through an application to the metropolitan area of Dakar, Senegal, using layers derived from VHR satellite imagery and open geospatial data. The output of larval habitat suitability modelling is validated with existing entomological survey data [28], and the hazard and exposure maps are assessed by local experts. 
Processing is carried out mostly with GRASS GIS [29] and R [30] functions, within the web-based computational environment Jupyter Notebook [31].

In SSA, most malaria-related deaths are caused by P. falciparum, which is mostly transmitted by Anopheles gambiae [9]. Understanding the breeding, resting and feeding patterns of this major malaria vector and including their determinants in the analysis is therefore essential. These patterns are summarized in the illustration of the vector life cycle in Fig. 1. The cycle consists of four stages: egg, larva, pupa and adult. The first three stages are aquatic and last 5–14 days depending on temperature. After emergence, both male and female mosquitoes seek a nectar meal to replenish their energy reserves. Following mating (24–48 h after emergence), females seek a blood meal source. After the blood meal, and after resting while the blood is digested, females seek a suitable breeding site in which to lay their eggs. After oviposition, females seek a blood meal source again and the cycle is repeated. The time between two blood meals (i.e., the gonotrophic cycle) is shorter when breeding sites are close to dwellings. Adult female Anopheles responsible for malaria transmission generally do not live more than 3 weeks under natural conditions, depending on the environment [32]. Their malaria transmission potential is linked to their longevity, as only older females are likely to transmit the parasite P. falciparum [33], which must first complete its development inside the mosquito.

Fig. 1

Anopheles life cycle, adapted from [33]

Methods

Study area and data

The Dakar metropolitan area, Senegal, was selected as a case study for testing the framework. In Dakar, where the main vector is An. arabiensis (a member of the An. gambiae s.l. complex), malaria transmission is low, spatially heterogeneous and highly focal [10, 34]. Urban transmission has long been studied and demonstrated, both in the city centre [35, 36] and in the suburbs, which are prone to flooding due to a shallow water table and unplanned urbanization in the lowlands [28]. The hot, wet season spans from June to November, and the cool, dry season from December to May. The area of interest (AOI) includes the departments of Dakar, Guediawaye and Pikine, and part of the department of Rufisque (Fig. 2). The spatial extent of the output maps is limited to the extent that is common to all input layers.

Fig. 2

Overview of the area of interest in the Dakar metropolitan area, and larvae presence points. Base layer: Pléiades 0.5 m natural colour composite ©CNES (2015), Distribution AIRBUS DS.

The data used are a Pléiades pan-sharpened tri-stereo triplet acquired during the hot, wet season of 2015, with a spatial resolution of 0.5 m, and a set of existing layers derived from it: (i) a digital terrain model (DTM) resampled to 5 m; (ii) a land-cover (LC) map (0.5 m) produced through a semi-automated open-source processing chain for object-based image analysis and supervised machine learning classification [37, 38]; (iii) a map of the dominant land use (LU) at the street-block level [39]; and (iv) a 100 m × 100 m gridded map of population distribution, predicted by top-down dasymetric redistribution of census population data [40]. The land-cover, land-use and population maps are available from the Zenodo scientific repository [40,41,42]. In addition, an open-access layer of soil properties was used, namely the iSDAsoil layer of soil pH in Africa, predicted at 30 m resolution at a depth of 0–20 cm [43].

The entomological data used for validating the larval habitat suitability map were collected for a study aiming to locate and characterize anopheline larval habitats in the Dakar suburbs [28]. During the 2013 rainy season, 908 water bodies were surveyed and geolocated with GPS, of which 575 were positive for anopheline larvae. Thirteen types of water bodies were included: basins, canals, market-garden wells, puddles, lakes, flooded abandoned houses, ponds, backwaters, wells, ravines, drain channels, streams, and holes. Anopheline larvae were thus found in 63% of the surveyed water bodies, and all water body types hosted larvae to some extent. Here, only the positive samples are considered.

General framework

A geospatial framework is proposed for modelling urban malaria exposure (Fig. 3), defined as the contact risk between the adult vector Anopheles gambiae (i.e., the hazard) and the human population. A single dimension of vulnerability is included, namely morphological deprivation. Considering the scarcity of entomological data in sub-Saharan African cities, a deductive, spatially explicit multi-criteria decision analysis (MCDA) [44] relying on the Analytical Hierarchy Process (AHP) [45] is implemented for mapping vector habitat suitability. Spatially explicit MCDA is recognized as a powerful tool with great potential for supporting decision-making in public health [26]. This type of analysis does not require species presence data to train the model; such data are used only for validation, when available. Another advantage of AHP is that it allows for the active involvement of multiple stakeholders, experts and/or decision makers, whose voices can thus be heard. Ensuring that their input is accounted for in the analysis and that the process is interpretable overall is likely to favour acceptance and uptake of the method [26]. Once the problem and stakeholders have been determined, the first step in this type of analysis is the identification of criteria, which are of two types: factors influencing habitat suitability, and Boolean constraints used for masking out areas that must be excluded. After the factor and constraint layers have been created, factor layers are scaled and weighted before being aggregated, along with constraints, into a habitat suitability map. The workflow is detailed in the next sub-sections. Most steps were automated through the development of a processing chain that relies on open-source software. MCDA processing was carried out at a resolution of 5 m, and the final output gridded maps (hazard, population and exposure) have a resolution of 100 m.

Fig. 3

Geospatial framework for mapping urban malaria exposure (i.e., contact risk between vectors and human population)

Hazard–a/Larval habitat suitability

Identifying a set of criteria (factors and constraints), and obtaining or producing the corresponding geospatial layers

The first step for mapping hazard is the prediction of larval habitat suitability. The main criteria that influence habitat suitability of the main urban malaria vector in sub-Saharan Africa (i.e., An. gambiae) were identified based on literature and local expert knowledge. The identification of criteria and their translation into geospatial layers for locating sites conducive to vector breeding are the foundation of this analysis. The fine-scale heterogeneity of urban malaria requires going beyond determinants typically used for mapping malaria exposure over large zones, e.g., in rural areas. A relevant selection was made in this respect, also ensuring that producing or obtaining the necessary geospatial data with a sufficient level of detail is reasonably feasible. Eight layers were used to represent the main factors (Fig. 4, and Tables 1, 2, 3, 4), namely (i) a land-cover map (categorical), (ii) a land-use map (categorical), (iii) a landform map (categorical), (iv) the topographic wetness index (TWI) as a steady-state proxy for soil moisture (continuous), (v) the distance to buildings (continuous), (vi) the distance to trees (continuous), (vii) the distance to dumpsites as a proxy for water pollution (continuous), and (viii) the soil pH (continuous). These layers, except for soil pH, are all derived from Pléiades imagery. 
The existing land-cover layer was adapted to the needs of the analysis by: merging the classes low buildings and medium and high-rise buildings into a single class buildings; splitting the class water bodies into small water bodies, medium water bodies, large water bodies, water courses (using OpenStreetMap [46] data as ancillary information) and marine waters (based on local expert knowledge); splitting the class low vegetation into grass and scrub/shrub using a homogeneity metric calculated from the Pléiades near-infrared band (i.e., GLCM homogeneity, 11 × 11 pixels); and adding a class dumpsites containing the only large landfill of the city (extracted from OpenStreetMap data). The existing land-use layer was used without adaptation. Landforms were computed from the existing Pléiades DTM using Geomorphons, a machine vision approach based on ternary patterns [47]. Its two main parameters, the outer search radius and the flatness threshold, were set heuristically by testing a range of values and checking the result over a part of the area of interest with marked relief. SAGA GIS was used to produce the TWI, as it offers a broader choice of algorithms than GRASS GIS for this purpose. The guidelines of a study assessing the effects of different algorithms on the relation between TWI and soil moisture were followed [48]: DTM sinks were filled with the Fill Sinks XXL algorithm, flow accumulation was computed with the Multiple-flow algorithm, slope gradient with the Haralick (10 parameters) algorithm, and TWI with the Standard method, with cell size area conversion (pseudo specific catchment area). The three distance layers (distance to buildings, distance to trees, distance to dumpsites) were produced from the corresponding land-cover classes. For soil pH, no processing was necessary as the open iSDAsoil layer was used directly. Factor multicollinearity was assessed with the Variance Inflation Factor (VIF) to avoid redundancy.
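The VIF of a factor j is computed as 1/(1 − R²_j), where R²_j is the coefficient of determination obtained by regressing factor j on all the other factors; VIF therefore ranges from 1 upwards. A value of 1 for a factor can be interpreted as an absence of correlation with the other factors, values between 1 and 5 as low to moderate correlation with at least one other factor, and values greater than 5 as high correlation with at least one other factor.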
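Two computations described in this step can be made concrete with a short sketch in Python with NumPy; it is not the processing chain used in the study. The first function applies the standard TWI definition, ln(a / tan β), where a is the specific catchment area (approximated here, as in the pseudo specific catchment area conversion, by dividing the upslope area by the cell size) and β is the local slope. It assumes flow accumulation and slope have already been computed, and the slope floor for flat cells is an assumption of this sketch, not necessarily how SAGA GIS handles flat cells. The second function computes VIF as 1/(1 − R²) from an ordinary least squares regression of each factor on the others, assuming the factors have been sampled at a set of pixels. Both function names are hypothetical.

```python
import numpy as np


def topographic_wetness_index(upslope_area_m2, slope_rad, cell_size_m, min_slope_rad=1e-3):
    """TWI = ln(a / tan(beta)) for each cell.

    upslope_area_m2: contributing (upslope) area of each cell in m2
    slope_rad: local slope gradient in radians
    cell_size_m: raster resolution in m; area / cell size gives the pseudo specific catchment area a
    min_slope_rad: slope floor so tan(beta) is never zero on flat cells (assumption of this sketch)
    """
    a = np.asarray(upslope_area_m2, dtype=float) / cell_size_m
    tan_beta = np.tan(np.maximum(np.asarray(slope_rad, dtype=float), min_slope_rad))
    return np.log(a / tan_beta)


def variance_inflation_factors(samples):
    """VIF_j = 1 / (1 - R2_j), with R2_j from an OLS regression (with intercept)
    of factor j on all other factors. samples: array of shape (n_pixels, n_factors)."""
    X = np.asarray(samples, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Design matrix: intercept column plus every factor except j
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        r2 = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs
```

For instance, a 5 m cell draining 2500 m² on a 45° slope has a = 500 m and TWI = ln(500) ≈ 6.2; the same cell on a gentler slope gets a higher TWI, i.e., it is predicted to be wetter.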

Fig. 4

Larval habitat suitability factors (subset). (a) land cover, (b) land use, (c) landforms, (d) TWI (on hillshaded DTM), (e) distance to buildings, (f) distance to trees, (g) distance to dumpsites, (h) soil pH

Table 1 Land-cover classes derived from VHR imagery, with suggested open alternatives, and knowledge relating to their influence on larval habitat suitability (from literature and experts)
Table 2 Land-use classes derived from VHR imagery, with suggested open alternatives, and knowledge relating to their influence on larval habitat suitability (from literature and experts)
Table 3 Landforms derived from VHR imagery, with suggested open alternatives, and knowledge relating to their influence on larval habitat suitability (from literature and experts)
Table 4 Continuous variables derived from VHR imagery, with suggested open alternatives, and knowledge relating to their influence on larval habitat suitability (from literature and experts)

Since processing satellite imagery for producing spatially explicit criteria may not be an option in some applications, alternative existing open products are suggested, although they currently have a coarser spatial resolution than those used in this study (as far as rasters are concerned): Open Buildings [49], Esri 2020 Land Cover (10 m) [50], WorldCover (10 m) [51], WUDAPT LCZ (100 m) [52], SRTM (~ 30 m) [53], Global SRTM Landforms (90 m) [54], and Global SRTM mTPI (270 m) [54]. The suggested replacements are detailed in Tables 1, 2, 3, 4, 5, 6, 7.

Using coarser products as input implies several limitations, including the fact that small features cannot be accounted for as they are absent from these open products.

Several identified suitability criteria were excluded from the study, either due to the high cost of the data sources involved (e.g., LiDAR, hyperspectral imagery), their limited geographic coverage (e.g., drone imagery), the complexity of the modelling processes involved for obtaining a sufficient level of detail (e.g., urban meteorological determinants such as air temperature, wind speed, precipitation, and relative humidity), or the lack of in situ data for calibration (e.g., surface water parameters). They are listed in Additional file 1: Table S1 as they could prove usable in future work due to advances in Earth Observation and increased availability of open big data. Moreover, absolute elevation is not accounted for in the selected case study as it is unlikely to have an influence on vector habitat suitability, Dakar being a coastal city with an overall low elevation.

Boolean constraints were created from the land-cover classes buildings, paved surfaces, trees, water courses and marine waters, and from a narrow strip along the coastline that includes highly unsuitable features such as beaches and rocks.

Scaling factors

Since factors differ in nature, they must be normalized to a common scale, e.g., from 0 (least suitable) to 100 (most suitable), before being aggregated. The continuous factors TWI and soil pH were rescaled by min–max normalization, and linear membership functions were applied to the distance to buildings, distance to trees and distance to dumpsites. In MCDA applications, scaling criteria with membership functions is a common procedure that aims to reflect the human ability to reason with fuzziness [86]. In fuzzy set theory, real numbers can be mapped to a degree of membership in a fuzzy set using a parametric function (e.g., a trapezoidal function). Here, membership functions attempt to capture the fuzziness (or imprecision) of judgements about how the criterion score varies as the distance from objects of interest (e.g., buildings, trees, dumpsites) increases. Categorical factors were rescaled through AHP. Five experts with a strong background in vector ecology filled out pairwise comparison matrices (PCMs) using Saaty's fundamental rating scale [87] (Fig. 5) to compare sub-factors in terms of suitability, i.e., each land-cover class to the other land-cover classes, each land-use class to the other land-use classes, and each landform to the other landforms.
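The two scaling operations for continuous factors can be sketched as follows in Python with NumPy, mapping values onto the 0 to 100 suitability scale. The breakpoints of the linear membership function are parameters here: the values actually used in the study are those shown in Fig. 7, and the function names, as well as the direction chosen for any given distance factor, are assumptions left to the user.

```python
import numpy as np


def min_max_scale(values, increasing=True, out_max=100.0):
    """Rescale to [0, out_max]; if increasing is False, high values get low scores."""
    x = np.asarray(values, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)  # NaN (no-data) cells are ignored
    s = (x - lo) / (hi - lo)
    return out_max * (s if increasing else 1.0 - s)


def linear_membership(distance, start, end, out_max=100.0):
    """Linear membership: score 0 at distance `start`, out_max at `end`, clamped outside.

    start < end: suitability increases with distance from the feature;
    start > end: suitability decreases with distance from the feature.
    """
    d = np.asarray(distance, dtype=float)
    t = (d - start) / (end - start)
    return out_max * np.clip(t, 0.0, 1.0)
```

Swapping `start` and `end` reverses the direction of the function, so a single helper covers both factors whose suitability rises and factors whose suitability falls with distance.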

Fig. 5

Saaty’s fundamental rating scale

In the first iteration, the experts filled out PCMs according to their individual judgements, without consulting their peers. The consistency of expert judgements was assessed by computing the Consistency Ratio (CR) of each PCM [88]. CR is based on the calculation of a Consistency Index (CI)

$${\text{CI = }}\frac{{\lambda_{\max } - n}}{n-1}$$
(1)

where \({\lambda }_{\mathrm{max}}\) is the principal eigenvalue of the positive reciprocal matrix and n is the number of elements compared (i.e., the order of the matrix). CR is the ratio of CI to a Random Index (RI), the average CI of a large set of randomly generated PCMs of the same order, as tabulated in the literature

$${\text{CR = }}\frac{{{\text{CI}}}}{{{\text{RI}}}}$$
(2)
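Equations (1) and (2) can be computed directly from a PCM. The sketch below, in Python with NumPy, derives the priority vector as the normalized principal eigenvector (the usual AHP choice), then λmax, CI and CR. The Random Index values are those commonly attributed to Saaty for n ≤ 10; because RI tables for larger matrices differ between sources, larger matrices (such as the 14-element land-cover PCM) are rejected here rather than guessed. The function name is hypothetical.

```python
import numpy as np

# Random Index values commonly attributed to Saaty for matrix orders n = 1..10.
# Published RI tables differ, especially for larger n, so treat these as an assumption.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_consistency(pcm):
    """Return (priority vector, lambda_max, CI, CR) for a positive reciprocal PCM."""
    A = np.asarray(pcm, dtype=float)
    n = A.shape[0]
    if n not in RANDOM_INDEX:
        raise ValueError(f"no RI value tabulated in this sketch for n = {n}")
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)                # principal (Perron) eigenvalue
    lambda_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)             # principal eigenvector, made positive
    w = w / w.sum()                            # normalized priority (weight) vector
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0           # Eq. (1)
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0                             # Eq. (2)
    return w, lambda_max, ci, cr
```

A perfectly consistent matrix such as [[1, 2, 4], [1/2, 1, 2], [1/4, 1/2, 1]] yields weights 4/7, 2/7 and 1/7 with CR ≈ 0, whereas a cyclic matrix in which A beats B, B beats C and C beats A gives a CR well above 0.10.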

As a rule of thumb, matrices with CR > 0.10 (i.e., more than 10% as inconsistent as a random matrix) are considered too inconsistent for AHP. However, previous studies have highlighted the difficulty of reaching such low values in practical applications, in particular for large PCMs [89]. Moreover, while a high level of consistency is desirable, it is also important to respect expert judgements and, therefore, to adapt consistency cut-off values to a level deemed acceptable for the study. Here, a second iteration was necessary to give some of the experts the opportunity to revise their judgements in PCMs with CR > 0.15 (CR > 0.20 for the large land-cover PCM with 14 sub-factors) and thus reach an acceptable level of consistency. As experts filled out PCMs without consulting their peers, they acted as individuals rather than as a group. In this case, experts' opinions are aggregated by Aggregation of Individual Priorities (AIP), as opposed to Aggregation of Individual Judgements (AIJ) [90]. AIP can be achieved by calculating the weighted geometric mean (WGM) of the individual priority vectors to obtain a representative priority vector (i.e., the weight vector) for each PCM [91]. The importance assigned to each expert can also be weighted according to their level of expertise. However, since all contributing experts have broad expertise and excellent knowledge of the AOI, their judgements were considered equally important and received equal weights.
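Aggregation of Individual Priorities by weighted geometric mean takes only a few lines of Python with NumPy. In this sketch, each row holds one expert's priority vector for the same PCM; when no expert weights are supplied, experts are weighted equally, as in this study. The function name is hypothetical, and the final renormalization (so that the aggregated priorities sum to 1) is a common convention assumed here.

```python
import numpy as np


def aggregate_individual_priorities(priority_vectors, expert_weights=None):
    """AIP by weighted geometric mean.

    priority_vectors: (n_experts, n_elements), one normalized priority vector per expert
    expert_weights: optional importance of each expert; equal weights if None
    """
    P = np.asarray(priority_vectors, dtype=float)
    m = P.shape[0]
    if expert_weights is None:
        a = np.full(m, 1.0 / m)                  # equal expert weights
    else:
        a = np.asarray(expert_weights, dtype=float)
        a = a / a.sum()                          # expert weights must sum to 1
    g = np.exp(a @ np.log(P))                    # weighted geometric mean per element
    return g / g.sum()                           # renormalize so priorities sum to 1
```

With two equally weighted experts giving (0.5, 0.5) and (0.8, 0.2), the geometric means are √0.4 and √0.1, which renormalize to (2/3, 1/3).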

Weighting factors

Two factor weighting scenarios were considered and compared to assess the respective merits of local expert knowledge and of knowledge derived from the literature. In the first scenario, AHP was implemented to derive the relative importance of the factors from expert judgements, as described above. In the second scenario, the weights were derived by an EO scientist from a literature review, following the same approach.

Aggregating criteria

For each scenario, two HSI maps were produced, the first by calculating the weighted sum of factors

$${\text{HSI}} = \sum_{i=1}^{n} w_{i} x_{i}$$
(3)

where \({w}_{i}\) are the factor weights and \({x}_{i}\) are the factor scores, and the second by multiplying the weighted sum of factors by the product of Boolean constraints

$${\text{HSI}} = \left( \sum_{i=1}^{n} w_{i} x_{i} \right) \prod_{j=1}^{m} c_{j}$$
(4)

where \({c}_{j}\) are the Boolean constraints.
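Equations (3) and (4) translate directly into array operations. In this Python/NumPy sketch, factor layers are stacked as an array of shape (n, rows, cols) already scaled to 0 to 100, weights are assumed to sum to 1, and each Boolean constraint layer is True (c_j = 1) where a cell is allowed and False (c_j = 0) where it is masked out. The function name is hypothetical.

```python
import numpy as np


def habitat_suitability_index(factors, weights, constraints=None):
    """Eq. (3): HSI = sum_i w_i * x_i; Eq. (4) when constraints are given:
    HSI = (sum_i w_i * x_i) * prod_j c_j.

    factors: array (n_factors, rows, cols) of scaled scores (0-100)
    weights: n_factors weights summing to 1
    constraints: optional boolean array (m, rows, cols); True = allowed (c_j = 1)
    """
    x = np.asarray(factors, dtype=float)
    w = np.asarray(weights, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("factor weights must sum to 1")
    hsi = np.tensordot(w, x, axes=1)            # weighted sum over the factor axis
    if constraints is not None:
        c = np.asarray(constraints, dtype=bool)
        hsi = hsi * np.all(c, axis=0)           # product of Boolean constraints (0 if any is 0)
    return hsi
```

Because the product of Boolean layers equals their logical AND, a cell excluded by any single constraint receives an HSI of 0 regardless of its factor scores.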

Aggregating HSI to grids and validating the gridded maps

The HSI maps were validated using the 575 samples positive for anopheline larvae. The validation area was spatially restricted to the part of the metropolitan area where the samples were collected. It was delineated by performing a spatial clustering of the sampling points, calculating a concave hull around the two resulting point clusters, and adding a 100 m buffer to include the sampling points located on the hull outlines. Inaccessible areas where no sampling could be organized, e.g., large water bodies, were excluded. Validation was based on the mean HSI calculated in grid cells of increasing size (15 m, 25 m, 45 m and 95 m, i.e., from the smallest possible aggregation of 3 × 3 pixels to about 1 ha), in order to evaluate how spatial uncertainties (such as the precision of survey point coordinates) affect the accuracy of fine-grained predictions, and to identify a suitable aggregation level for the output gridded map. Accuracy was assessed by computing the Continuous Boyce Index (CBI) [92, 93] with the ecospat.boyce function of the R package ecospat [94]. CBI requires presence data only and assesses to what extent model predictions differ from a random distribution of the observed presences across the prediction gradient. It has been shown to be a reliable accuracy measure for presence-only predictions that outperforms other evaluators [93]. It takes as input all predicted suitability values, on the one hand, and the predicted suitability values at presence records, on the other. The CBI score varies between -1 and 1: negative values indicate a poorly performing model, values close to 0 indicate a model no better than random, and positive values increase with the model's ability to produce predictions consistent with the observed presence data. The ecospat.boyce function also outputs the F-ratio, i.e., the ratio of the predicted frequency (P) to the expected frequency (E), from which the P/E curve can be plotted as a function of HSI.
The second indicator of model performance is the shape of the P/E curve. It complements the CBI score, which, being a rank correlation, is unaffected by curve shape as long as the curve increases monotonically, whereas any departure from a straight line reveals a reduced ability to distinguish between suitability classes.
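The aggregation and validation steps can be illustrated with a simplified Python/NumPy sketch. With 5 m MCDA pixels, the 15, 25, 45 and 95 m cells correspond to non-overlapping blocks of 3 × 3, 5 × 5, 9 × 9 and 19 × 19 pixels; `block_mean` assumes the grid origin coincides with a block corner and drops incomplete edge blocks. `continuous_boyce_index` is a simplified moving-window re-implementation for illustration, not the code of ecospat.boyce: the window width (one tenth of the HSI range), the 101 window positions and the absence of any duplicate-value handling are assumptions of this sketch. CBI is computed as the Spearman rank correlation between window midpoints and the F-ratio P/E. Both function names are hypothetical.

```python
import numpy as np


def block_mean(raster, factor):
    """Mean of non-overlapping factor x factor blocks (e.g. factor 3 on 5 m pixels -> 15 m cells).
    Incomplete edge blocks are dropped; NaN (masked) pixels are ignored."""
    a = np.asarray(raster, dtype=float)
    rows = (a.shape[0] // factor) * factor
    cols = (a.shape[1] // factor) * factor
    blocks = a[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))


def _average_ranks(values):
    """Ranks starting at 1, ties sharing their average rank (as used by Spearman's rho)."""
    a = np.asarray(values, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def continuous_boyce_index(background, presence, n_windows=101, window_frac=0.1):
    """Simplified moving-window CBI. Returns (CBI, window midpoints, F-ratio P/E)."""
    bg = np.asarray(background, dtype=float)   # HSI of all cells in the validation area
    pr = np.asarray(presence, dtype=float)     # HSI at cells with observed presences
    lo, hi = bg.min(), bg.max()
    width = window_frac * (hi - lo)
    mids, f_ratio = [], []
    for start in np.linspace(lo, hi - width, n_windows):
        end = start + width
        expected = np.mean((bg >= start) & (bg <= end))    # E: share of all cells in window
        if expected == 0:
            continue                                       # P/E undefined, skip window
        predicted = np.mean((pr >= start) & (pr <= end))   # P: share of presences in window
        mids.append(start + width / 2.0)
        f_ratio.append(predicted / expected)
    mids, f_ratio = np.array(mids), np.array(f_ratio)
    # Spearman correlation = Pearson correlation of the (average) ranks
    cbi = np.corrcoef(_average_ranks(mids), _average_ranks(f_ratio))[0, 1]
    return cbi, mids, f_ratio
```

The returned F-ratio values are the points of the P/E curve, and their maximum is the F-value; presences concentrated at high HSI give a positive CBI, while presences concentrated at low HSI give a negative one.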

Classifying HSI into suitability classes

Providing end-users with a map of continuous HSI values could give a spurious impression of precision and be misleading. Therefore, the best continuous HSI map was converted into a map with four suitability classes (unsuitable, marginal, suitable and optimal), following the method proposed in [93], which relies on examination of the P/E curve.

Hazard–b/Adult vector habitat suitability

Identifying a set of criteria (factors and constraints), and obtaining or producing the corresponding geospatial layers

A similar approach was adopted for mapping adult habitat suitability, drawing on literature and expert knowledge to select the criteria, and considering the feasibility of obtaining or creating the corresponding spatial layers. In urban areas, the dispersal range of adult vectors around breeding sites is short (up to a few hundred meters [8, 33, 64]), as human hosts are widely available for blood meals. Therefore, the first factor is the distance to larval habitats, as extracted from the best larval habitat suitability map in terms of CBI score. Two layers were created, i.e., the distance to optimal larval habitats, and the distance to suitable and optimal larval habitats. The second factor is the distance to buildings, as a proxy for distance to human hosts. The third factor is land cover, for which the same layer as for larval habitats was used, with different adaptations. Buildings were not merged into a single class, as low buildings (a proxy for poorly built dwellings) are more likely to indicate a lower socioeconomic status and to have openings that let mosquitoes in, thus providing potential feeding and resting opportunities. Trees and shrub/scrub were merged into a single class of leafy vegetation potentially providing suitable sites for mosquitoes resting outdoors. Water bodies were also merged into a single class as they are considered mostly unsuitable habitats for adult vectors. The fourth factor is land use, which did not require adaptation. The factors are presented in Tables 5, 6, 7. No constraints were used to exclude areas in the analysis of adult habitat suitability. The suggested alternative open products are the same as for larval habitat suitability, with the addition of WSF3D [95], which estimates average building height in 90 m × 90 m grid cells.

Table 5 Continuous variables derived from VHR imagery, with suggested open alternatives, and knowledge relating to their influence on adult vector habitat suitability (from literature and experts)
Table 6 Land-cover classes derived from VHR imagery, with suggested open alternatives, and knowledge relating to their influence on adult vector habitat suitability (from literature and experts)
Table 7 Land-use classes derived from VHR imagery, with suggested open alternatives, and knowledge relating to their influence on adult vector habitat suitability (from literature and experts)

Scaling factors

The distance to suitable larval habitats was scaled using a membership function derived from a study where adult vector density in dwellings was calculated for 7 distance intervals along a transect of 910 m starting from the edge of a large permanent urban wetland (the Great Niaye of Pikine) [36]. The distance to buildings was rescaled with a linear function. For categorical factors (land cover and land use), the same AHP approach as for larval habitat suitability was used.

Weighting factors, aggregating criteria, aggregating HSI to grid, verifying, classifying into suitability classes

As for larval habitat suitability, the relative importance of factors was assessed by vector ecology experts through pairwise comparisons. The HSI map was produced from a weighted sum of factors, without Boolean constraints. HSI was aggregated to 100 m × 100 m grid cells to match the resolution of the human population map, and binned into four classes (unsuitable, marginal, suitable and optimal) corresponding to the hazard levels very low, low, medium and high, respectively. As no adult vector presence data with extensive spatial coverage were available, the output was visually verified by experts with in-depth knowledge of the study area and its entomological conditions.
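Once class thresholds have been read from the P/E curve, binning the aggregated HSI is a simple lookup. The Python/NumPy sketch below assumes three strictly increasing thresholds supplied by the analyst; the values in the usage example are placeholders, not the thresholds of the study, and the function name is hypothetical. No-data (NaN) cells should be masked beforehand, since they are not handled here.

```python
import numpy as np

# Suitability classes and the hazard levels they correspond to
SUITABILITY_CLASSES = ["unsuitable", "marginal", "suitable", "optimal"]
HAZARD_LEVELS = ["very low", "low", "medium", "high"]


def classify_hsi(hsi, thresholds):
    """Bin HSI into classes 0-3 with thresholds t1 < t2 < t3:
    0 if HSI < t1, 1 if t1 <= HSI < t2, 2 if t2 <= HSI < t3, 3 if HSI >= t3."""
    t = np.asarray(thresholds, dtype=float)
    if t.shape != (3,) or np.any(np.diff(t) <= 0):
        raise ValueError("expected three strictly increasing thresholds")
    return np.digitize(np.asarray(hsi, dtype=float), t)
```

For example, with placeholder thresholds (30, 50, 80), HSI values 10, 30, 60 and 90 fall into the classes unsuitable, marginal, suitable and optimal, i.e., hazard levels very low to high.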

Population and vulnerability

Fig. 6

Population per hectare estimated through dasymetric mapping, and extent of deprived urban areas

Several global gridded layers of human population distribution are openly available [105] and can be used for mapping the human population exposed to the risk of contact with an urban malaria vector. Alternatively, a site-specific map can be created when demographic data and spatial covariates are available. Here, an existing site-specific gridded population map (Fig. 6) was used. It was produced by redistributing population counts from administrative units into 100 m × 100 m grid cells using a top-down dasymetric mapping approach [41]. Population density was divided into three classes (high, medium and low): population values were log-transformed, and the class breaks were defined using the standard deviation method. Due to the overall limited availability of timely spatial data on population socioeconomic status, mobility, acquired immunity, awareness level, access to drugs, use of larvicides and insecticides, use of insecticide-treated bed nets, etc., the inclusion of vulnerability dimensions was limited to area-level morphological deprivation. The latter is represented by the land-use class deprived urban areas (Fig. 6), which is accounted for in both larval and adult habitat suitability mapping. The relationship between urban deprivation and urban malaria risk is strong, as highlighted by several authors [106,107,108].
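The text does not state which standard deviation interval defined the three population classes. The Python/NumPy sketch below therefore assumes breaks at the mean ± 0.5 standard deviations of log10-transformed population, which yields three classes, and leaves unpopulated cells unclassified because their logarithm is undefined; both choices, and the function name, are hypothetical.

```python
import numpy as np


def population_density_classes(population, k=0.5):
    """Three classes (0 low, 1 medium, 2 high) from log10 population, with breaks
    at mean - k*sd and mean + k*sd (k = 0.5 is an assumption of this sketch).
    Cells with zero population have no logarithm and are returned as -1."""
    pop = np.asarray(population, dtype=float)
    classes = np.full(pop.shape, -1, dtype=int)
    populated = pop > 0
    log_pop = np.log10(pop[populated])
    mean, sd = log_pop.mean(), log_pop.std()
    classes[populated] = np.digitize(log_pop, [mean - k * sd, mean + k * sd])
    return classes
```

The log transform keeps the few very dense cells from dominating the breaks, which would otherwise push almost all cells into the low class.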

Urban malaria exposure

The final output is a 100 m × 100 m gridded map of urban malaria exposure that results from combining hazard levels with population density classes into a bivariate map. Since a single dimension of vulnerability is included in the framework, the term 'exposure' rather than 'risk' is conservatively adopted. The predicted variations in the risk of contact between humans and vectors across the metropolitan area were visually verified by local experts. It is important to note that the levels of hazard and exposure are relative rather than absolute: a high level of hazard in Dakar, an urban area with low endemicity, is not comparable to, e.g., a high level of hazard in a rural area with high endemicity.
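One simple way to encode the bivariate combination of four hazard levels and three population density classes is a single integer code per grid cell, giving 12 combinations. The encoding, the label wording and the function names in this Python/NumPy sketch are illustrative assumptions, not the legend used in the study.

```python
import numpy as np

HAZARD_LEVELS = ["very low", "low", "medium", "high"]   # hazard class codes 0-3
POPULATION_CLASSES = ["low", "medium", "high"]           # population class codes 0-2


def bivariate_exposure(hazard_class, population_class):
    """Combine hazard (0-3) and population density (0-2) classes into one code (0-11):
    code = hazard * 3 + population; cells where either input is negative (no data) get -1."""
    h = np.asarray(hazard_class, dtype=int)
    p = np.asarray(population_class, dtype=int)
    code = h * len(POPULATION_CLASSES) + p
    return np.where((h >= 0) & (p >= 0), code, -1)


def exposure_label(code):
    """Readable legend entry for a bivariate code."""
    h, p = divmod(int(code), len(POPULATION_CLASSES))
    return f"{HAZARD_LEVELS[h]} hazard / {POPULATION_CLASSES[p]} population density"
```

Keeping both dimensions recoverable from the code (via `divmod`) lets a bivariate color legend be built directly from the 12 combinations.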

Results

Hazard–a/larval habitat suitability

No factor had to be discarded due to multicollinearity, as VIF was close to 1 for each of them. The scores of categorical sub-factors obtained from AHP emphasize the high suitability of LC classes small water bodies and medium-sized water bodies, LU classes wetlands, agricultural areas and deprived residential areas, and concave landforms pits and valleys (Table 8). The membership functions used for scaling distance layers are presented in Fig. 7. According to scenario 1 (involving five experts), the factors with the highest relative importance are soil moisture and water pollution, whereas land cover and landforms are the highest ranked in scenario 2 (involving an EO scientist) (Fig. 8).

Table 8 Suitability scores of categorical sub-factors for larval habitat suitability
Table 9 Suitability scores of categorical sub-factors for adult vector habitat suitability
Fig. 7

Membership functions for scaling continuous factors (larval habitat suitability)

Fig. 8

Relative importance of factors derived through AHP, according to scenario 1 (left) and scenario 2 (right) (larval habitat suitability)

For each scenario, an HSI map was produced and validated using anopheline larvae presence data. Four survey samples were discarded due to geolocation errors, leaving 571 usable presence points. The first validation step consisted of comparing the CBI scores of both scenarios at four cell sizes, using only the weighted sum of factors, without Boolean constraints (Fig. 9). CBI scores reached their highest values for small cells and decreased sharply as cell size increased (except for scenario 2 at 25 m), which indicates the reliability of fine-grained larval HSI predictions. The best CBI score was obtained by scenario 1 at 15 m (i.e., 3 × 3 pixels), confirming that the involvement of local experts is the best option for producing accurate fine-grained larval HSI maps. Nevertheless, scenario 2 also reaches high CBI scores for small cells, peaking at 25 m, which indicates that drawing on the literature is a valid alternative when it is not possible to involve a panel of experts in the analysis.
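For reference, the standard moving-window definition of the continuous Boyce index (CBI) works as follows: for each HSI window, the predicted-to-expected ratio P/E is the share of presence points falling in the window divided by the share of study-area cells in it, and the CBI is the Spearman rank correlation between P/E and the window midpoint. The sketch assumes HSI rescaled to [0, 1] and uses a window width of 0.1 and a step of 0.01 as illustrative defaults; the study's window settings and its aggregation to 15 m or larger cells are not reproduced here, and the data are synthetic.

```python
import numpy as np
from scipy.stats import spearmanr


def pe_curve(hsi_presence, hsi_background, window=0.1, step=0.01):
    """Predicted/expected ratio in moving HSI windows over [0, 1].

    hsi_presence: HSI values at presence points.
    hsi_background: HSI values of all study-area cells (the 'expected' distribution).
    Returns window midpoints and P/E ratios; windows without background cells are skipped.
    """
    pres = np.asarray(hsi_presence, dtype=float)
    bg = np.asarray(hsi_background, dtype=float)
    starts = np.arange(0.0, 1.0 - window + 1e-9, step)
    mids, ratios = [], []
    for s in starts:
        e = s + window
        expected = np.mean((bg >= s) & (bg <= e))
        if expected > 0:
            predicted = np.mean((pres >= s) & (pres <= e))
            mids.append(s + window / 2)
            ratios.append(predicted / expected)
    return np.array(mids), np.array(ratios)


def continuous_boyce_index(hsi_presence, hsi_background, **kwargs):
    """Spearman correlation between window midpoint and P/E (+1 = monotonically rising)."""
    mids, ratios = pe_curve(hsi_presence, hsi_background, **kwargs)
    rho, _ = spearmanr(mids, ratios)
    return rho, mids, ratios


rng = np.random.default_rng(1)
background = rng.uniform(size=20000)                  # toy HSI of study-area cells
weights = background**3 / np.sum(background**3)
good = rng.choice(background, size=2000, p=weights)   # presences favour high HSI
cbi_good, mids, pe_good = continuous_boyce_index(good, background)
random_pres = rng.choice(background, size=2000)       # presences unrelated to HSI
cbi_random, _, pe_random = continuous_boyce_index(random_pres, background)
print(round(cbi_good, 2), round(cbi_random, 2))
```

For the random presences, P/E stays close to 1 in every window, which is the flat reference line of a random model.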

Fig. 9

CBI score for both scenarios (weighted sum of factors, no constraints) in four cell sizes

The impact of adding constraints was assessed by examining the P/E curves. In an ideal model, the P/E curve would be linearly increasing, whereas in a random model, it would be flat (P/E = 1). In actual models, curves may exhibit other shapes, as is the case here, where they are exponential, implying a better discrimination among high-suitability habitats than among low-suitability habitats. An example is provided in Fig. 10 for scenario 1 at 15 m, both without and with constraints. The constraints mitigate overpredictions in low HSI value ranges and increase the maximum value reached by the P/E curve (known as the F-value). The F-value is an indicator of deviation from randomness, i.e., an indicator of significance [93]. Similar effects were generally observed for the other scenario and cell sizes.
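The HSI with and without constraints can be expressed as a weighted linear combination of standardized factors multiplied by Boolean constraint layers, the usual form in GIS-based multi-criteria evaluation. In the sketch below, factors are assumed to be already scaled to [0, 1] (e.g., through membership functions), weights are assumed to sum to 1, and the constraint layer is a hypothetical mask (True where a cell may host habitat); the actual constraints used in the study are not listed in this section.

```python
import numpy as np


def weighted_linear_combination(factors, weights, constraints=()):
    """HSI = (sum_i w_i * x_i) * prod_j c_j on co-registered grids.

    factors: sequence of arrays scaled to [0, 1]; weights: one weight per factor, summing to 1;
    constraints: sequence of Boolean arrays, True where the cell is allowed.
    """
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    stack = np.stack([np.asarray(f, dtype=float) for f in factors])
    hsi = np.tensordot(weights, stack, axes=1)   # weighted sum over the factor axis
    for c in constraints:
        hsi = hsi * np.asarray(c, dtype=bool)    # excluded cells are forced to 0
    return hsi


f1 = np.array([[1.0, 0.0], [0.5, 0.5]])
f2 = np.array([[0.0, 1.0], [0.5, 1.0]])
mask = np.array([[True, True], [False, True]])   # hypothetical constraint layer
print(weighted_linear_combination([f1, f2], [0.75, 0.25]))
print(weighted_linear_combination([f1, f2], [0.75, 0.25], [mask]))
```

Because constraints can only set cells to 0, they reshape the distribution of HSI values that enters the P/E computation without altering the factor weights.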

Fig. 10

Effect of constraints on the P/E curve, scenario 1 (15 m × 15 m)

The next step consisted of converting the continuous HSI into suitability classes, based on the P/E curve [93]. With exponential curves, a broad ‘unsuitable’ category can encompass the plateau (P/E < 1), whereas a finer categorization can be made in the growing part of the curve, e.g., ‘marginal’ (plateau around P/E = 1), then ‘suitable’ up to a change in slope around P/E = 15, and ‘optimal’ for P/E > 15, as shown in Fig. 11.
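Deriving class boundaries from a P/E curve can be sketched as taking, for each P/E threshold, the first HSI window midpoint at which the curve exceeds it, then binning the HSI map with these breaks. This assumes a monotonically increasing curve, as with the exponential shapes described above. The P/E = 15 break comes from the text; the lower thresholds (1 and 2, bounding the ‘marginal’ band) and the toy curve are hypothetical, since this section does not give the exact boundaries.

```python
import numpy as np

CLASSES = ["unsuitable", "marginal", "suitable", "optimal"]


def hsi_breaks_from_pe(mids, ratios, pe_thresholds=(1.0, 2.0, 15.0)):
    """HSI value at which the P/E curve first exceeds each threshold.

    Hypothetical thresholds: 1 and 2 bound the 'marginal' band; 15 marks the change in slope.
    Returns np.inf for a threshold the curve never exceeds.
    """
    mids = np.asarray(mids, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    breaks = []
    for t in pe_thresholds:
        above = np.flatnonzero(ratios > t)
        breaks.append(mids[above[0]] if above.size else np.inf)
    return np.array(breaks)


def classify_hsi(hsi, breaks):
    """Class index per cell: 0 unsuitable, 1 marginal, 2 suitable, 3 optimal."""
    return np.digitize(hsi, breaks)


mids = np.linspace(0.05, 0.95, 10)
ratios = np.array([0.1, 0.2, 0.5, 0.9, 1.5, 3, 6, 12, 20, 40])   # toy exponential-like curve
breaks = hsi_breaks_from_pe(mids, ratios)
print(breaks.round(2))                                   # [0.45 0.55 0.85]
print([CLASSES[i] for i in classify_hsi([0.1, 0.5, 0.6, 0.9], breaks)])
# ['unsuitable', 'marginal', 'suitable', 'optimal']
```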

Fig. 11

Suitability class boundaries, set according to the P/E curve. The orange horizontal line indicates the performance of a random model. Top: Scenario 1 with constraints (15 m × 15 m). Bottom: Scenario 2 with constraints (15 m × 15 m)

The P/E curves also show that scenario 1 performs better than scenario 2 for high HSI values. Consequently, the fine-scale map produced from scenario 1 was retained for the rest of the analysis. Figure 12 shows presence points overlaid on larval habitat suitability. Points located close to the edges of suitable areas rather than inside them were likely recorded on the shores of flooded zones. An example of an optimal area is shown in Fig. 13.

Fig. 12

Subset and situation map of larval habitat suitability (5 m), scenario 1 with constraints. The shades of green reflect the different suitability classes

Fig. 13

Left: An example of an area characterised by optimal larval habitat suitability: highly populated, prone to flooding, with unplanned urbanisation and poor sanitation conditions. Right: Typical small breeding sites of An. gambiae s.l.

Hazard–b/adult vector habitat suitability

The suitability scores of categorical sub-factors (Table 9) confirm the strong relationship between urban deprivation and malaria hazard, with the LC class ‘low buildings’ and the LU class ‘deprived residential areas’ obtaining the highest scores. Conversely, very low scores were obtained for ‘paved surfaces’, ‘bare soil’ and ‘swimming pools’ for LC, and for ‘non-residential built-up areas’ and ‘non-agricultural areas with sparse or no vegetation’ for LU. Less expectedly, high-density planned residential areas are judged more suitable than low-density planned residential areas and even than agricultural areas. The membership functions for scaling distance factors are presented in Fig. 14.

Fig. 14

Membership functions for scaling continuous factors (adult habitat suitability)
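Membership functions such as those in Figs. 7 and 14 rescale a distance layer to a suitability value between 0 and 1 between two control points. The sketch shows two monotonically decreasing shapes common in GIS-based multi-criteria evaluation, a linear one and a cosine-squared sigmoid. The shape and control points that apply to each factor are given in the figures, so the control points used below (a = 0 m, b = 500 m) are hypothetical.

```python
import numpy as np


def linear_decreasing(d, a, b):
    """Membership 1 for d <= a, 0 for d >= b, linear in between."""
    d = np.asarray(d, dtype=float)
    return np.clip((b - d) / (b - a), 0.0, 1.0)


def sigmoidal_decreasing(d, a, b):
    """Cosine-squared membership: 1 at d <= a, 0.5 halfway, 0 at d >= b."""
    d = np.asarray(d, dtype=float)
    t = np.clip((d - a) / (b - a), 0.0, 1.0)
    return np.cos(t * np.pi / 2.0) ** 2


# Hypothetical control points for a distance-to-breeding-sites layer (metres).
distances = np.array([0.0, 250.0, 500.0, 800.0])
print(linear_decreasing(distances, 0.0, 500.0))
print(sigmoidal_decreasing(distances, 0.0, 500.0).round(3))
```

The sigmoid keeps suitability high close to the source and drops it faster around the midpoint, whereas the linear form penalizes every additional metre equally.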

Regarding relative importance, the factor with the highest score is by far the distance to breeding sites, followed by the distance to buildings (Fig. 15).

Fig. 15

Relative importance of factors derived through AHP (adult vector habitat suitability)

Two adult vector habitat suitability maps were produced, in which the suitability classes reflect the hazard levels (i.e., unsuitable corresponds to very low hazard, marginal to low hazard, suitable to medium hazard, and optimal to high hazard). The first map uses the factor distance to optimal larval habitats as input (Fig. 16). It is more restrictive than the second map, which uses the factor distance to suitable and optimal larval habitats (Additional file 2 Figure S1). It should be noted that the hazard levels are relative and specific to the urban context of the Dakar metropolitan area, which is overall a low transmission setting. The maps reflect the low dispersal of adult vectors from their breeding sites, which is explained by the proximity of blood meal sources [33, 109, 110]. During the field survey in the suburbs of Dakar, more than 90% of anopheline breeding sites were found less than 10 m from human dwellings. Moreover, the areas where Anopheles breeding sites were particularly abundant during the rainy season were correlated with the presence of flooded abandoned houses that served as resting places [28].
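The distance-to-larval-habitat factor can be computed with a Euclidean distance transform of the 5 m larval suitability map, then aggregated to the 100 m grid of the hazard map. This sketch assumes classes coded 0 (unsuitable) to 3 (optimal) and computes distances at 5 m before averaging 20 × 20 blocks to 100 m; this order of operations is an assumption, not necessarily the study's. `scipy.ndimage.distance_transform_edt` returns, for each non-zero input cell, the distance to the nearest zero cell, so the target classes are passed as zeros.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

OPTIMAL = 3
SUITABLE = 2


def distance_to_classes(class_grid, target_classes, pixel_size=5.0):
    """Euclidean distance (m) from each cell to the nearest cell in target_classes."""
    is_target = np.isin(class_grid, list(target_classes))
    if not is_target.any():
        return np.full(is_target.shape, np.inf)
    # The EDT measures distance to the nearest zero, so target cells must be False.
    return distance_transform_edt(~is_target, sampling=pixel_size)


def block_mean(grid, factor):
    """Aggregate a fine grid to a coarser one by averaging factor x factor blocks."""
    h, w = grid.shape
    if h % factor or w % factor:
        raise ValueError("grid dimensions must be multiples of the aggregation factor")
    return grid.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


larval = np.zeros((40, 40), dtype=int)      # toy 200 m x 200 m tile at 5 m
larval[5, 5] = OPTIMAL
larval[30, 30] = SUITABLE
d_optimal = distance_to_classes(larval, {OPTIMAL})               # first, more restrictive map
d_suit_opt = distance_to_classes(larval, {SUITABLE, OPTIMAL})    # second map
print(block_mean(d_optimal, 20).round(1))   # 100 m cells
```

Including the suitable class can only shorten distances, which is why the second map is less restrictive than the first.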

Fig. 16

Adult vector habitat suitability (i.e., hazard) (100 m), based on distance to larval habitat suitability class “optimal” and other factors

Urban malaria exposure

The bivariate urban malaria exposure maps resulting from the combination of hazard levels with population density classes characterize the likelihood of contact between adult vectors and humans. Since areas that are optimal for adult vector habitat are also generally densely populated, the hazard maps (Fig. 16 and Additional file 2 Fig. S1) and the exposure maps (Fig. 17 and Additional file 2 Fig. S2) display similar patterns. The areas that combine high hazard with high population density are mostly located in suburbs prone to flooding due to their unfavourable situation in lowlands. This finding is consistent with previous epidemiological studies [77, 111]. In Dakar, 62% of the urban population lives in the suburbs, which creates strong demographic pressure associated with uncontrolled urbanization [112]. This leads to the proliferation of deprived, overcrowded neighbourhoods with poor sanitation infrastructure. Several areas combining high hazard with medium population density are found close to humid zones, e.g., zones devoted to market gardening. Previous work in the Dakar suburbs has shown the importance of micro-ecological conditions, in particular the presence of breeding sites, on the intensity of malaria transmission: the risk of being bitten by infected Anopheles females was higher in the area where breeding sites were more abundant [71]. Fig. 17 highlights a large area located in Pikine that combines high hazard with high population density. This area is the least urbanized in terms of infrastructure, yet it has the highest population density.

Fig. 17

Urban malaria exposure (100 m), based on adult habitat suitability derived from the larval habitat suitability class “optimal” and other factors. Areas of very low to low hazard are not emphasized

Discussion

Application

Applying the framework to Dakar using VHR imagery resulted in three types of output. The first output consists of the larval habitat suitability maps at a resolution of 5 m, which were validated with entomological survey data. The results shown in Fig. 12 are consistent with previous field observations on the distribution of Anopheles breeding sites [28]. Indeed, the most suitable areas for anopheline breeding sites across the studied urban setting consist of rain-filled shallow water bodies. Moreover, the proximity of such stagnant water bodies to densely populated areas contributes to the proliferation of oviposition sites readily accessible to gravid females of An. arabiensis, the main malaria vector in Dakar [71]. The location of breeding sites is also linked to rapid, uncontrolled anthropisation with inappropriate land use planning and poor sanitation, another key factor influencing the abundance of malaria vector breeding sites. Nevertheless, suitable areas were identified not only in the flood-prone deprived suburbs but also, to a lesser extent, in planned urbanized areas. Conversely, the low occurrence of anopheline breeding sites in some areas could be linked to a soil texture that favours rainwater infiltration, or to improvements to the drainage system [113, 114] that reduce the number of stagnant water bodies. These aspects were not accounted for in this study. Puddles likely play the most important role in the production of Anopheles larvae. However, identifying every puddle would require images with an even finer resolution than Pléiades (e.g., drone imagery) and frequent acquisitions to account for rapid changes, which would be costly and unrealistic. Instead, a more effective approach was put forward that uses a conjunction of factors to identify areas prone to the formation of puddles. TWI, as a proxy for soil moisture, and concave landforms play an important part in this process. Water pollution is also identified as a crucial factor, although vectors are known to be adapting to it [9, 12, 82, 83].

The second output consists of the adult vector habitat suitability maps at a resolution of 100 m (i.e., the hazard maps), which were verified by experts. The proximity of the three essential elements of the gonotrophic cycle, namely the breeding sites, the source of blood meals and the resting places, explains the high habitat suitability in the areas highlighted as hazardous by the map. The distance to breeding sites is considered the main factor to account for in adult vector habitat suitability mapping, and the developed approach allows it to be derived from suitable larval habitats. The other factors help refine dispersal patterns according to the availability of hosts for blood meals and resting sites. Low buildings (likely to indicate a lower socioeconomic status in Dakar, although they could reflect certain types of affluent neighbourhoods in other contexts) and deprived urban areas offer suitable conditions in this respect.

The third output consists of the urban malaria exposure maps at a resolution of 100 m. The patterns depicted by the hazard and exposure maps display similarities and are consistent with the findings of previous epidemiological studies. The proliferation of breeding sites increases the probability of high adult vector densities in their vicinity, which in turn exacerbates exposure in areas with high population density and poor sanitation.
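Since TWI serves as the soil moisture proxy in this framework, its standard definition may help readers reproduce the layer: TWI = ln(a / tan β), where a is the specific catchment area (upslope contributing area per unit contour width) and β is the local slope. The sketch assumes flow accumulation (in cells) and slope (in radians) have already been derived from a DEM with a flow-routing tool of choice; the flat-slope floor `min_slope` is a hypothetical safeguard against division by zero, not a parameter from the study.

```python
import numpy as np


def topographic_wetness_index(flow_acc_cells, slope_rad, cell_size, min_slope=1e-3):
    """TWI = ln(a / tan(beta)) for co-registered grids.

    flow_acc_cells: number of upslope cells draining through each cell.
    slope_rad: local slope in radians.
    cell_size: pixel size in metres (contour width approximated by one cell side).
    """
    acc = np.asarray(flow_acc_cells, dtype=float)
    # Specific catchment area: (acc + 1) cells of area cell_size**2, per cell_size of contour.
    a = (acc + 1.0) * cell_size
    # Floor tan(beta) so that perfectly flat cells do not produce infinite values.
    tan_b = np.maximum(np.tan(np.asarray(slope_rad, dtype=float)), min_slope)
    return np.log(a / tan_b)


acc = np.array([0, 10, 1000])            # toy flow accumulation values
slope = np.radians([20.0, 5.0, 0.5])     # toy slopes
print(topographic_wetness_index(acc, slope, cell_size=5.0).round(2))
```

Cells that drain a large upslope area and lie on gentle slopes obtain the highest values, which is why TWI highlights depressions and valley bottoms where puddles tend to form.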

Limitations of the approach

The approach has some limitations that must be acknowledged. First, some of the identified criteria were discarded, e.g., those that imply a high production cost or require access to in situ data, as the aim was to propose a method that can be replicated in other cities under cost and data availability constraints. Second, a better indicator of water pollution than distance to landfills should be considered in future studies, to account for the influence of household and industrial wastewater. Third, uncertainties are present at several stages of the process, starting with the input datasets, which are themselves derived from modelling. In particular, the weights of factors and sub-factors strongly influence the results and are likely to suffer from inconsistencies. This was mitigated by collecting multiple judgements from a panel of experts and allowing these experts to revise their judgements whenever inconsistency exceeded a predetermined threshold. The impact of changes in the relative importance of factors on the result was also tested. In addition to thematic uncertainty, spatial uncertainty is also present, notably due to the different spatial resolutions of the data used. Therefore, discrete 100 m × 100 m gridded hazard and exposure maps were produced instead of continuous maps with a finer resolution, to reduce both spatial and thematic uncertainty.
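This section does not detail how the impact of changes in relative importance was tested. One common option, shown here only as a hedged sketch, is a one-at-a-time analysis: each AHP weight is scaled by a relative change, the remaining weights are rescaled proportionally so that they still sum to 1, and the resulting weighted sum is compared with the baseline. The perturbation levels (±10% and ±20%), the mean absolute difference used as the comparison metric, and the toy factor grids are assumptions.

```python
import numpy as np


def perturb_weight(weights, index, rel_change):
    """Scale one weight by (1 + rel_change); rescale the others so the total stays 1."""
    w = np.asarray(weights, dtype=float)
    new = w.copy()
    new[index] = w[index] * (1.0 + rel_change)
    if not 0.0 <= new[index] <= 1.0:
        raise ValueError("perturbed weight must stay within [0, 1]")
    others = np.arange(w.size) != index
    new[others] = w[others] * (1.0 - new[index]) / w[others].sum()
    return new


def oat_sensitivity(factors, weights, changes=(-0.2, -0.1, 0.1, 0.2)):
    """Mean absolute change in the weighted-sum HSI for each (factor, change) pair."""
    stack = np.stack([np.asarray(f, dtype=float) for f in factors])
    base = np.tensordot(np.asarray(weights, dtype=float), stack, axes=1)
    results = {}
    for i in range(len(factors)):
        for c in changes:
            hsi = np.tensordot(perturb_weight(weights, i, c), stack, axes=1)
            results[(i, c)] = float(np.mean(np.abs(hsi - base)))
    return results


rng = np.random.default_rng(2)
layers = [rng.uniform(size=(50, 50)) for _ in range(3)]   # toy factor grids
for key, delta in oat_sensitivity(layers, [0.5, 0.3, 0.2]).items():
    print(key, round(delta, 4))
```

Factors whose perturbation produces the largest changes are those whose weights most deserve careful elicitation.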

Replicability

To facilitate replication, a baseline workflow relying on open-source software functions was put forward. Adaptations will be required for every future application, depending on input data availability and local specificities. To circumvent the cost of VHR satellite imagery, alternative open data were suggested, although their use involves limitations such as the inability to capture small features (e.g., small water bodies, which are among the most important factors) and the absence of certain land use classes (e.g., deprived urban areas). In future applications, the choice between a mix of data derived from satellite imagery and open data, or relying entirely on open data, will depend on the level of detail required, as well as on the budget and EO skills at hand. With the current rapid increase in the availability of broad-coverage geospatial datasets, the need for pre-processing and processing of EO data is expected to diminish, as finer-scale, readily usable open data covering a variety of themes continue to be released. The main bottleneck is the limited availability of accurate and timely spatial data on urban deprivation. Nevertheless, research is underway in this field, and such data are likely to become available in the near future [115].

Perspectives

Perspectives for future research include testing the workflow using only open data and testing the replicability of the approach in other cities with different profiles, particularly secondary cities and cities located in different climate zones. Scalability should also be investigated, e.g., using cloud computing platforms such as Google Earth Engine or Microsoft Planetary Computer. Adding temporal moisture indices, e.g., from Sentinel-1/2, as a complement to the steady-state TWI may also help adjust the results to seasonal variations. Subject to data availability, more dimensions could be included in the vulnerability component, such as immunity, behaviour, movements, and proper use of Long-Lasting Insecticidal Nets (LLINs). Furthermore, since policies are being established for more systematic collection of epidemiological data, combining methods based on vector ecology knowledge with methods implementing fine-grained spatial epidemiological modelling [4] may prove essential to support evidence-based urban malaria control.

Conclusions

In an effort to bring geospatial research output closer to effective support tools for evidence-based policies and targeted interventions, a spatially explicit approach was developed and systematized for mapping urban malaria exposure in a context of epidemiological and entomological data scarcity. While it relies on well-established methods, its novelty resides in (i) the key role played by expert knowledge in vector ecology, (ii) the broad set of criteria identified and used, (iii) the fact that hazard is derived not directly from larval habitat suitability but from adult vector habitat suitability, (iv) the inclusion of urban deprivation as a proxy for vulnerability, and (v) the fine spatial resolution of the results, as required to account for the high degree of heterogeneity observed in urban areas. The application of this approach to a case study demonstrated its potential for sub-Saharan African cities and highlighted that, in addition to environmental factors, urban deprivation also plays an influential role in urban malaria exposure. A baseline workflow was proposed to facilitate further applications, and as the recent trend towards fast-increasing availability of open, broad-coverage, ready-to-use spatial layers derived from EO is expected to continue, it will help reduce the need for EO data processing. Last but not least, building or strengthening the capacities of local actors in geospatial methods is essential to foster the sustainable uptake of approaches such as the one developed in this study.