Introduction

Neglected tropical diseases (NTDs) in sub-Saharan Africa account for approximately 200,000 deaths annually as well as 57 million lost life-years1. The most significant of these diseases, schistosomiasis, is the second most prevalent parasitic disease after malaria in several sub-Saharan African countries1,2, severely affecting low-income rural communities with poor sanitation3. Schistosomiasis negatively impacts child development, pregnancy outcomes, and agricultural productivity, perpetuating poverty for millions of Africans1,3,4. Despite making up only 13% of the global population, sub-Saharan Africa accounts for 90% of schistosomiasis cases5.

Human schistosomiasis is caused by species of schistosome trematode worms: Schistosoma mansoni, S. haematobium, S. japonicum, S. intercalatum, and S. mekongi. These infections manifest in two main forms: urogenital schistosomiasis, attributed to S. haematobium, and intestinal schistosomiasis, associated with the other species such as S. mansoni6,7. The life cycle of Schistosoma begins when parasite eggs from infected human feces or urine enter freshwater sources. Under favorable environmental conditions, these eggs hatch into miracidia, which actively seek out and penetrate suitable intermediate host (IH) snails. Asexual reproduction occurs within the snails, leading to the development of cercariae. At this stage, the cercariae are released into the water as free-living parasites and can penetrate human skin, thereby completing the cycle and causing the disease6. Notably, Bulinus and Biomphalaria snails act as IHs for S. haematobium and S. mansoni, respectively7. Schistosoma haematobium and S. mansoni are prevalent in sub-Saharan Africa, significantly contributing to the burden of schistosomiasis.

While the 2020 goal for schistosomiasis elimination proved elusive8, control efforts in sub-Saharan Africa, specifically East Africa, have predominantly relied on mass chemotherapy, particularly for school-aged children9,10. However, this approach alone is recognized as inadequate, and there is an urgent call for alternative strategies10. The integration of One Health into the WHO 2030 NTD roadmap, encompassing human treatment, livestock treatment and/or vaccination, environmental management, and snail control, has gained increasing recognition for its potential impact11,12. From an ecosystem perspective, factors influencing the presence of Schistosoma parasites and their snail hosts can significantly impact the transmission dynamics of schistosomiasis12. Investigating such factors aligns with the WHO 2030 NTD elimination strategy11, and targeting IH snails, which has been demonstrated to be effective13, holds particular promise. However, our understanding of various aspects related to snail hosts remains limited, with few studies providing comprehensive prevalence data and identifying significant features influencing the distribution of Biomphalaria and/or Bulinus IH snails14,15. This knowledge gap is particularly pronounced in East Africa overall, with persistent schistosomiasis hotspots in Kenya and Tanzania9.

Machine learning techniques, particularly random forest (RF)16, have gained wide application in various scientific domains for classification and regression analyses aimed at identifying significant features14,17,18,19,20. In classification tasks, RF has demonstrated superior predictive accuracy compared to other methods, such as logistic regression21,22,23. RF is resilient to multicollinearity, a common issue in ecological datasets24. It also offers effective solutions for addressing missing data25. RF helps to distinguish predictor variables with substantial influence on the response variable from those that contribute little. Therefore, this research aims to provide comprehensive insights into the distribution of IHs in the East African region, using RF to identify the spatial distribution of IH snails and the significant features driving it.

Currently, only one documented study exists for the East Africa region as a whole, and it is restricted to Biomphalaria IHs and a limited number of surveyed locations, considering just eight predictor features26. This previous study gives a first impression; however, obtaining robust results may necessitate the inclusion of a broader array of potential features in the analysis. This challenge becomes especially complex in regions like East Africa, which is characterized by variable occurrences of both Biomphalaria and Bulinus IH species in conjunction with diverse geographical, climatic, environmental, and anthropogenic factors. This highlights a substantial gap in understanding the distribution of the two genera, which are the primary contributors to the schistosomiasis burden in the region. To address this knowledge gap, our study has two primary objectives: a) to assess the significance of a broader array of potential features, including climatic, environmental, topographic, and human impact factors, in influencing the distribution of Bulinus and Biomphalaria IH snails in East Africa, and b) to determine the predicted probability of occurrence for the pertinent species within the genera based on the most significant factors.

Material and methods

Description of study area

The study area spans the East African region, including Uganda, Kenya, and Tanzania, situated between the Tropics of Cancer and Capricorn. East Africa covers an extensive area of approximately 6.667 million km2 and is home to roughly 488 million people, making it the most densely populated sub-region in Africa27. This region is rich in freshwater sources, such as swamps, rivers, and (crater) lakes, as well as man-made structures such as dams and irrigation schemes, all serving as potential habitats for IH snails14,28,29. In addition, East Africa exhibits a diverse range of geographical, climatic, hydrological, and human-induced factors, all of which are highly relevant for the distribution of IH snails. Importantly, both S. mansoni and S. haematobium are major disease burdens in the region, associated with the presence of both Bulinus and Biomphalaria species14,30.

Occurrence and geographic data

The geographic distribution of occurrence data for the Biomphalaria and Bulinus IH snails in the study area can be found in the Supplementary File S1 Fig. 1. We collected geographic data (longitude and latitude) pertaining to Bulinus and Biomphalaria distribution in the three East African countries Uganda, Tanzania, and Kenya, including data previously reported by Chibwana et al.31, Tumwebaze et al.32, and Tabo et al.14, as well as those reported in the Global Biodiversity Information Facility (GBIF), which include recent data from museum specimens and DNA barcodes33. The information obtained from GBIF constitutes secondary data retrieved online, whereas the remaining three sources involve primary data collected through field surveys. This dataset encompassed all Biomphalaria species, universally acknowledged as hosts, and selectively featured specific well-documented host species of Bulinus (see the Supplementary Table S1). After obtaining the data, we imported them into the R statistical environment, version 4.0.334, and cleaned them by removing duplicate records. Subsequently, we used the cleaned geographic data to extract environmental, climatic, topographic, soil content, and human influence drivers associated with occurrence data of IHs using the R programming language, Google Earth Engine35, and the ArcGIS Pro geographical information system (GIS), as briefly described in Sects. "Climatic and environmental features"–"Human impact features".
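The duplicate-removal step can be illustrated with a minimal sketch. The study performed this step in R; the Python below is only an illustration, and the field names (genus, longitude, latitude), the rounding precision, and the example records are assumptions made for the example rather than details taken from the study data.

```python
def deduplicate_occurrences(rows, decimals=5):
    """Keep the first record for each (genus, longitude, latitude) combination."""
    seen = set()
    unique = []
    for row in rows:
        # Round coordinates so that trivially different renderings of the
        # same location (e.g. "0.31" and "0.310") are treated as duplicates.
        key = (
            row["genus"],
            round(float(row["longitude"]), decimals),
            round(float(row["latitude"]), decimals),
        )
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical records, not taken from the study dataset.
records = [
    {"genus": "Bulinus", "longitude": "32.58", "latitude": "0.31"},
    {"genus": "Bulinus", "longitude": "32.58", "latitude": "0.310"},
    {"genus": "Biomphalaria", "longitude": "32.58", "latitude": "0.31"},
]
cleaned = deduplicate_occurrences(records)
```

In this sketch, records of different genera at the same coordinates are kept, because a site can host both Bulinus and Biomphalaria.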

Climatic and environmental features

Climate factors such as temperature and precipitation, as well as natural habitat conditions, are recognized for their impact on host snail distribution patterns36,37,38. To account for the potential preference of IH snail species for climatic variations, we obtained high-resolution bioclimatic data from the WorldClim (v2.1) global dataset, spanning records from 1970 to 2000 with a spatial resolution of 340 km2 (10 arc-minutes)39, within the R statistical environment. Of the available bioclimatic features, we selected mean annual temperature (BIO1), temperature of the warmest month (BIO5), temperature of the coldest month (BIO6), annual precipitation (BIO12), precipitation of the wettest month (BIO13), and precipitation of the driest month (BIO14), which have been extensively documented for their impact on, and biological relevance for, the presence and distribution of IH snails14,37,38. In addition, we computed the mean land surface temperature (LST), an indicator of energy exchange at the land surface-atmosphere interface known for its influence on climate and ecosystems40, using the MOD11A1.061 Terra Land Surface Temperature and Emissivity Daily Global 1 km dataset within Google Earth Engine. We averaged all LST data for the years 2000, 2010, and 2020 to accommodate any temperature and emissivity fluctuations over the past two decades. In the Google Earth Engine platform, we also scripted the extraction of the Normalized Difference Vegetation Index (NDVI) from the MODIS product MOD13Q1 (2021) V6.1, which provides data at a 250 m pixel resolution41. We likewise averaged NDVI data for the years 2000, 2010, and 2020 to account for any fluctuations in the index over the past two decades. The NDVI is a widely used indicator for the quantification of vegetation health and density42,43.
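The correspondence between a 10 arc-minute grid and a cell size of roughly 340 km2 can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function name and its parameters are hypothetical and serve only to illustrate the arithmetic.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def cell_area_km2(arc_minutes, latitude_deg=0.0):
    """Approximate area of a square grid cell of the given angular size."""
    side_deg = arc_minutes / 60.0
    km_per_degree = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0
    north_south_km = side_deg * km_per_degree
    # East-west extent shrinks with the cosine of latitude.
    east_west_km = north_south_km * math.cos(math.radians(latitude_deg))
    return north_south_km * east_west_km


# At the equator a 10 arc-minute cell is about 18.5 km on a side,
# i.e. close to the 340 km2 quoted for the WorldClim layers.
area_at_equator = cell_area_km2(10)
```

Because the study area straddles the equator, the latitude correction is small there, which is why a single nominal cell area is a reasonable summary.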

In addition, land cover, which is known to significantly impact snail habitat suitability44, was extracted from the MODIS Land Cover Type Yearly Global 500 m dataset via Google Earth Engine45. The land cover classification employed in this study distinguishes 17 land cover classes, including 11 natural vegetation classes (such as forests, open herbaceous areas, and wetlands), 3 human-altered classes (comprising agricultural land and built-up areas), and 3 non-vegetated classes (including snow, rocks, and water bodies). Furthermore, various physicochemical properties previously studied for their effects on IH snail distribution14,26,46 were integrated into our analysis. These included soil pH, soil organic carbon content in fine earth, and soil cation exchange capacity, obtained at a 30 m resolution at depths of 0–20 cm and 20–50 cm from the Innovative Solutions for Decision Agriculture Ltd (iSDA) dataset via Google Earth Engine47. Additionally, data on soil composition, including clay, sand, silt, nitrogen content, and pH (measured in H2O) at a depth of 0–5 cm, were sourced from the International Soil Reference and Information Centre (ISRIC), the World Soil Information Service48.

Topographic features

We included topographic metrics, such as altitude, slope, and distance to the nearest water body, as surrogate indicators of biogeographical isolation, which can influence colonization and limit dispersal, potentially impacting IH establishment in the region14,49. Altitude data, a key topographic factor affecting snail host distributions and prevalence of schistosomiasis50, were obtained from the WorldClim database. Slope was derived from the Shuttle Radar Topography Mission (SRTM) digital elevation data using Google Earth Engine at approximately 30 m resolution51. The nearest distance from occurrence points to surface water bodies was calculated using the "Near" tool in ArcGIS52.
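The idea behind the nearest-distance calculation can be sketched as follows. This is not the algorithm of the ArcGIS "Near" tool, which was used in the study; it is a simplified stand-in that assumes water bodies are represented by point coordinates and uses the haversine great-circle distance on a sphere of radius 6371 km. All names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_water_km(point, water_points):
    """Smallest distance from an occurrence point to any water point."""
    return min(haversine_km(*point, *water) for water in water_points)


# Hypothetical occurrence point and water-body points as (lon, lat).
occurrence = (32.0, 0.0)
water = [(33.0, 0.0), (32.5, 0.0)]
distance_km = nearest_water_km(occurrence, water)
```

A GIS tool additionally measures distance to the edges of line and polygon features rather than to isolated points, so values from this sketch would only approximate those of a full implementation.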

Human impact features

We integrated two indices, the Human Influence Index (HII) and the Human Footprint Index (HFI), to assess the impact of human activities on the distribution of IH snails. We obtained HII data from the Last of the Wild Project (version 2, 2005) at a spatial resolution of 1 km from NASA's Socioeconomic Data and Applications Center (SEDAC). This dataset quantifies relative human impact within each terrestrial biome using scores derived from 9 global data layers. These layers include factors such as human population pressure (population density), human land use and infrastructure (built-up areas, nighttime lights, land use/land cover), and human access (coastlines, roads, railroads, navigable rivers)53. Scores range from 0 to a maximum of 72, with higher scores indicating greater human influence. Likewise, we acquired HFI data from the Last of the Wild Project (version 3, 2009) through SEDAC (NASA) with a spatial resolution of 1 km. The dataset encompasses eight variables: built-up environments, population density, electric power infrastructure, crop lands, pasture lands, roads, railways, and navigable waterways. Scores range from 0 to 50, where higher scores signify increased human influence53. We acquired region-specific data for both HII and HFI in a geographic coordinate system (GCS) from the SEDAC webpage, then extracted pixel-level data for both indices using the "Extract Values to Points" tool in ArcGIS. SEDAC was preferred because it provided the most recent spatial data for both HII and HFI.

Data analysis

To assess the importance of predictor features in both the Bulinus and Biomphalaria RF models, we applied cross-validation to models fitted on presence/absence (1/0) data, a widely used resampling technique to evaluate generalization capabilities and prevent overfitting54. Cross-validation serves to evaluate the stability of variable rankings and mitigates the influence of randomness in the assessment process. The significance of individual parameters in the overall RF models was evaluated using two metrics, Mean Decrease in Accuracy (MDA) and Mean Decrease in Gini (MDG)17. MDA is suitable when the goal is to maximize the overall accuracy of the classification model, while MDG is often used when the goal is to build decision trees that create nodes with high homogeneity, resulting in better separation of classes17. Variations between MDA and MDG outputs are common because the two metrics are calculated differently. To address ranking disparities between MDA and MDG, we report both metrics but primarily emphasize features deemed significant by both. Thus, when interpreting variable importance, it is advisable to prioritize relative rankings over comparing absolute values between these two measures. In addition, to visually represent how individual predictor features influence the occurrence of each IH snail genus in the region, we employed partial dependence plots14,55. The plots illustrate the relationship between a specific significant variable and the predicted occurrence of the genus while averaging over the observed values of all other variables.
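The logic of the two importance metrics and of partial dependence can be illustrated with a minimal sketch. It is not the implementation used in the study, which relied on random forests in R: here a fixed threshold rule on altitude stands in for a fitted model, the Gini decrease is computed for a single split rather than averaged over all trees, and the accuracy drop is computed on the full toy sample rather than on out-of-bag data. All data, names, and thresholds are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 presence labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def gini_decrease(values, labels, threshold):
    """Impurity decrease of one split `value <= threshold` (the MDG idea)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    children = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - children


def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, feature, n_repeats=20, seed=0):
    """Mean accuracy drop after shuffling one feature (the MDA idea)."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


def partial_dependence(predict_proba, rows, feature, grid):
    """Average predicted probability with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        forced = [dict(r, **{feature: value}) for r in rows]
        curve.append(sum(predict_proba(r) for r in forced) / len(forced))
    return curve


# Hypothetical toy data: presence (1) at low altitude, absence (0) above.
altitudes = [200, 400, 600, 800, 1200, 1500, 1800, 2100]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
rows = [{"altitude": a, "noise": i % 3} for i, a in enumerate(altitudes)]


def predict_proba(row):
    # Stand-in for a fitted model: a single fixed rule on altitude.
    return 0.9 if row["altitude"] <= 1000 else 0.1


def predict(row):
    return int(predict_proba(row) >= 0.5)


mdg_like = gini_decrease(altitudes, labels, 1000)
mda_altitude = permutation_importance(predict, rows, labels, "altitude")
mda_noise = permutation_importance(predict, rows, labels, "noise")
pd_curve = partial_dependence(predict_proba, rows, "altitude", [500, 1500])
```

In this toy setting, shuffling the uninformative feature leaves accuracy unchanged, whereas shuffling altitude reduces it, which mirrors the reasoning behind ranking features by both metrics and then inspecting their partial dependence.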

Results

Occurrences of IH snails and associated predictor features

The data consists of a total of 455 recorded occurrences for Bulinus (52%) responsible for the transmission of S. haematobium, and 412 (48%) for Biomphalaria transmitting S. mansoni. Specifically, the dataset encompassed 77, 69, and 309 records for Bulinus species and 134, 143, and 135 records for Biomphalaria species in Uganda, Kenya, and Tanzania, respectively.

Overall, we considered 23 predictor features for the RF model. Their spatial resolution, potential mean value, standard deviation, and range variation are shown for both genera (Table 1). Detailed occurrence data, along with corresponding geographic information for the IH snails, can be found in the Supplementary Table S1. In general, both genera share parameter values that exhibit minimal spatial variation considering range of their potential predictors. This similarity in most features is potentially influenced by the location of the region within the same tropical climate zone favoring both species. For example, the data shows that the altitudinal range for Biomphalaria ranges from 46 to 2342 m.a.s.l, while for Bulinus, it spans from 3 to 2058 m.a.s.l. Additionally, soil conditions in the region, which tend to be alkaline, reflect a complex interplay of various soil components (clay, silt, sand), soil cation exchange capacity, and the bulk density of the fine earth fraction. The high nitrogen content (0.5–4.6 g kg–1) in the area can be attributed to emissions from decomposing organic matter such as vegetation (index range 0.18–0.8), land cover, and human activities like deforestation. Nevertheless, an evaluation of the significance of individual parameters in the cross-validated random forest models for Biomphalaria and Bulinus has been conducted and is presented in Sect. “Variable importance”.

Table 1 The input predictor parameters, their spatial resolution; mean values, standard deviation, and the range: Buli Bulinus species, and Biom Biomphalaria species.

Variable importance

In general, geography, precipitation patterns, temperature variations, and environmental parameters within the region play a significant role in shaping the distribution of both Biomphalaria and Bulinus, although their relative contributions vary between the two models and with the method used to determine variable importance (Fig. 1). Parameters highlighted in blue are considered strong predictors with significant influence, those in black exhibit minor influence, and those in red, with negative variable importance values, are considered non-significant predictors according to the MDA metric (Fig. 1, left). We applied the same MDA-based order of feature importance to categorize the MDG results into blue, black, and red (Fig. 1, right). Specifically, the most influential features affecting the distribution of Biomphalaria IH snails include altitude and, mainly, climatic features, i.e. precipitation during the wettest month (BIO13), mean annual temperature (BIO1), and mean annual precipitation (BIO12). Features with minor contributions to the Biomphalaria model include the remaining climatic features, namely precipitation during the driest month (BIO14), temperature of the coldest month (BIO6), and temperature of the warmest month (BIO5), as well as soil-related features and land cover. Additionally, the Human Footprint, land surface temperature, water distance, and vegetation index were of minor significance with the MDA metric only. The other parameters were found to be non-significant for the Biomphalaria model.

Figure 1

Contributions of the predictor features to the distribution of Biomphalaria (upper panel) and Bulinus (lower panel) considering the variable importance by mean decrease in accuracy (MDA, left) and mean decrease in Gini (MDG, right). The prominently significant features are highlighted in blue, those with minor influence are marked in black, and those in red are considered non-significant. For abbreviations of features see Table 1.

For Bulinus, the most significant parameters influencing its distribution are altitude, and again climatic features such as precipitation during the wettest month (BIO13), mean annual temperature (BIO1), mean annual precipitation (BIO12), as well as some soil features (nitrogen concentration, clay content). All other features are less relevant. Of these, parameters like land surface temperature, water distance, vegetation index, bulk density of fine earth fraction, and slope were found to be significant only when using the MDA method and only to a minor degree. Parameters that were not found to significantly impact the Bulinus IH species distribution at all include soil pH, organic carbon content, and the Human Footprint, amongst others (Fig. 1).

Predicted probabilities for the occurrence of IH snails

The simulated probabilities of genus occurrence in relation to the significant features identified in Fig. 1 demonstrate non-linear relationships for both Biomphalaria and Bulinus IH snails. The likelihood of encountering Bulinus species is generally higher than that of Biomphalaria species in the region based on their probability values (Fig. 2). Nevertheless, the predicted probabilities for both genera exhibit consistent patterns concerning altitude, precipitation during the wettest month (BIO13), mean annual temperature (BIO1), and annual precipitation (BIO12). As altitude increases, the probabilities of occurrence for both IHs rise steeply up to an elevation of approximately 500 m.a.s.l. Beyond this point, the probability continues to increase, albeit very gradually, until approximately 1500–1800 m.a.s.l., where it peaks and then decreases noticeably (Fig. 2).

Figure 2

Likelihood of Biomphalaria species (1st panel) and Bulinus species occurrence (2nd and 3rd panels) in relation to the significant features identified by both importance metrics (MDA and MDG) in the Biomphalaria and Bulinus models (compare Fig. 1; see Supplementary S1 Fig. 2 for Biomphalaria and Fig. 3 for Bulinus predicted probabilities for the remaining predictors).

Conversely, for both IHs, the predicted probabilities decrease as precipitation rises, up to about 300 mm in the wettest month (BIO13) and about 1000 mm of annual precipitation (BIO12). This is followed by a slight increase at the end of the trend for BIO13 and a strong increase for BIO12 between around 1250 and 1900 mm. Mean annual temperature (BIO1) shows a complex relationship, with a gradual increase in probability between 20 and 25 °C, followed by a steep decrease. Additionally, the association with the temperature of the coldest month (BIO6) indicates a decreasing probability of encountering Bulinus between 8 and 20 °C, followed by a slight increase up to 22 °C. The probability of encountering Bulinus increases with soil nitrogen content, with a high probability occurring above 2 g/kg. However, the association with clay content is complex, generally exhibiting an increasing trend that peaks at ~ 50% clay, followed by a slight decrease up to 60%.

Discussion

In this research, we relied on geographical data sourced from the literature and the GBIF database to investigate the distribution of Biomphalaria and Bulinus IH snails for Schistosoma within the East African region. We observed minimal variation in the potential determinants of the distribution of both Biomphalaria and Bulinus snails at the regional scale. Geography and climate played a significant role in the distribution of Biomphalaria, while geography, climate, and, to some extent, several soil factors were crucial in shaping the presence of Bulinus snails. However, the varying significance of parameters highlights the intricate nature of snail behavior and distribution. Numerous interacting factors can obscure the direct impact of specific parameters, potentially attenuating their effects in the model. In the following sections, we discuss IH snail occurrence in relation to the significant, minor, and non-significant predictor features within an ecological context.

Most significant predictor features of IH snail occurrences

The identification of significant features for both IH groups relied on high variable importance values and similar results in both the MDA and MDG metrics. Our findings reveal that both genera thrive better below 500 m.a.s.l, potentially because lower altitudes promote stagnant water, facilitating breeding, while higher altitudes facilitate water flow56, a reflection of the dispersal patterns of the IH snails14. Thus, the variation in the altitude of the study area plays a pivotal role, although Abe et al.57 found that altitude did not significantly impact the distribution of Bulinus snails, which they associated with the lack of altitude variation in their study area. Our findings complement previous studies that have reported differing upper altitude limits for IH snail occurrence in Uganda, with values ranging from 1400 m.a.s.l58 to more than 1600 m.a.s.l14, and even above 2000 m.a.s.l50. Notably, Bulinus species have been documented at exceptionally high altitudes (3997 m.a.s.l)32, indicating that conditions can be favorable even at such altitudes. People in high-altitude populations are at risk of disease exposure, yet often receive minimal attention from health authorities and vector control programs, which poses a significant concern for their health. Therefore, dedicated research is needed to establish an upper altitudinal limit for both forms of schistosomiasis and to assess the potential impact on host-parasite interactions and transmission of the disease. Additionally, further investigations are required to determine whether the observed and assumed shifts in altitudinal thresholds are attributable to climate change or other factors.

Furthermore, the foremost significant drivers affecting the distribution of both Biomphalaria and Bulinus snails according to our study are the climate features temperature and precipitation. In contrast, a locally restricted study in western Uganda14 assigned a lesser degree of importance to climate. This suggests that the precise impact of climate change on IH snails and schistosomiasis is likely to vary with the geographical or spatio-temporal scale under consideration59. Precipitation serves as a critical metric for assessing the availability of suitable water bodies that snails are known to inhabit36. For example, climate change can lead to fluctuations in regional precipitation levels, which may in turn modify transmission patterns and the onset of schistosomiasis36,38. An increase in precipitation levels contributes to the proliferation of breeding sites by increasing surface runoff into freshwater ecosystems60, thereby enhancing the supply of organic matter, which serves as food for the snails, ultimately promoting their growth and fecundity60,61. Moreover, precipitation events provide suitable conditions for snails to emerge from aestivation within temporary breeding sites, coinciding with a higher peak of reproduction among these organisms62. This would also explain the strong increase of IH snail occurrence with precipitation features that we found in our analysis. However, excessive precipitation can also have adverse consequences on the distribution of IH snails60. Heavy rainfall can flood breeding sites, which dislocates snails and leads to a decline in snail populations. Consequently, snails disperse to new locations, establishing new areas for these vectors and posing a risk for the renewed transmission of schistosomiasis60. In contrast, during dry seasons, precipitation levels are low, and snails need to adapt and can undergo aestivation, so that their occurrence is reduced. This may be a possible explanation for the negative correlation with the precipitation during the warmest months.

In a comparative context, our study emphasizes the importance of temperature in shaping snail distribution patterns across the broader East African region. Generally, freshwater snails are ectothermic, meaning their body temperature is regulated by the surrounding environment12. Temperature plays a crucial role in determining the development, survival, and reproductive rates of snails, as corroborated by multiple studies10,36,37,38,56,63. Interestingly, within the more confined geographical scope of Western Uganda, temperature exhibited a considerably weaker influence on the distribution of IH species14. This could be attributed to the more consistent temperature fluctuations compared to the broader variations seen in larger-scale studies like ours. At a broader spatial scale, our study reveals a pronounced prevalence of intermediate host snails when mean annual temperatures range between 20 and 25 °C. In prior studies, a temperature of 25 °C has been associated with an increase in snail populations64,65. In addition, Malone65 noticed an ideal temperature range of 20–27 °C for the intramolluscan development of S. mansoni within Biomphalaria spp. snails. On the other hand, decreased probability of IH snail presence during warm seasons exceeding 29 °C as shown in our study can be attributed to elevated snail mortality, diminished reproductive capacity, and inhibited snail growth, ultimately resulting in reduced schistosomiasis cases in such seasons63,66.

In addition, the presence of clay in the soil was a significant factor in the Bulinus model, consistent with prior research by Stensgaard et al.36, which associated clay-rich soils with higher snail prevalence. However, other studies suggested that clay content in the soil had only a minimal impact on IH snail presence46,67. Nonetheless, clay content in the nearby terrestrial surroundings can influence the distribution of IH snails by affecting soil texture, water retention, and drainage. The presence of heightened clay content may foster waterlogged conditions that are favorable for the proliferation of IH snails56. The strong relationship between soil nitrogen content and Bulinus IH snail distribution implies that even minor variations in soil nitrogen content can significantly impact their distribution. This connection suggests that although snails typically flourish in aquatic environments, the presence of soil nitrogen levels in the nearby terrestrial surroundings might affect the spread of Bulinus snails. Theoretically, increased soil nitrogen often correlates with a greater chance of nitrogen leaching, which could lead to elevated nitrogen levels in streams or floodplain habitats. These conditions could favor the survival and proliferation of these snails within their aquatic environments.

Minor and non-significant predictor features of IH snail occurrence

Certain predictor features held relatively low importance for the distribution of both genera. Discrepancies between the MDA and MDG metrics regarding these parameters were noted. Below, we briefly discuss possible explanations for the limited or non-significant influence of these parameters on the distribution of IH snails, taking into account conflicting findings in the literature. The limited impact of some climate features, such as precipitation of the driest month (BIO14) and temperature of the warmest month (BIO5), can be attributed to factors including food scarcity, snail adaptations, and the possibility of aestivation/hibernation, with the likelihood of snail mortality during the driest months62,63,66,68. Scenarios like hibernation often occur as most temporary breeding sites dry out68. Moreover, the feeding habits of freshwater snails can be influenced by cold temperatures (BIO6), leading to a potential decrease in their reproductive activity61. In fact, some previous studies indicate that precipitation and temperature play a minimal role or lack statistical significance in influencing the distribution of intermediate host snails14,46. This can be linked to the smaller geographical scope examined in those studies, where similar climatic conditions were observed, resulting in collinearity in the climate data14,46. Consequently, there was limited variation in the data, which hindered the detection of climate variables as primary drivers of snail distribution. In contrast, our regional and larger-scale study provides a more comprehensive perspective. However, the distribution of IH snails may not be driven by climate features alone but can also be influenced by a complex interplay of various factors, including ecological, topographical, and human factors12,59.

Sand content emerged as a significant yet minor feature in both the Biomphalaria and Bulinus models. This finding is in line with the research conducted by Stensgaard et al.36, which highlights the significance of specific levels of sand content in snail distribution. Sandy soils, due to their inherent characteristics that enhance drainage, significantly impact the suitability of habitats for snails36. However, sand content may not consistently exert a strong influence on the distribution of IH snails, with its impact potentially varying with its level, for example 34–39% in our study area. On the other hand, the significance of silt content for Bulinus presence was notably lower, as indicated only by MDA. This finding aligns with the results reported by Deka46, underscoring the limited contribution of silt content to defining the presence of IH snails. On the contrary, Olkeba et al.67 observed higher Bulinus globosus populations in regions with higher silt content. However, the association between soil texture (silt, clay, sand) and snail distribution represents only one aspect within a larger ecological framework that includes various factors such as water chemistry, vegetation, and climate.

Furthermore, we observed that soil pH (range 5.1–9.2) had minimal significance for the distribution of Bulinus snails and was not significant at all in the Biomphalaria model. The limited impact in our study could be attributed to the varying alkalinity of the soils. Likewise, the restricted importance of both the bulk density of the fine earth fraction and the soil cation exchange capacity, as constituents of soil composition, can be linked to the limited influence exerted by the soil content parameters (sand, silt, clay). It is essential to acknowledge that land use comprises various classes, which vary by region and over time69. The potential impact of relevant land use classes, such as waterbodies and cropland vegetation mosaics, on the distribution of IH snails may be masked by the overlapping effects of irrelevant classes such as savannah and barren land56. The relatively minor impact of the human footprint, which was a weaker predictor for Biomphalaria snail distribution (by MDA), is in line with findings from Olkeba et al.67 and Krauth et al.70. Nonetheless, humans often play a crucial role in introducing snails into new environments and serve as passive dispersal vectors70,71 through the expansion of irrigation agriculture, settlement, and fishing activities. Conversely, a study by Tabo et al.14 did not identify human influence as a significant factor affecting IH snail distribution, potentially because some of the habitats are in reserved areas and game parks where human activities are limited14. This difference may also be attributed to the limited spatial scope of their case study, which may not capture the full extent of human impact on snail distribution. While Deka46 emphasized proximity to the nearest water body as a significant variable, our research indicates that it has limited influence on the distribution of both genera. Notably, Tabo et al.14 reached a similar conclusion regarding the insignificance of this variable for IH snail distribution. These disparities may stem from the distinctive geographical and landscape characteristics of the areas considered.

In our study, with an NDVI range of 0.18 to 0.83 and an LST range of 3.4 to 43.6 °C in the region, both parameters exhibited low significance in determining the distribution of both genera (by MDG). This observation aligns with previous studies conducted by Magero et al.26, Boitt and Suleiman56, and Deka46, all of which found a similarly limited influence of these two parameters on the presence of IH snails. It is also worth noting that land cover itself had only a limited influence in this study. Moreover, Boitt and Suleiman56 have pointed out that land surface temperature (LST) is significantly shaped by land cover, while NDVI indirectly reflects land cover characteristics. This interrelationship may help explain the relatively modest impact of both LST and NDVI on snail distribution in our study.

While the study provides valuable new insights and results, it is limited by the scarcity of accessible physico-chemical data, whether from online spatial databases or from the literature, for the entire region or even for major parts of the study area. The only available physico-chemical data, from a field survey14, are restricted to a localized area in Western Uganda within our study region. We therefore advocate for extensive field sampling studies across East Africa.

Conclusion

Our comprehensive analysis highlights the significance of geographical, climatic, environmental, and human factors in understanding the distribution of IH snails for schistosomiasis. Such factors can influence not only the occurrence of the genera but also their speciation, extinction, and dispersal processes in an ecosystem. Our machine-learning approach disentangled key drivers, revealing that topography and climate predominantly influence Biomphalaria, while topography, climate, soil content, and nitrogen concentration collectively affect the presence of Bulinus. The intricate relationship with topography (altitude) may reflect dispersal limitations or environmental filtering, while positive associations with precipitation patterns and temperature variations suggest the prevalence of IH snails in East African ecosystems, especially within the tropical climate zone. Furthermore, clayey soil content and high nitrogen levels favor IH snail distribution in freshwater habitats. It is crucial to acknowledge the multifaceted nature of IH snail distribution, which is influenced by diverse ecological, climatic, topographical, and human factors with varying contributions. These findings provide a foundational dataset for future research and risk mapping, supporting targeted prevention and control efforts against schistosomiasis. In addition, the findings have significant implications for public health. Policy makers and stakeholders should consider habitat suitability and prioritize actions on the features identified as significant for the distribution of IH snails in the region. It is also crucial to integrate approaches and enhance community awareness of these significant factors, leading to the design and implementation of integrative measures for the control of IHs and, consequently, the prevention of schistosomiasis.