Applying spatial regression to evaluate risk factors for microbiological contamination of urban groundwater sources in Juba, South Sudan
This study developed a methodology for statistically assessing groundwater contamination mechanisms. It focused on microbial water pollution in low-income regions. Risk factors for faecal contamination of groundwater-fed drinking-water sources were evaluated in a case study in Juba, South Sudan. The study was based on counts of thermotolerant coliforms in water samples from 129 sources, collected by the humanitarian aid organisation Médecins Sans Frontières in 2010. The factors included hydrogeological settings, land use and socio-economic characteristics. The results showed that the residuals of a conventional probit regression model had a significant positive spatial autocorrelation (Moran’s I = 3.05, I-stat = 9.28); therefore, a spatial model was developed that had better goodness-of-fit to the observations. The most significant factor in this model (p-value 0.005) was the distance from a water source to the nearest Tukul area, an area with informal settlements that lack sanitation services. It is thus recommended that future remediation and monitoring efforts in the city be concentrated in such low-income regions. The spatial model differed from the conventional approach: lowland topography was significant at the 5% level in the conventional model (p-value 0.040) but not in the spatial model (p-value 0.074). This study showed that statistical risk-factor assessments of groundwater contamination need to consider spatial interactions when the water sources are located close to each other. Future studies might further investigate the cut-off distance that reflects spatial autocorrelation. In particular, these results can inform research on urban groundwater quality.
Keywords: Sub-Saharan Africa; Health; Microbial processes; Statistical modeling; Urban groundwater
Human health is at risk when microbes are present in groundwater-fed sources of drinking water. Borchardt et al. (2003) reported that diarrhoea in children in Wisconsin (USA) was correlated with drinking from a household well contaminated with faecal enterococci. Beller et al. (1997) traced an outbreak of gastroenteritis in Alaska (USA) to water consumption from a contaminated well. The disease burden of water-related infectious diseases is the most severe in developing countries (Batterman et al. 2009). In 2010, diarrheal disease caused an estimated 0.8 million deaths in children under the age of 5 years, with approximately half of these occurring in Africa (Liu et al. 2012). Sorensen et al. (2015) detected DNA from the pathogens Vibrio cholerae and Salmonella enterica (cause of typhoid fever) in 41 and 16% of the analysed samples, respectively, in groundwater in the city of Kabwe, Zambia. In developing countries, groundwater often provides the most important sources of drinking water (Pedley and Howard 1997). In Sub-Saharan Africa, where most of the world’s poorest countries are located, understanding of the mechanisms that cause faecal contamination of groundwater sources is still very limited (Kanyerere et al. 2012; Nyenje et al. 2013). It is thus imperative to improve guidelines and practices related to water and sanitation, particularly in Sub-Saharan Africa. For regions that lack water-quality data, the highest priority is to monitor the performance of improved (protected) sources (Abramson et al. 2013). As much as 86% of the population in low-income countries has access to such improved water sources (WHO/UNICEF 2012), which are typically derived from groundwater.
Statistical models provide a means to identify risk factors for groundwater contamination; for example, they can indicate the likely route of contaminant entry, inform future well siting and improve the screening of wells (Hynds et al. 2014). They can also help specify where future monitoring efforts are most needed, and the results based on a particular site can be used to guide field investigations in other areas with similar hydrogeology and land use (Mair and El-Kadi 2013). Regression-based models are particularly useful in operational contexts (de Brauwere et al. 2014). They are common in studies on risk factors for microbial groundwater contamination, which have focused on: coliform bacteria in rural wells in Iowa, USA (Glanville et al. 1997), the link between Cryptosporidium and onsite wastewater systems and private wells in New Mexico, USA (Tollestrup et al. 2014), Escherichia coli (E. coli) in 211 wells in the Republic of Ireland (Hynds et al. 2014), E. coli in groundwater sources in northern, rural Malawi (Kanyerere et al. 2012), coliform bacteria in shallow wells in Ibadan, Nigeria (Oguntoke et al. 2013), thermotolerant coliforms (TTCs) and faecal streptococci in shallow groundwater in Kampala, Uganda (Howard et al. 2003), enterococci and TTCs in shallow groundwater sources in Lichinga, Mozambique (Godfrey et al. 2006), and faecal coliform and faecal streptococci in rural areas in Burkina Faso (Guillemin et al. 1991).
Typically, the data used to develop regression models are assumed to be statistically independent, with residuals between observations and model estimates that are independent and identically distributed (iid). However, spatial data have a tendency to be autocorrelated, which implies that the residuals vary systematically over space (LeSage 2000; Mörtberg and Karlström 2005). If spatial effects are ignored, the estimates of the coefficients and the inferences based on such models might be inaccurate. An important characteristic in the current study was that the sources were located relatively close to each other, which might result in spatial interactions between data points, particularly in the event of regional aquifer contamination. Recently, spatial statistics has received increased attention, with applications in geology, economics and epidemiology (Pinkse and Slade 1998). However, to the authors’ knowledge, spatial regression has not been used in research on risk factors for groundwater contamination.
The objectives of the current case study of Juba were to improve understanding of the factors that cause microbiological contamination of protected groundwater sources in areas with tropical climates, low incomes and high population densities and to advance hydrogeological research using statistical modelling as a tool to evaluate mechanisms of urban groundwater pollution. The study investigated the hypothesis that regression models of aquifer pollution should consider spatial autocorrelation when the sources are located near to each other. The risk factor analysis included land use, socio-economic factors and hydrogeological settings.
Case study area
Variables and data sources
The spatial risk factor analysis included site-specific information and regional data, reflecting hydrogeological factors, land use, and socio-economics. The Appendix lists the variables used, their measurement units, and the corresponding reference.
Sample collection and analysis
The water quality data were collected by MSF-B during the wet season, from 6 April to 29 October 2010, with the purpose of identifying boreholes that could potentially spread cholera during outbreak events. Most of the sources were tested on two different dates, with approximately 3 months between sampling. Microbiological contamination was defined as >0 CFU/100 ml, in agreement with the WHO (2011) guidelines for drinking-water quality. To assess faecal contamination, water samples were analysed for TTCs using an Oxfam-DelAgua kit (Oxfam-DelAgua 2009). TTCs are considered acceptable indicators of faecal pollution (WHO 2011), because their populations are dominated by E. coli in most environments. The effect of this assumption was previously discussed in Engström et al. (2015a), which further contains a more detailed account of the water sampling procedure and the microbiological analyses.
The following hydrogeological characteristics were studied: marshlands, the Bahr-el-Jebel river and its tributaries, elevation above sea level, the local topography, and the static water level. The elevation and catchment areas were extracted using topographical data with 30 × 30 m resolution, based on the ASTER Global Digital Elevation Model (NASA Jet Propulsion Laboratory (JPL) 2011). The local topography was based on an on-site assessment by MSF-B at the time of the water sampling. This factor was represented as a Boolean indicator, set to 1 if a water source was located in a lowland area and 0 otherwise. Its importance was investigated using cross-tabulation, which tests the null hypothesis that the row and column variables of a contingency table are independent. The static water level was based on data obtained by MSF-B. Independently of the microbiological examination, groundwater sources were examined in 2008, 2009 or 2010 during MSF-B borehole drilling and rehabilitation campaigns, carried out in cooperation with the government of Southern Sudan, the Ministry of Cooperatives and Rural Development, and the Directorate of Rural Water (MSF-B, unpublished data, 2013). The static water level was recorded during these evaluations. The static water levels obtained from the rehabilitation and drilling protocols for 33 sites were used to estimate the depth to groundwater elsewhere in Juba (Fig. 3). The groundwater level at each of these sites was calculated by subtracting the static water level from the ground surface elevation, obtained from the ASTER Global Digital Elevation Model. An inverse distance-weighting algorithm was then applied to interpolate these levels. The resulting raster was subsequently used to extract the static water levels at the borehole locations that were sampled for coliform bacteria.
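The interpolation step described above can be illustrated with a minimal sketch of inverse distance weighting. The coordinates, elevations and static water levels below are hypothetical, and the power parameter of 2 is an assumption, since the text does not state which exponent was used.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples: list of (xs, ys, value) tuples.
    power:   distance exponent (assumed to be 2 here; not given in the text).
    """
    num = 0.0
    den = 0.0
    for xs, ys, value in samples:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            # The query point coincides with a sample: return it directly.
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical sites: (x [m], y [m], ground elevation - static water level [m]).
sites = [
    (0.0, 0.0, 470.0 - 5.0),
    (100.0, 0.0, 468.0 - 9.0),
    (0.0, 100.0, 472.0 - 3.0),
]

# Estimated groundwater level at an unsampled borehole location.
level = idw(50.0, 0.0, sites)
```

The estimate always lies between the smallest and largest sampled level, which is one reason local variations can be smoothed out when only a few sites are available.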
Land use and socio-economic data
Land cover information was defined via reports by USAID (2005) and JICA (2009a; b). Based on maps in those studies, four land cover categories were identified: bush, open ground or grassland, commercial and market areas, and roads or houses. Furthermore, socio-economic data were included using four land class categories, defined by USAID (2005) as follows: informal Tukul areas, which are low-income areas with squatter housing (532 inhabitants per ha); class 3–4 areas, with transient, low-income housing of mixed permanent and temporary materials (266 inhabitants per ha); class 2 areas, with middle-class cottage homes of simple construction, some with sanitation (200 inhabitants per ha); and class 1 areas, with permanent structures and colonial-style homes with access to formal sanitation (128 inhabitants per ha). Additionally, the on-site hygiene level was accounted for in the regression. It had been categorized into three levels by MSF-B at the time of water sampling, as presented previously (Engström et al. 2015a). Whereas 147 locations were evaluated in Engström et al. (2015a), only 129 water sources were included in the current study because spatial data could not be obtained for all sources.
GIS data generation
The spatial features were derived in a geographical information system (GIS) using image-processing operations on maps. Each feature was represented by variables reflecting its share of a circular area (buffer) centred on each water source. Three radii (30, 100 and 500 m) were considered to investigate the effect of lateral contaminant transport. Where a feature was absent from the smaller buffers, those radii were omitted from the statistical analysis. The regression also included variables reflecting the Euclidean distance from each water source to the nearest feature.
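As a rough illustration of these two kinds of variables, the sketch below computes a buffer share and a nearest-feature distance on a toy raster. The cell layout, the 10 m cell size and the function names are assumptions made for the example; the study itself derived the variables in a GIS.

```python
import math

def buffer_share(source, cells, radius):
    """Fraction of raster cells within `radius` of `source` that carry the feature.

    cells: list of (x, y, has_feature) tuples giving cell-centre coordinates.
    """
    inside = [f for (x, y, f) in cells
              if math.hypot(x - source[0], y - source[1]) <= radius]
    if not inside:
        return 0.0
    return sum(1 for f in inside if f) / len(inside)

def nearest_distance(source, cells):
    """Euclidean distance from `source` to the nearest cell carrying the feature."""
    dists = [math.hypot(x - source[0], y - source[1])
             for (x, y, f) in cells if f]
    return min(dists) if dists else math.inf

# Toy raster with 10 m cells; the feature (e.g. a Tukul area) covers x >= 50 m.
cells = [(i * 10.0, j * 10.0, i >= 5)
         for i in range(-10, 11) for j in range(-10, 11)]
source = (0.0, 0.0)

shares = {r: buffer_share(source, cells, r) for r in (30.0, 100.0)}
distance = nearest_distance(source, cells)
```

In this toy case the feature is absent from the 30 m buffer, which mirrors the situation in which the smaller radii were left out of the analysis.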
The statistical associations between contamination and the hydrogeological and anthropological risk factors were investigated using the two-sided Wilcoxon rank-sum test (also known as the Mann-Whitney U-test). This identified the most important risk factors, which were subsequently considered in the multivariable models. The variables with individual significance of p < 0.10 were assessed in these models, in agreement with Mair and El-Kadi (2013) and Hynds et al. (2014). A probabilistic (probit) regression model was developed to estimate the probability of contamination related to these predictors. It included only the factors for which the relationship corresponded with prior theories. The response was the occurrence (presence/absence) of TTCs in 100-ml samples rather than their concentration, in accordance with Hynds et al. (2014); this motivated a binary model, given the unquantifiable variability within the system.
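The screening step can be sketched as follows. This is a simplified version of the rank-sum test that uses the normal approximation without tie or continuity corrections, so it is an illustration rather than the exact procedure used in the study; the distances are hypothetical.

```python
import math
from statistics import NormalDist

def rank_sum_test(a, b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Simplification (assumption): no tie correction and no continuity
    correction are applied to the variance.
    Returns (U statistic for sample a, two-sided p-value).
    """
    n1, n2 = len(a), len(b)
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u, p

# Hypothetical distances [m] to the nearest Tukul area.
contaminated = [20.0, 55.0, 80.0, 110.0, 140.0, 160.0]
clean = [150.0, 300.0, 420.0, 510.0, 640.0, 900.0]

u_stat, p_value = rank_sum_test(contaminated, clean)
keep_for_multivariable_model = p_value < 0.10  # screening threshold from the text
```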
Conventional probabilistic regression
for i = 1, …, n.
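The display equation that this index range belongs to is not reproduced here. As a hedged sketch, assuming the standard probit formulation with a binary response $y_i$ (1 if TTCs are present), a vector of explanatory variables $x_i$ and the standard normal cumulative distribution function $\Phi$, the model can be written as:

```latex
P(y_i = 1 \mid x_i) = \Phi\!\left(x_i' \beta\right)
                    = \Phi\!\left(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}\right),
\qquad i = 1, \dots, n.
```

The symbols $x_{i1}, \dots, x_{ik}$ and $k$ are notation introduced for this sketch only.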
Model evaluation and selection
Testing for spatial autocorrelation
Spatial probit regression
for i = 1, …, n. The individual error terms σi are heteroscedastic, and the vector σ follows a multivariate normal distribution with zero mean and variance-covariance matrix $[(I - \rho W)'(I - \rho W)]^{-1}$ (Amaral et al. 2012). The recursive importance sampling algorithm was applied to calculate the n-dimensional integral in the likelihood function and thus estimate the parameters in the spatial probit model. This method uses random draws from truncated normal distributions (Beron and Vijverberg 2004) and is one of the most efficient techniques for estimating the likelihood function (Pace and LeSage 2011). Alternative methods include Gibbs sampling (LeSage 2000), the generalized method of moments (Pinkse and Slade 1998), and the expectation-maximization algorithm (McMillen 1992). To assess the relevance of a spatial probit model, 95% confidence intervals and p-values were evaluated for the spatial parameter, ρ.
Results and discussion
Table 1 Results from the bivariate risk factor analyses: variables with p-values < 0.10
Distance to the nearest Tukul area
Tukul areas, share of a 500-m-radius buffer
Distance to the nearest river or wadi
Lowland topography (Boolean for lowland or flatland/highland)
Tukul areas, share of a 100-m-radius buffer
Class 3–4 residence area, share of a 100-m-radius buffer
Open ground or grassland, share of a 500-m-radius buffer
Distance to the nearest marshland
Table 2 Description of the different multivariable probit regression models developed
Model 1A: Conventional model with the lowest AIC, considering all combinations of explanatory variables with individual significance of p < 0.10. The included factors were: lowland topography [Boolean]; share of class 3–4 residences [%]; distance to the nearest Tukul area [m]
Model 1B: Spatial model. The included factors were the same as for the corresponding conventional model (1A): lowland topography [Boolean]; share of class 3–4 residences [%]; distance to the nearest Tukul area [m]; in addition to a parameter for spatial interactions [-]. In this model, a water source was considered a neighbour to another source if it was located near to it (< 300 m)
Model 1C: Spatial model. The included factors were the same as for the corresponding conventional model (1A): lowland topography [Boolean]; share of class 3–4 residences [%]; distance to the nearest Tukul area [m]; in addition to a parameter for spatial interactions [-]. In this model, a water source was considered a neighbour to another source if it was located near to it (< 300 m) and upstream of it
Model 2A: Conventional model with the lowest AIC, considering all combinations of explanatory variables with individual significance of p < 0.05. The included factors were: lowland topography [Boolean]; distance to the nearest Tukul area [m]
Model 2B: Spatial model. The included factors were the same as for the corresponding conventional model (2A): lowland topography [Boolean]; distance to the nearest Tukul area [m]; in addition to a parameter for spatial interactions [-]. In this model, a water source was considered a neighbour to another source if it was located near to it (< 300 m)
Model 2C: Spatial model. The included factors were the same as for the corresponding conventional model (2A): lowland topography [Boolean]; distance to the nearest Tukul area [m]; in addition to a parameter for spatial interactions [-]. In this model, a water source was considered a neighbour to another source if it was located near to it (< 300 m) and upstream of it
The residuals of the conventional probit models were spatially autocorrelated. For model 1A, Moran’s I was 1.90 (I-stat 3.61) when any nearby source was defined as a neighbour, and 2.88 (I-stat 8.29) when a source was defined as a neighbour only if it was located both nearby and upstream. For model 2A, the corresponding values were 2.08 (I-stat 4.31) for neighbours located nearby and 3.05 (I-stat 9.28) for neighbours located nearby and upstream. These results indicated that spatial autocorrelation was stronger for the narrower definition of a neighbour, which excluded sources located downstream of a reference source. This was anticipated, considering the direction of groundwater flow. The results also showed that subject knowledge is important for appropriately defining the weight matrix when applying a spatial model.
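For readers unfamiliar with the statistic, the sketch below computes the textbook global Moran's I for a vector of residuals and a weight matrix. It is an assumption that this form matches the study's calculation; the scaling behind the values reported above may differ, and the residuals and weights here are invented for illustration.

```python
def morans_i(values, weights):
    """Textbook global Moran's I.

    values:  list of n residuals.
    weights: n x n nested list; weights[i][j] > 0 if j is a neighbour of i.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    num = sum(weights[i][j] * z[i] * z[j]
              for i in range(n) for j in range(n))  # weighted cross-products
    den = sum(zi * zi for zi in z)                  # total squared deviation
    return (n / s0) * num / den

# Four hypothetical sources along a line; each is a neighbour of the adjacent ones.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1.0, 1.0, -1.0, -1.0], chain)    # similar residuals adjacent
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # dissimilar residuals adjacent
```

Positive values indicate that neighbouring residuals tend to be similar, which is the pattern that motivates a spatial model.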
Table 3 The two best conventional models (models 1A and 2A) and the corresponding spatial models (models 1B, 1C, 2B, 2C) for explaining TTC contamination of water sources in Juba
β0 constant (SD)
β1 lowland topography (SD)
β2 class 3–4 residences within 100 m (SD)
β3 distance to Tukul area (SD)
ρ spatial interactions (SD)
Contamination mechanisms and hydrogeology
The best model, the one with the lowest AIC, thus incorporated two explanatory variables: the distance to the nearest Tukul area (β3) and the local topography (β1) (model 2C). The siting of the Tukul areas, as specified by USAID (2005), was clearly approximate, seeing that the zones were circular (Fig. 4); nevertheless, considering the negative sign of β3, these results reasonably indicated that the susceptibility to contamination was substantially reduced if a water source was located far (measured in m) from all of the Tukul areas. For water sources located within a Tukul area, the distance is zero, so the corresponding term drops out of the equation.
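How the two variables of model 2C enter a probit equation can be shown with a small sketch. The coefficient values below are hypothetical placeholders chosen only to have the reported signs (positive β1, negative β3); they are not the study's estimates, and the spatial term is left out for simplicity.

```python
from statistics import NormalDist

# Hypothetical coefficients, NOT the estimates from the study.
B0 = 0.2      # constant
B1 = 0.6      # lowland topography (dummy); positive sign as reported
B3 = -0.001   # distance to nearest Tukul area [1/m]; negative sign as reported

def contamination_probability(lowland, distance_m, b0=B0, b1=B1, b3=B3):
    """Probit probability of contamination for one water source (non-spatial)."""
    linear = b0 + b1 * (1.0 if lowland else 0.0) + b3 * distance_m
    return NormalDist().cdf(linear)

inside_tukul = contamination_probability(lowland=False, distance_m=0.0)
far_from_tukul = contamination_probability(lowland=False, distance_m=1000.0)
lowland_far = contamination_probability(lowland=True, distance_m=1000.0)
```

With a distance of zero the β3 term vanishes, and for a highland or flatland source the β1 term vanishes, so the probability depends on the constant alone.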
The statistical significance of a factor could be linked either to the presence of contaminant sources or to transport pathways; of these, it was likely that the effect of nearby Tukul areas was primarily related to the former. These areas typically have dense populations that reside in squatter housing and lack access to formal sanitation systems, and the surrounding land is often used for rotational crops and subsistence farming (USAID 2005). This suggests a relatively high prevalence of animal and human waste in the proximity of Tukul areas, which provides sources of faecal coliforms. The Southern Sudan Commission for Census Statistics and Evaluation (2006) reported that 64% of the household population in the country used open-air spaces to dispose of human wastes. This part of the population is more likely to reside in informal Tukul areas than in the other zones, where a larger share of the residents has access to sanitation.
To identify a region of impact of each feature, different buffer zones were considered in the GIS analyses. In the case of Tukul areas, the most significant factor in the regression reflected the Euclidean distance (Table 3); additionally, the shares of Tukul areas within 500 m radii circular areas around each source were more significant than those within 100 m radii areas (Table 1). This suggested that the characteristics of an area further than 100 m from a borehole might influence its level of contamination. This result thus indicated generalized aquifer contamination, a contamination mechanism articulated by Hynds et al. (2014). Consistently, Batterman et al. (2009) found that the spreading of water-related infectious diseases is related to both ecologic and socio-economic processes, and that distal causes should be accounted for to enable sustainable interventions.
Seeing the positive sign of β1, the results moreover indicated that lowland areas were more prone to contamination than highlands or flatland (Table 3). In the regression model, this factor was represented as a dummy variable, disappearing from the equation for water sources located in highlands or flatlands, and the coefficient would supplement the intercept for water sources located in lowland areas, such as valleys. Assuming the presence of coliforms on the ground, this could be related to ponding in such areas, considering that Engström et al. (2015a) reported that the level of accumulated long-term precipitation was associated with contamination. The hydrogeology in Juba might allow for groundwater pollution. Basement complex aquifers generally imply large variations in groundwater velocities and vulnerability to contamination (Morris et al. 2003). Geological profiles from drilling protocols (MSF-B, unpublished data, 2013) specified that the top soil in Juba contained alluvial sediments with sand, loam, clay and weathered rock, which was underlain by rock of various degrees of weathering, and that the distance to the rock had large local variations. Lineaments in fractured rock do not provide substantial natural protective layers to reduce contamination (Kanyerere et al. 2012). Particularly, laterite zones near the surface can be quite transmissive and unconfined aquifers can enable contaminant transport from the ground towards the water table in a matter of days or weeks, with low attenuation potential and high to extreme pollution vulnerability (Morris et al. 2003).
The water samples considered in this study were collected by MSF-B to monitor the evolution of potential cholera outbreaks and identify high-risk water sources. The sampling focused on areas previously affected by cholera, Kator and Munuki, where all of the water sources were tested. In developing countries, cholera is typically transmitted through water, and infected people could transmit the disease to other individuals via faecal contamination of water (Sack et al. 2004). It is thus reasonable to expect that boreholes contaminated with faecal indicators, such as TTCs, are more likely than clean ones to transmit cholera. Vibrio cholerae and TTCs have important similarities: they are gram-negative, facultatively anaerobic, and have similar size and shape (Cabral 2010), indicating that the strains would be transported in the same manner underground; however, this link requires further research. Nevertheless, the results could support future efforts that aim to reduce diarrheal disease. Cholera outbreaks have taken place in the South Sudan region every year from 2006 to 2009 and in 2014 (WHO 2014a). It remains a public health threat in Sub-Saharan Africa. According to Mengel et al. (2014), Sub-Saharan Africa accounted for 86% of reported cases of cholera and 99% of deaths due to cholera worldwide in 2011 (excluding the Haitian epidemic).
To the authors’ knowledge, this is the first study to use spatial regression models to assess risk factors for groundwater contamination. Hence, there was no previous literature to refer to when specifying the spatial model. The weight matrix should reflect the distance within which the response data are correlated. In theory, the groundwater in Juba might originate from the whole upstream Nile river basin, which would imply a vast zone of impact for each borehole and the possibility of spatial correlation among boreholes located very far from each other. However, the zone of impact would be limited by the fact that faecal coliforms typically die after 20 days in the field at temperatures of 20–30 °C, based on Westcot (1997); nevertheless, it is not obvious how this would translate to distance, as discussed more thoroughly by Engström et al. (2015b). Notably, aquifers in weathered basement complexes often have anisotropic properties related to the orientation of the fractures, and pumping from boreholes could induce constricted and elongated zones of contribution (Tearfund 2007). In the current study, as an approximate approach, a water source was defined as a neighbour of a reference source if it was located within a fixed distance of 300 m, and sources further away were not considered neighbours; this allowed for relatively lengthy transport. Shorter transport distances might also have been relevant. Hynds et al. (2012) estimated that the approximate zone of impact of septic tanks extended up to 110 m up-gradient of the wellhead, if high 120-h prior precipitation rates were considered. Conversely, in a review, Pang (2009) reported that the maximum observed E. coli transport distance was as great as 920 m, for sewage-polluted groundwater in gravel aquifers in Burnham, New Zealand, at velocities as high as 56–153 m/day (Sinton 1980). Future studies might investigate the cut-off distance for spatial autocorrelation as related to microbial transport in different hydrogeological environments.
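The two neighbour definitions used in this study can be sketched as a binary weight matrix. The coordinates and groundwater levels below are hypothetical, and treating a source as "upstream" when its groundwater level is higher is an assumption made for this example; the text does not specify how the upstream direction was determined.

```python
import math

def weight_matrix(sources, cutoff=300.0, upstream_only=False):
    """Binary spatial weight matrix.

    sources: list of (x [m], y [m], groundwater_level [m]) tuples.
    w[i][j] = 1 if source j is a neighbour of source i, i.e. closer than
    `cutoff` and, if `upstream_only`, at a higher groundwater level
    (assumed proxy for "upstream").
    """
    n = len(sources)
    w = [[0.0] * n for _ in range(n)]
    for i, (xi, yi, hi) in enumerate(sources):
        for j, (xj, yj, hj) in enumerate(sources):
            if i == j:
                continue
            if math.hypot(xi - xj, yi - yj) >= cutoff:
                continue  # not within the 300 m cut-off
            if upstream_only and hj <= hi:
                continue  # j is not upstream of i
            w[i][j] = 1.0
    return w

# Three hypothetical sources along a transect.
sources = [(0.0, 0.0, 460.0), (200.0, 0.0, 465.0), (1000.0, 0.0, 470.0)]

w_nearby = weight_matrix(sources)                        # nearby neighbours
w_upstream = weight_matrix(sources, upstream_only=True)  # nearby and upstream
```

The nearby-only matrix is symmetric, whereas the upstream variant is not, which reflects the directional nature of groundwater flow.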
The results in this study indicated that a spatial model might be more adequate than one that assumes all data are independent in space. The findings thus contribute to research on risk factors for urban (or peri-urban) groundwater contamination because sources that provide water in such areas are likely to be densely located. This is especially notable because groundwater provides an important component of the water supply system in 12 of the world’s 23 megacities (>10 million inhabitants) (Hirata et al. 2006); in particular, groundwater is an essential water source in peripheral, poorer parts of many cities, which often do not receive piped water or formal sanitation services (Hirata et al. 2006).
The regression resulted in 67% correct predictions using the model with the lowest AIC (model 2C). This was relatively low, indicating that the investigated features did not account for the whole variance in the response variable, which might be an effect of the low resolution of the maps. Factors other than those considered in the current study may also have influenced the water quality.
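The two summary measures mentioned here can be sketched as follows. The AIC is computed from its standard definition (Akaike 1973), and the share of correct predictions is obtained by classifying each source as contaminated when its predicted probability reaches a threshold. The 0.5 threshold, the function names and the example numbers are assumptions for illustration and are not taken from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood

def share_correct(observed, predicted_prob, threshold=0.5):
    """Fraction of sources whose predicted class matches the observed class.

    observed: 0/1 indicators of contamination.
    predicted_prob: model probabilities of contamination.
    threshold: classification cut-off (0.5 is an assumed value).
    """
    hits = sum(
        1 for y, p in zip(observed, predicted_prob)
        if (p >= threshold) == bool(y)
    )
    return hits / len(observed)

# Hypothetical values, for illustration only.
example_aic = aic(log_likelihood=-10.0, n_params=3)
example_share = share_correct([1, 0, 1, 0], [0.8, 0.3, 0.4, 0.6])
```

A share of correct predictions well below 1 in such a calculation indicates, as in the study, that the included covariates explain only part of the variation in contamination.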
It is reasonable to expect that microbial contamination of groundwater sources is particularly prevalent in urban areas in developing countries; however, such environments are often relatively disorganized, imposing constraints on access to detailed spatial and temporal data. Batterman et al. (2009) stated that understanding of water-related infectious diseases in developing countries is often limited by knowledge and data gaps and that related analyses are often based on multiple and sparse data sets. The current study also faced some related restrictions. The Comprehensive Peace Agreement was signed between fighting parties in Sudan in 2005, ending decades of civil war. Few records of geological and hydrogeological surveys in Juba were centralized before 2005. The decades of conflict resulted in many internal refugees and very limited resources for systematic monitoring of environmental and socio-economic factors. Therefore, the analysis relied on reports by USAID (2005) and JICA (2009a; b) for spatial information. The resolution in these data varied. Furthermore, the report by USAID (2005) was developed 5 years prior to the sampling in the current study and spatial features could have changed during this time, which means that the exact location of features could not be determined. Instead, inferences need to be based on broad trends in the data.
Missing spatial risk factors
In the regression, it would be preferable to account for the hydrogeological settings in the vicinity of each water source, such as the bedrock and the subsoil characteristics. Fine-resolution spatial data on the location of fracture zones or lithology could not be found, as the accessible hydrological and geological maps were on a country scale. Groundwater levels in Juba had to be estimated based on interpolation of the registered static water level from a limited number of sources. The static water level reported in these protocols varied from 2 m to more than 20 m below ground, with large variations. It was anticipated that the local water table level would be associated with contamination. For example, Kulabako et al. (2007) reported that the level of faecal contaminants increased in areas in Kampala with a higher water table. However, in the current study, the static water level was not significant; nevertheless, the results do not exclude the possibility that local and/or ephemeral aquifers influenced contamination, considering that local variations might not be correctly estimated based on the 33 locations used for estimation of the static water level elsewhere in Juba. Further, the static water level was measured at times other than the microbiological sampling dates and there could be seasonal variations. Future studies would thus preferably account for ephemeral and local aquifers.
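To illustrate how a static water level might be estimated at an unsampled location from a limited set of measurements, the following is a minimal sketch using inverse distance weighting. This method is an assumption chosen for illustration; the text does not state which interpolation technique was applied, and the coordinates and levels below are hypothetical.

```python
import math

def idw_estimate(known, target, power=2.0):
    """Inverse-distance-weighted estimate at a target point.

    known: list of (x, y, value) tuples, e.g. static water level in m below ground.
    target: (x, y) location where an estimate is needed.
    power: distance exponent (2.0 is a common default, assumed here).
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, value in known:
        d = math.dist((x, y), target)
        if d == 0.0:
            # The target coincides with a measured point: return the measurement.
            return value
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical measurements: (x in m, y in m, static water level in m below ground).
measured = [(0.0, 0.0, 2.0), (100.0, 0.0, 20.0)]
midpoint_level = idw_estimate(measured, (50.0, 0.0))
```

Any such interpolation smooths over local variation, which is consistent with the caution above that local or ephemeral aquifers may not be captured when only a few measurement points are available.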
Additionally, the results indicated that the proximity of houses or roads was not associated with borehole contamination; however, the map representing their locations did not thoroughly reflect the informal infrastructure in Juba, such as walkways and individual clay huts, which might be important. If possible, future studies should account for such data. Other potential risk factors include the number of users of each source and the locations of small-scale animal farming facilities or cultivated areas where manure might be used for fertilizer. Further, the distance to small ponds near each water source would preferably be included. Studying ponds in rural Bangladesh, Knappett et al. (2011) reported that the water in the majority of the ponds contained unsafe levels of faecal contamination, which was mainly attributed to the proximity of unsanitary latrines (visible effluent or open pits).
Time-variant risk factors
The current study focused on spatial factors, although temporal factors are also likely to be important. Results from Engström et al. (2015a) indicated that both the level of on-site hygiene and contamination of groundwater sources varied considerably with time. The latter was transient in 43% of the investigated sources, and the level of on-site hygiene was a significant factor for contamination in pairwise comparisons of the sources with varying contamination at different times (Engström et al. 2015a). Water sampling was consistently conducted in the wet season in the current study; nevertheless, there are weather variations in this period that might have an impact on the susceptibility of wells to contamination. Engström et al. (2015a) found that accumulated long-term antecedent rainfall was associated with contamination events but temperature was not. It is therefore recommended that future studies in similar areas account for time-variant factors that might influence groundwater quality, particularly precipitation, in addition to spatial factors.
Summary and conclusions
This study investigated potential risk factors influencing bacterial contamination of urban groundwater sources. The evaluated variables reflected site-specific information as well as regional land use, hydrogeological setting and socio-economic characteristics in Juba, South Sudan. A conventional multivariable regression model was developed. This approach resulted in residuals that had significant, positive spatial autocorrelation. Therefore, a spatial model was estimated in which the parameter that reflected spatial interactions was significant (p-value 0.001) and estimated at 0.50 (SD 0.15). This model accounted for the proximity of areas with informal settlements, Tukul areas, as well as the local topography (lowland/no lowland indicator variable). The results indicated that the groundwater below these zones was contaminated. Tukul areas lack formal sanitation systems, rearing animals is common and the surrounding land is often used for subsistence farming, which might explain the increased risk for contamination in their vicinity. The results suggested that generalized aquifer contamination occurred. It is recommended that future remediation efforts and monitoring schemes in cities similar to Juba (in terms of climate, hydrogeology and socio-economic characteristics) focus on such low-income and informal settlement areas.
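The check for spatial autocorrelation in model residuals can be sketched with the standard Moran's I statistic. The function below is a minimal illustration under stated assumptions: the residual values and the simple chain-shaped neighbour structure are hypothetical, and no significance test is included.

```python
def morans_i(values, w):
    """Moran's I for a list of values and a spatial weight matrix w.

    I = (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i^2,
    where z are deviations from the mean and S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)

# Hypothetical chain of four sources where only adjacent sources are neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, -1.0, -1.0], chain)    # similar residuals are adjacent
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # neighbours have opposite signs
```

A positive value, as for the clustered example, indicates that nearby sources tend to have similar residuals, which is the pattern that motivated the spatial model in this study.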
This study contributed to methodological development in the subject area. The results showed that statistical studies of groundwater quality should consider the effects of spatial interactions when the investigated sources are located near to each other. Introducing a spatial term could have important effects on the other parameters in the model. In the current study, the spatial model indicated that the local topography was not significant at the 5% level, in contrast with inferences based on the conventional model. However, when applying spatial regression, it should be emphasized that subject knowledge is important to define the weight matrix that reflects spatial interactions. In this study, the spatial parameter was more significant when the direction of groundwater flow was considered in defining the weight matrix. In the field of groundwater quality, research based on statistical models can inform decision making by identifying priority land-use types and prioritizing remediation efforts. In cities, groundwater quality data are unlikely to be independent in space because the water sources are often located near to each other. Future research should address the mechanisms for urban groundwater contamination; when using statistical models to do so, spatial effects should be accounted for. This is important considering that groundwater provides a large component of the water supply system in a majority of the world’s megacities.
Acknowledgements
The ASTER Global Digital Elevation Model (GDEM V2) data were retrieved from the online Data Pool, courtesy of the NASA Land Processes Distributed Active Archive Center (LP DAAC), USGS/Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, https://lpdaac.usgs.gov/data_access/data_pool. The authors would like to acknowledge Dr. Berit Balfors and Dr. Roger Thunvik for their comments and suggestions.
References
- Akaike H (1973) Information theory and an extension of the maximum likelihood principle. Paper presented at the 2nd International Symposium on Information Theory, Akademiai Kiado, Budapest, pp 267–281
- Batterman S, Eisenberg J, Hardin R, Kruk M, Lemos M, Michalak A, Mukherjee B, Renne E, Stein H, Watkins C, Wilson M (2009) Sustainable control of water-related infectious diseases: a review and proposal for interdisciplinary health-based systems research. Environ Health Perspect 117:1023–1032
- Beron KJ, Vijverberg WPM (2004) Probit in a spatial context: a Monte Carlo analysis. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Heidelberg, Germany, pp 169–195
- Fleming MM (2004) Techniques for estimating spatially dependent discrete choice models. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Heidelberg, Germany, pp 145–168
- Glanville TD, Baker JL, Newman JK (1997) Statistical analysis of rural well contamination and effects of well construction. Trans ASAE 40(2):363–370
- Google Maps (2014) https://email@example.com,31.5885441,13z/data=!5m1!1e4?hl=en. Accessed October 23, 2014
- Google Earth (2016) 7.1.7. 04°48.61596′N, 031°35.08757′E. http://www.google.com/earth/index.htm. Accessed September 09, 2016
- Hirata R, Stimson J, Varnier C (2006) Urban hydrogeology in developing countries: a foreseeable crisis. Paper presented at the International Symposium on Groundwater Sustainability (ISGWAS) Alicante, Spain, January 2006
- Hynds PD, Misstear BD, Gill L (2012) Development of a microbial contamination susceptibility model for private domestic groundwater sources. Water Resour Res 48. doi:10.1029/2012WR012492
- Japan International Cooperation Agency (JICA) (2009a) Juba urban water supply and capacity development study in the Southern Sudan: final report. http://libopac.jica.go.jp/top/index.do?method=change&langMode=ENG. Accessed 1 April 2014
- Japan International Cooperation Agency (JICA) (2009b) Juba water supply and capacity development study in the Southern Sudan, Interim report 1, presentation, JICA, Tokyo
- LeSage J, Pace RK (2009) Introduction to spatial econometrics. CRC, Boca Raton, FL
- Liu L, Johnson HL, Cousens S, Perin J, Scott S, Lawn JE, Rudan I, Campbell H, Cibulskis R, Li M, Mathers C, Black RE (2012) Global, regional, and national causes of child mortality: an updated systematic analysis for 2010 with time trends since 2000. Lancet 379:2151–2161. doi:10.1016/S0140-6736(12)60560-1
- McMichael G (2016) Land conflict and informal settlements in Juba, South Sudan. Urban Studies 53(13):2721–2737
- Morris BL, Lawrence AR, Chilton PJ, Adams B, Calow RC, Klinck BA (2003) Groundwater and its susceptibility to degradation, a global assessment of the problem and options for management. Early Warning and Assessment Report Series, RS. 03-3. United Nations Environment Programme (UNEP), Nairobi, Kenya
- NASA Jet Propulsion Laboratory (JPL) (2011) Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Model Version 2 (GDEM V2). NASA EOSDIS Land Processes DAAC, USGS Earth Resources Observation and Science (EROS) Center, Sioux Falls, SD. http://earthexplorer.usgs.gov/. Accessed September 27, 2013
- Nyenje PM, Foppen JW, Kulabako R, Muwanga A, Uhlenbrook S (2013) Nutrient pollution in shallow aquifers underlying pit latrines and domestic solid waste dumps in urban slums. Environ Manag 122:15–24
- Oxfam-DelAgua (2009) Oxfam-Delagua portable water testing kit user manual (version 4.2). http://www.oxfam.org.uk/equipment/catalogue/resources-included-available/water-and-sanitation/water-treatment-and-testing/Delagua%20english_manual_2000-1.pdf/at_download/file. Accessed 28 November 2014
- Pace RK, LeSage JP (2011) Fast simulated maximum likelihood estimation of the spatial probit model capable of handling large samples. doi:10.2139/ssrn.1966039
- Palamuleni LG (2002) Effect of sanitation facilities, domestic solid waste disposal and hygiene practices on water quality in Malawi’s urban poor areas: a case study of South Lunzu Township in the city of Blantyre. Phys Chem Earth, parts A/B/C 27:845–850. doi:10.1016/S1474-7065(02)00079-7
- Sinton LW (1980) Two antibiotic-resistant strains of Escherichia coli for tracing the movement of sewage in groundwater. J Hydrol N Z 19:119–130
- Southern Sudan Commission for Census Statistics and Evaluation (2006) Southern Sudan Household Health Survey. http://www.bsf-south-sudan.org/sites/default/files/SHHS.pdf. Accessed 10 October 2014
- Sudan Ministry of Energy and Mines Geological and Mineral Resources Department (1981) Geological map of the Sudan. http://eusoils.jrc.ec.europa.eu/esdb_archive/eudasm/africa/images/maps/download/afr_sd2001_ge.jpg. Accessed 4 November 2014
- Tearfund (2007) Darfur: water supply in a vulnerable environment—phase two of Tearfund’s Darfur environment study. Summary report, USAID, Washington, DC; DFID, London; UNEP, Nairobi
- USAID (United States Agency for International Development) (2005) Juba Assessment Town Planning and Administration Report September–October 2005 CA no. 623-A-00-05-00318, USAID, Washington, DC
- Vail JR (1989) Hydrological map of Sudan. South Sheet, series 2201. Ministry of Energy and Mines, Geological and Mineral Resources Department, Khartoum, Sudan
- Westcot DW (1997) Quality control of wastewater for irrigated crop production. FAO water report 10, Food and Agriculture Organization of the United Nations, Rome
- WHO (2011) Guidelines for drinking-water quality, 4th edn. WHO, Geneva
- WHO (2014a) Early warning and disease surveillance bulletin (IDP camps and communities) 11–17 August 2014. http://www.who.int/hac/crises/ssd/south_sudan_ewarn_17august2014.pdf?ua=1. Accessed September 2014
- WHO (2014b) South Sudan Country Profile. http://www.who.int/countries/ssd/en/. Accessed 30 September 2014
- WHO/UNICEF (2012) Progress on drinking water and sanitation Joint Monitoring Programme update 2012. UNICEF, New York and WHO, Geneva
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.