Environmental Science and Pollution Research, Volume 26, Issue 3, pp 2105–2119

Validating a continental-scale groundwater diffuse pollution model using regional datasets

  • Issoufou Ouedraogo
  • Pierre Defourny
  • Marnik Vanclooster
Groundwater under threat from diffuse contaminants: improving on-site sanitation, agriculture and water supply practices


In this study, we assess the validity of an African-scale groundwater pollution model for nitrate. In a previous study, we identified a statistical continental-scale model of groundwater nitrate pollution from a pan-African meta-analysis of available nitrate groundwater pollution studies. The model was implemented in both Random Forest (RF) and multiple regression formats. For both approaches, we collected as predictors a comprehensive GIS database of 13 spatial attributes related to land use, soil type, hydrogeology, topography, climatology, region typology, nitrogen fertiliser application rate, and population density. In this paper, we validate the continental-scale model of groundwater contamination using a nitrate measurement dataset from three African countries. We discuss data availability, data quality, and scale issues as challenges in validation. Although the modelling procedure performed very well on the continental-scale dataset (e.g. R² = 0.97 in the RF format using a cross-validation approach), the continental-scale model could not be used without recalibration to predict nitrate pollution at the country scale using regional data. In addition, when the model is recalibrated using country-scale datasets, the order of the model explanatory factors changes. This suggests that the structure and the parameters of a statistical, spatially distributed groundwater degradation model for the African continent are strongly scale dependent.
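The transfer problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: a single-predictor least-squares line stands in for the RF and multiple regression models with 13 predictors, and all numbers are synthetic and hypothetical. The sketch only shows how a model that fits its calibration data closely can score a negative R² on data from a different regime until it is recalibrated.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_mean, y_mean = mean(x), mean(y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_mean - b * x_mean, b


def predict(a, b, x):
    return [a + b * xi for xi in x]


# Synthetic, hypothetical data: one predictor index vs. nitrate (mg/L).
continental_x = [1, 2, 3, 4, 5, 6, 7, 8]
continental_y = [12, 19, 31, 42, 48, 61, 69, 80]   # strong dependence
regional_x = [2, 3, 4, 5, 6]
regional_y = [30, 28, 35, 31, 38]                  # weak dependence

# 1. Calibrate on the "continental" data.
a, b = fit_line(continental_x, continental_y)
r2_calibration = r_squared(continental_y, predict(a, b, continental_x))

# 2. Apply the continental model unchanged to the "regional" data.
r2_transfer = r_squared(regional_y, predict(a, b, regional_x))

# 3. Recalibrate on the regional data itself.
a_reg, b_reg = fit_line(regional_x, regional_y)
r2_recalibrated = r_squared(regional_y, predict(a_reg, b_reg, regional_x))

print(f"calibration R2:  {r2_calibration:.2f}")
print(f"transfer R2:     {r2_transfer:.2f}")
print(f"recalibrated R2: {r2_recalibrated:.2f}")
```

With these synthetic values the transferred model scores worse than simply predicting the regional mean (negative R²), and recalibration changes the fitted slope. This is a one-predictor analogue of the change in the order of explanatory factors, not a reproduction of it.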


Groundwater nitrate · Random Forest (RF) · Validation · Scale issue · Country · Africa



This study was carried out within the framework of a doctoral research programme, and has been supported by the Islamic Development Bank (IDB) under the Merit Scholarship Programme (MSP) for theses and the ‘Fonds Spécial de Recherche’ (FSR) of the Université Catholique de Louvain. Several people from across the world helped with data acquisition, namely T. Gleeson (McGill University), N. Moosdorf (Hamburg University), and M. Cissé (DGPRE/Senegal).



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  • Issoufou Ouedraogo (1, corresponding author)
  • Pierre Defourny (1)
  • Marnik Vanclooster (1)
  1. Earth and Life Institute, Université catholique de Louvain, Louvain-la-Neuve, Belgium
