
Environmental Science and Pollution Research, Volume 26, Issue 3, pp 2105–2119

Validating a continental-scale groundwater diffuse pollution model using regional datasets

  • Issoufou Ouedraogo
  • Pierre Defourny
  • Marnik Vanclooster
Groundwater under threat from diffuse contaminants: improving on-site sanitation, agriculture and water supply practices

Abstract

In this study, we assess the validity of an African-scale groundwater pollution model for nitrate. In a previous study, we identified a statistical continental-scale groundwater pollution model for nitrate using a pan-African meta-analysis of available nitrate groundwater pollution studies. The model was implemented in both Random Forest (RF) and multiple regression formats. For both approaches, we used as predictors a comprehensive GIS database of 13 spatial attributes related to land use, soil type, hydrogeology, topography, climatology, region typology, nitrogen fertiliser application rate and population density. In this paper, we validate the continental-scale model of groundwater contamination using nitrate measurement datasets from three African countries. We discuss data availability, data quality and scale as challenges for validation. Although the modelling procedure performed very well on the continental-scale dataset (e.g. R2 = 0.97 in the RF format using a cross-validation approach), the continental-scale model could not be used without recalibration to predict nitrate pollution at the country scale using regional data. In addition, when the model is recalibrated using country-scale datasets, the ranking of the model's explanatory factors changes. This suggests that the structure and the parameters of a statistical spatially distributed groundwater degradation model for the African continent are strongly scale dependent.

Keywords

Groundwater nitrate · Random Forest (RF) · Validation · Scale issue · Country · Africa

Introduction

Throughout the world, groundwater is an important source of fresh water for industry, agriculture and domestic users. However, groundwater systems worldwide are increasingly threatened by pollution from agricultural activities, urbanisation and industrial development (Foster et al. 2003; Aljazzar 2010; Charrière and Aumond 2016; Constant et al. 2016). According to Gurdak (2014), all groundwater resources are vulnerable to nonpoint source (NPS) contamination. Diffuse NPS pollution from farming activities and point source pollution from sewage treatment and industrial discharge are the principal contaminant sources (Boy-Roura 2013). One of the most common and persistent groundwater pollution problems is diffuse pollution resulting from the intensification of agriculture over recent decades, with increased use of chemical fertilisers and higher concentrations of animal excrement in smaller areas (Boy-Roura 2013). Agricultural land use leads to elevated concentrations of nutrients, and according to Haller et al. (2013), it represents the largest diffuse pollution threat to groundwater quality on a global scale. Elevated concentrations of nutrients (especially nitrogen and phosphorus) can cause a variety of problems, including degradation of ecosystems (for example, eutrophication of water bodies) and human health issues. Nitrate is the most ubiquitous NPS contaminant of groundwater resources worldwide (Spalding and Exner 1993). Nitrate ingestion has been linked to methemoglobinemia, adverse reproductive outcomes and specific cancers (Ward et al. 2005).

In Africa, groundwater is a crucial natural resource supporting the development of the continent, but it is also subject to many pressures. Two main threats are overexploitation and contamination (MacDonald et al. 2013). The pressures exerted by the agricultural sector on groundwater are of primary concern (Xu and Usher 2006; Sharaky 2016). In particular, shallow groundwater systems are vulnerable to pollution (Ouedraogo et al. 2016b; Ouedraogo and Vanclooster 2016b).

In the overwhelming majority of African water bodies, nitrates remain among the most critical pollutants, and the level of contamination is increasing (Spalding and Exner 1993; Puckett et al. 2011). Nitrates in groundwater are derived from various point and diffuse sources but mainly originate from the extensive use and release of anthropogenic nitrogen compounds in agricultural and urban environments (Strebel et al. 1989; Spalding and Exner 1993; Foster 2000; Böhlke 2002; Wakida and Lerner 2005). The presence of nitrates also depends on environmental attributes such as soil type, climatology and hydrogeology (Davis and Sylvester-Bradley 1995; Nolan and Hitt 2006; Kulabako et al. 2007; Boy-Roura et al. 2013; UNEP/DEWA 2014; Nolan et al. 2014; Pearson 2015; Wheeler et al. 2015; Ouedraogo and Vanclooster 2016a). Reliable predictions of groundwater nitrate concentrations as a function of land use and agricultural practices are therefore essential for groundwater development programs. Indeed, the ability to predict groundwater quality is key to designing sustainable land and water management programs. Yet, at present, there are few regional- or continental-scale studies of nitrate groundwater pollution in Africa.

Statistical data modelling can improve our understanding of the key processes involved in nitrate contamination of groundwater. Because statistical approaches differ in their ability to model relationships, comparing them can show which approach is most appropriate for modelling groundwater quality. To this end, numerous studies have compared a suite of statistical approaches, including linear models (Bauder et al. 1993; Rawlings et al. 1998; Boy-Roura et al. 2013; Jung et al. 2016), generalised linear models (Shamsudduha et al. 2015), generalised additive models (Yee and Mitchell 1991; Barrio et al. 2013), artificial neural networks (Gemitzi et al. 2009), classification and regression trees, multivariate regression trees and other computationally intensive statistical methods such as Random Forest (RF) methods (Breiman et al. 1984; De’ath and Fabricius 2000; De’ath 2002; Breiman 2001a; Evans et al. 2011). Generally, tree-based approaches perform better, because they can incorporate complex nonlinear processes into the statistical model. Among these tree-based approaches, RFs often perform well (e.g. Lawler et al. 2006; Prasad et al. 2006; Knudby et al. 2010). RF is an ensemble learning method that combines multiple models built on bootstrap samples (Breiman 2001a). Ensemble learning techniques generate many classifiers or regression models and aggregate their results (Liaw and Wiener 2002). An RF consists of a collection of regression trees (e.g. 1000 trees in a single RF) and has been shown empirically to perform better than its individual constituent trees (Hamza and Larocque 2005).
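The bootstrap-and-aggregate idea behind RF can be illustrated with a short Python sketch (the study itself worked in R). It is plain bootstrap aggregation (bagging) of one-split regression trees ("stumps") on toy data, not a full RF: a real RF grows deep, unpruned trees and also draws a random subset of predictors at each split, which this sketch omits. The function names `fit_stump` and `bagged_ensemble` are hypothetical.

```python
import random
import statistics


def fit_stump(X, y):
    """Fit a one-split regression tree: choose the (feature, threshold) pair
    whose two leaf means give the smallest sum of squared errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must leave data on both sides
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # all predictor values identical: constant prediction
        mean = statistics.fmean(y)
        return lambda row: mean
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr


def bagged_ensemble(X, y, n_models=100, seed=0):
    """Bootstrap aggregation: fit one stump per bootstrap sample, average predictions."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: statistics.fmean([m(row) for m in models])


# Toy data: one predictor, response jumps from 0 to 10 at x = 10.
X = [[float(x)] for x in range(20)]
y = [0.0] * 10 + [10.0] * 10
predict = bagged_ensemble(X, y)
print(predict([2.0]), predict([17.0]))
```

Averaging over bootstrap fits is what smooths out the idiosyncrasies of any single tree; the random predictor subsets of a true RF further decorrelate the trees so that averaging reduces variance more effectively.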

In a previous study, we developed a statistical model based on a meta-database of nitrate measurements from across Africa (Ouedraogo et al. 2016a; under review). Using a cross-validation approach, this model allowed us to predict the spatial patterns of continental-scale groundwater degradation observed in the meta-database (R2 = 0.97). In the present study, we evaluate the predictive ability of the continental-scale RF model at the regional scale using independently collected regional datasets. To this end, we used groundwater nitrate measurement datasets for three African countries: Senegal, South Africa and Burkina Faso.

Data and methods

Study area

The large-scale model was developed for the whole African continent, except Madagascar. Physiographic attributes vary widely across the continent. The elevation of the African continent ranges from below sea level to 5825 m above sea level, with an average elevation of approximately 651 m (Ateawung 2010). Africa’s surface water resources comprise a total of 63 international river basins, covering 64% of its land area and containing 93% of its total surface water resources (UNEP 2010). According to UNEP (2010), these river basins are also home to some 77% of the population. Surface water resources in Africa are predominantly transboundary, with most situated in the central and south-eastern regions of the continent, reflecting the spatial pattern of rainfall (Postnote 2011). According to the Royal Society of Chemistry (2010), around 50% of Africa’s total surface water resources are generated in the Congo basin alone. Africa has a vast array of drainage networks, the most important being that of the Nile River, which drains northeast and empties into the Mediterranean Sea. The African continent is not endowed with abundant groundwater resources, because it is the world’s second-driest continent after Australia and its water resources are limited (UNEP 2010). MacDonald et al. (2012) estimated the volume of groundwater storage in Africa at 0.66 million km3. However, groundwater is Africa’s most precious natural resource, providing reliable water supplies to at least a third of the continent’s population (MacDonald 2010). The proportion is higher in some arid and semi-arid countries; in Libya, for example, it is as high as 95% (Margat 2010). Groundwater occurrence depends primarily on geology, geomorphology/weathering and rainfall (both current and historic).
The interplay of these factors gives rise to complex hydrogeological environments with countless variations in the quantity, quality, ease of access and renewability of groundwater resources. The geology of the African continent comprises 13 lithological classes (Fig. 1) with varying coverage: evaporites (0.6%); metamorphic rocks (27.6%); acid plutonic rocks (1.1%); basic plutonic rocks (0.2%); intermediate plutonic rocks (0.1%); carbonate sedimentary rocks (9.4%); mixed sedimentary rocks (6.4%); siliciclastic sedimentary rocks (16.4%); unconsolidated sediments (35.1%); acid volcanic rocks (0.1%); basic volcanic rocks (3.3%); intermediate volcanic rocks (0.6%) and water bodies (0.9%) (Hartmann and Moosdorf 2012). Lithology describes the geochemical, mineralogical and physical properties of rocks. The hydrogeology of the African continent has been summarised at regional level by many authors (e.g. Jones 1985; MacDonald and Davis 2000; MacDonald et al. 2008). MacDonald et al. (2012) distinguished five important hydrogeological environments across Africa: Precambrian crystalline basement rocks; consolidated sedimentary rocks; volcanic rocks; unconsolidated sediments; and unconsolidated sediments in river valleys.
Fig. 1

The lithological context of the African continent (from Hartmann and Moosdorf 2012)

Measurement of nitrate data in groundwater

Nitrate measurements in groundwater were compiled for South Africa, Senegal and Burkina Faso. Table 1 summarises the basic statistics and sources of the nitrate data collected. Nitrate concentrations were determined at numerous stations (wells, boreholes and springs) that are very often used for drinking water supply. The comparability of water quality data from different laboratories can only be ensured if identical, or at least similar, methods are used (Chapman 1996). Many comprehensive standard manuals and guidebooks describe laboratory methods in detail, such as the GEMS/WATER Operational Guide (WHO 1992) and the practical guide to water quality monitoring by Bartram and Ballance (1996). In our case, no standard guidance on methods for collecting and interpreting the data was followed, although such guidance would clearly be beneficial and would help to eliminate much of the subjectivity in the datasets.
Table 1

Summary statistics and sources of compiled nitrate (NO3) measurements

Country | Number of samples | Min. (mg/L) | Mean (mg/L) | Max. (mg/L) | Date of collection | Sources/references
Burkina Faso | 9049 | 11 | 55.10 | 1282 | 2009 | Contact: Mme Zougrana Jacqueline, DEIE (a), Burkina Faso. Email: zougjac@yahoo.fr
Senegal | 1332 | 0.02 | 19.37 | 889.7 | 1952–2009 | Contact: Mr. Moussa Cissé, DGPRE (b), Senegal. Email: scissemoussa@yahoo.fr
South Africa | 2923 | 50.1 | 126.79 | 1599 | 1994–2009 | https://ggis.un-igrac.org/ggis-viewer/viewer/groundwaterafrica/public/default

(a) DEIE: Direction des Etudes et de l’Information sur l’Eau
(b) DGPRE: Direction de la Gestion et de la Planification des Ressources en Eau

Data quality control is a complex and time-consuming activity that must be undertaken continuously to ensure meaningful water quality assessments (Chapman 1996). Every stage of data handling increases the risk of introducing errors, and most of these risks are associated with human error during written transcription or ‘keying-in’ via a computer keyboard. Possible sources of error in the collected samples include (i) a lack of trained and experienced data collectors, i.e. laboratory personnel should be sufficiently trained and qualified to carry out the necessary analytical operations properly; (ii) a lack of good-quality supervision, for example not enough time allocated to supervision or a high ratio of data collectors to supervisors; and (iii) errors in data handling operations, such as data entry errors or omissions in data reports, or errors when checking and validating measurement data. To control the quality of the data, we therefore analysed and filtered all datasets to eliminate at least some of this bias. For example, we eliminated all negative and zero nitrate values recorded in the datasets. Because nitrate data are collected at given geographical locations, records whose sampling-site coordinates (longitude and latitude) had no reported nitrate value were deleted. Out of a total of 37,382 samples for Burkina Faso, we retained 9049 samples after evaluation. As another example, the Burkina Faso dataset contained a maximum concentration of 55,550 mg/L, which we deleted: we assumed an error in this reported value or measurement because of the large difference between this value and the second-highest value in the dataset (1282 mg/L). Hence, nitrate concentrations in the Burkina Faso dataset ranged from 11 to 1282 mg/L, with a mean concentration of 55.10 mg/L.
Out of a total of 2913 samples, 2813 were used in the South Africa case, where nitrate concentrations ranged from 50.1 to 1599 mg/L, with a mean concentration of 126.79 mg/L. For Senegal, out of a total of 3721 samples, we kept 1332 after evaluation; nitrate concentrations ranged from 0.02 to 889.7 mg/L, with a mean concentration of 19.37 mg/L. The mean nitrate concentrations in groundwater for the Burkina Faso and South Africa datasets exceed the World Health Organisation (WHO) drinking water standard of 50 mg/L. Furthermore, the maximum nitrate concentration is very high in all three countries. These high values indicate that nitrate pollution is an acute problem in these African countries. The spatial distribution of the collected data is illustrated in Fig. 2a–c.
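The screening steps described above can be expressed as a small Python function. This is a sketch under stated assumptions: records are hypothetical `(x, y, no3_mg_l)` tuples with `None` marking a missing value, and the "isolated maximum" rule uses an arbitrary ratio of 10 between the largest and second-largest value, standing in for the case-by-case judgement applied to the Burkina Faso value of 55,550 mg/L. The summary also reports the share of samples above the WHO value of 50 mg/L. The toy records are invented for illustration.

```python
import statistics

WHO_NITRATE_MG_L = 50.0  # WHO drinking-water guideline value for nitrate (as NO3)


def clean_nitrate_records(records, outlier_ratio=10.0):
    """Screen (x, y, no3_mg_l) records, with None marking a missing value."""
    # 1. Drop records with a missing coordinate or a missing nitrate value.
    kept = [r for r in records if None not in r]
    # 2. Drop zero and negative concentrations.
    kept = [r for r in kept if r[2] > 0]
    # 3. Drop an isolated maximum (hypothetical rule: > outlier_ratio x the runner-up).
    values = sorted(r[2] for r in kept)
    if len(values) >= 2 and values[-1] > outlier_ratio * values[-2]:
        kept = [r for r in kept if r[2] != values[-1]]
    return kept


def summarise(records):
    """Table 1 style statistics plus the share of samples above the WHO value."""
    no3 = [r[2] for r in records]
    return {
        "n": len(no3),
        "min": min(no3),
        "mean": statistics.fmean(no3),
        "max": max(no3),
        "share_above_who": sum(v > WHO_NITRATE_MG_L for v in no3) / len(no3),
    }


toy = [(1.0, 2.0, 12.0), (1.5, 2.5, 0.0), (None, 3.0, 40.0), (2.0, 3.0, None),
       (2.5, 3.5, 80.0), (3.0, 4.0, 55550.0), (3.5, 4.5, 1282.0)]
# The zero value, the two incomplete records and the 55550 outlier are removed.
print(summarise(clean_nitrate_records(toy)))
```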
Fig. 2

a Distribution of nitrate concentration observed in groundwater in South Africa. b Distribution of nitrate concentration observed in groundwater in Burkina Faso. c Distribution of nitrate concentration observed in groundwater in Senegal

Examining the distribution of nitrate data

The Q-Q (quantile-quantile) plot is a graphical tool for assessing whether a data set follows a given theoretical distribution. For example, if a statistical analysis assumes that the dependent variable is normally distributed, a normal Q-Q plot can be used to check that assumption: if the data do follow the assumed distribution, the points on the Q-Q plot fall approximately on a straight line. The distribution of groundwater degradation parameters is often skewed, so the original data are often log-transformed to reduce that skewness. We applied the log transformation and used the ‘qqnorm’ function in R to visualise the distribution of our data, plotting the theoretical quantiles of a normal distribution against the sample quantiles for each of the three regional datasets (Fig. 3). Figure 3 shows that even after logarithmic transformation, the assumption of normality does not appear to be satisfied for the regional datasets. This contrasts with the results obtained from a meta-analysis at the continental scale (Ouedraogo et al. 2016a; under review). The Q-Q plot for Burkina Faso (Fig. 3a) shows a staircase pattern, which means that many values are repeated, i.e. the data behave as discrete values. This Q-Q plot clearly departs from a straight line, so the data are not lognormally distributed. The Q-Q plot for the South Africa dataset (Fig. 3b) does not support the lognormal hypothesis either. A remarkable feature of the Q-Q plot for the Senegal dataset (Fig. 3c) is the prominent anomaly in the lower tail.
Fig. 3

Q-Q plots for the three countries’ datasets: a Burkina Faso, b South Africa and c Senegal

The non-normality of the data poses problems for parametric methods such as multiple linear regression. We therefore used the non-parametric RF algorithm, which does not assume normally distributed data.

Environmental variables

In addition to the nitrate measurement datasets, we collected a total of 13 spatial attributes extracted from several high-resolution databases covering physical and anthropogenic attributes. These spatial attributes are related to land use, soil type, hydrogeology, topography, climatology, etc. Table 2 presents the 13 explanatory variables, their spatial resolution and their main sources. All explanatory variables were integrated into a Geographical Information System (GIS) and processed in ArcGIS 10.3™ as rasters with a spatial resolution of 15 km × 15 km. This resolution was found to be the best compromise between the resolution of the available datasets, the large extent of the study area and the available computing capacity.
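Attaching raster attributes to each sampling point can be sketched as follows. The sketch assumes, hypothetically, that coordinates are already projected to metres in an equal-area projection, that the raster's origin is its top-left corner, and that each 15 km cell's attributes are stored in a dictionary keyed by `(column, row)`. The study's actual ArcGIS workflow is not described at this level of detail.

```python
import math

CELL_SIZE_M = 15_000  # 15 km x 15 km raster cells, as used in the study


def cell_index(x_m, y_m, origin_x_m, origin_y_m, cell_size_m=CELL_SIZE_M):
    """(column, row) of the cell containing a projected point.

    Assumes metre coordinates in an equal-area projection and a raster whose
    origin is its top-left corner, with rows counted southwards.
    """
    col = math.floor((x_m - origin_x_m) / cell_size_m)
    row = math.floor((origin_y_m - y_m) / cell_size_m)
    return col, row


def extract_attributes(points, raster, origin):
    """Look up each point's attributes in a hypothetical {(col, row): dict} raster."""
    return [raster.get(cell_index(x, y, *origin)) for x, y in points]


raster = {(1, 0): {"land_use": "cropland"}}  # toy one-cell raster
# The second point lies outside the toy raster, so its lookup returns None.
print(extract_attributes([(16_000, -1_000), (0, 50_000)], raster, (0, 0)))
```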
Table 2

Sources of the collected pan-African scale databases related to environmental parameters

Explanatory variable | Type | Units or categories | Spatial resolution/scale | Date | Data source(s)
Land cover/land use | Categorical data | – | 300 m | 2014 | UCL/ELIe-Geomatics (Belgium) (a)
Population density | Continuous point data | People/km2 | 2.5 km | 2004 | ESRI: www.arcgis.com/home
Nitrogen application | Continuous point data | kg/ha | 0.5° × 0.5° | 2009 | SEDAC (b): www.sedac.ciesin.columbia.edu
Climate class data | Categorical data | – | 0.5° | 1997 | Global-Aridity values (UNEP, 1987)/(UNESCO-IHE, Delft, The Netherlands)
Type of regions | Categorical data | – | 0.5° | 2014 | Global-Aridity values (UNEP, 1987)/(UNESCO-IHE, Delft, The Netherlands)
Rainfall class | Categorical data | mm/year | 3.7 km | 1986 | UNEP: http://www.grid.unep.ch
Depth to groundwater | Categorical data | m | 0.05° × 0.05° | 2012 | British Geological Survey: www.bgs.ac.uk/
Aquifer type | Categorical data | – | 1:3,750,000 | 2012 | GLiM (c) data (Hamburg University)
Soil type | Categorical data | – | 1 km × 1 km | 2014 | ISRIC, World Soil Information: www.isric.org/content/soilgrids
Unsaturated zone (impact of vadose zone) | Categorical data | – | 1:3,750,000 | 2012 | GLiM data (Hamburg University)
Topography/slope | Continuous point data | Percentage (%) | 90 m | 2000 | UCL/ELIe-Geomatics (Belgium) and CGIAR/CSI (d) (SRTM data)
Recharge | Continuous point data | mm/year | 5 km | 2008 | Global-scale modelling of groundwater recharge (University of Frankfurt)
Hydraulic conductivity | Continuous point data | m/day | Average polygon size ~100 km2 | 2014 | GLHYMPS (e) data (McGill University)

(a) Université Catholique de Louvain/Earth and Life Institute/Environmental Sciences
(b) Socioeconomic Data and Applications Center (SEDAC)
(c) The new global lithological map database, GLiM: a representative of rock properties at the Earth’s surface
(d) Consultative Group for International Agricultural Research (CGIAR)/Consortium for Spatial Information (CSI)
(e) A glimpse beneath the Earth’s surface: Global Hydrogeology MaPS (GLHYMPS) of permeability and porosity

Model development and validation approach

In this study, we used the RF algorithm for regression rather than classification. Detailed descriptions of the RF method are given in Breiman (2001a) and Cutler et al. (2007); we present only a short summary here. The RF method (i) is non-parametric, (ii) does not overfit as more trees are added, (iii) has high predictive power and (iv) provides additional information (e.g. the importance of variables). Ensemble learning techniques such as RF rest on the premise that a combination of predictions is more accurate than any single constituent model (Rodriguez-Galiano et al. 2014). The individual decision trees in an RF tend to learn highly irregular patterns, i.e. they overfit their training datasets. RF averages multiple decision trees, trained on different parts of the same training dataset, with the goal of reducing the prediction variance (Hastie et al. 2008). RF is well suited to modelling nonlinear effects of variables; it can handle complex interactions among variables and is not affected by multicollinearity (Breiman 2001b). RF can assess the effects of all explanatory variables simultaneously and automatically rank them in descending order of importance (Li et al. 2015). The RF algorithm builds a forest of decorrelated trees: each tree is grown on a bootstrap sample of the data, with a random subset of predictor variables considered at each split, to the largest extent possible without pruning, and the predictions of all trees are averaged. Out-of-bag (OOB) samples are used to calculate variable importance and to obtain an unbiased estimate of the test set error; this is one of the advantages of RF, because it removes the need for separate cross-validation (Oliveira et al. 2012). The method essentially behaves as a ‘black box’, since the individual trees cannot be examined separately (Prasad et al. 2006), and it calculates neither regression coefficients nor confidence intervals (Cutler et al. 2007). Nevertheless, it allows the computation of variable importance measures that can be compared with those of other regression techniques (Grömping 2009). Within a short period of time, RF has become a major data analysis tool that performs well in comparison with many standard methods such as linear regression (Heidema et al. 2006; Díaz-Uriarte et al. 2006) and with complex models such as artificial neural networks and support vector machines. RF owes much of its popularity to the fact that it can be applied to a wide range of prediction problems, even nonlinear ones involving complex higher-order interaction effects, and that it produces variable importance measures for each predictor variable (Strobl et al. 2007). RF models have been successfully applied to various problems in recent years, for example in genetic epidemiology, microbiology and ecology (Strobl et al. 2007), and in other fields related to the environment and water resources (Booker and Snelder 2012; Zhao et al. 2012).
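The out-of-bag error estimate described above can be sketched in Python. Each bootstrap sample leaves out about 37% of the rows (the probability that a given row is never drawn in n draws with replacement tends to 1/e), and each row is then predicted only by the models that never saw it. The base learner is passed in as a function; `fit_mean`, a constant model, is a hypothetical stand-in that keeps the example short, where a real RF would plug in a fully grown tree. The toy data are invented.

```python
import random
import statistics


def oob_mse(X, y, fit, n_models=200, seed=0):
    """Out-of-bag mean squared error of a bagged ensemble.

    `fit(X_boot, y_boot)` must return a predict(row) function.
    """
    rng = random.Random(seed)
    n = len(X)
    oob_preds = [[] for _ in range(n)]
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # row i is out of bag for this model
                oob_preds[i].append(model(X[i]))
    errors = [(statistics.fmean(p) - y[i]) ** 2
              for i, p in enumerate(oob_preds) if p]  # rows never out of bag are skipped
    return statistics.fmean(errors)


def fit_mean(X, y):
    """Hypothetical stand-in base learner: predicts the bootstrap mean everywhere."""
    m = statistics.fmean(y)
    return lambda row: m


X = [[float(i)] for i in range(30)]
y = [float(i % 5) for i in range(30)]
print(oob_mse(X, y, fit_mean))
```

Because every OOB prediction comes from models that did not train on that row, the resulting error behaves like a held-out test error without a separate cross-validation loop.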

In our previous study (Ouedraogo et al. 2016a; under review), a continental-scale nonlinear RF statistical model was developed using a meta-database. The model showed excellent predictive capacity when predicted ln-transformed nitrate concentrations were compared with observed values, both for the training data, comprising 80% of the observations (R2 = 0.97; Fig. 4a), and for the test data, comprising the remaining 20% (R2 = 0.98; Fig. 4b). Given these good results at the continental scale, we also used RF in the present study to model groundwater degradation at the country level. Validation consisted of comparing model predictions with observed nitrate concentrations. The original dataset was therefore randomly divided into calibration samples (80% of the total) and validation samples (the remaining 20%). Given that the predictors include both discrete and continuous variables, we used the Classification And REgression Training (CARET) package to determine the importance of the explanatory factors in the predictive model. The CARET package uses a method recommended by Strobl et al. (2007) that accounts for the bias associated with differences in the number of levels of factorial variables: Strobl et al. (2007) proposed this alternative variable importance measure because traditional random forest importance measures favour categorical variables with many levels. Using this approach, we ranked the relative importance of each variable for Burkina Faso in more detail.
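A reproducible 80/20 calibration/validation split can be written as below. The fixed seed and the rounding of the cut point are illustrative choices, not details reported in the study, and `split_80_20` is a hypothetical helper name.

```python
import random


def split_80_20(records, seed=42):
    """Randomly partition records into calibration (80%) and validation (20%) sets."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


calibration, validation = split_80_20(range(9049))
print(len(calibration), len(validation))  # 7239 1810
```

With the 9049 retained Burkina Faso samples, this yields 7239 calibration and 1810 validation records. Fixing the seed makes reported R2 values repeatable, although a different seed would give a slightly different split.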
Fig. 4

RF regression for observed and predicted log (ln) nitrate concentration on the training dataset (a) and test dataset (b) using the continental-scale nitrate dataset

All analyses were performed in the R statistical software version 3.2, using freely distributed packages.

Results

Validating the continental-scale model with three countries’ datasets

We evaluated the predictive ability of the continental-scale RF regression model at the country level, using the NO3 data shown in Fig. 2a–c as the response variable. At the country level, this nonlinear model performed poorly compared with its calibration (Ouedraogo et al. 2016a; under review). The scatterplots of predicted versus observed nitrate concentrations show a wide spread of predictions. In Burkina Faso, for example, the predictive ability is modest (R2 = 0.23; Fig. 5a). In Senegal (Fig. 5b) and South Africa (Fig. 5c), the predictive ability was very poor (R2 ≤ 0.1), with coefficients of determination of 0.09 and 0.003, respectively.
Fig. 5

a Validation results of the continental-scale RFR model on the observed groundwater nitrate of Burkina Faso. b Validation results of the continental-scale RFR model on the observed groundwater nitrate of Senegal. c Validation results of the continental-scale RFR model on the observed groundwater nitrate of South Africa

Re-calibration of nonlinear RF regression at a country level

We recalibrated the RF model for each country using the country-specific measured nitrate data, with 80% of the data used for training and 20% for validation. The recalibrations gave mixed results. For Burkina Faso, we obtained very good results for both the recalibration (R2 = 0.91; Fig. 6a) and the validation (R2 = 0.92; Fig. 6b). In contrast, the recalibrated RF regression model failed to describe the Senegal (Fig. 7a) and South Africa (Fig. 8a) datasets, with R2 ≤ 0.2 in both the recalibration and the validation steps.
Fig. 6

Log nitrate of observed versus predicted values for Burkina Faso for the calibration dataset (a) and validation dataset (b)

Fig. 7

Log nitrate of observed versus predicted values for Senegal for the calibration dataset (a) and validation dataset (b)

Fig. 8

Log nitrate of observed versus predicted values for South Africa for the calibration dataset (a) and validation dataset (b)

Variable importance plots at country level

Figure 9 shows the variable importance plot for Burkina Faso, produced with the CARET package, which displays the importance of the first 20 levels of the factorial variables. Given the poor performance of the model for the Senegal and South Africa datasets, we do not present this analysis for these two countries. The variable importance analysis for Burkina Faso shows that nitrogen fertiliser application, population density and recharge had the highest scores among all variables. This corroborates earlier studies on nitrate pollution of African groundwater bodies (e.g. Mfumu et al. 2016).
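Permutation importance, one family of measures used in RF variable importance plots, can be sketched generically: shuffle one predictor at a time and record how much the mean squared error increases. This is a hypothetical helper for illustration only. It is neither caret's `varImp()` nor the conditional importance of Strobl et al., and it evaluates the model on the data passed in rather than on out-of-bag samples. Rows of `X` are assumed to be lists.

```python
import random
import statistics


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each predictor column is shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return statistics.fmean([(predict(r) - t) ** 2 for r, t in zip(rows, y)])

    baseline = mse(X)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(permuted) - baseline)
        scores.append(statistics.fmean(increases))
    return scores


# Toy model that uses only the first predictor; the second should score zero.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * i for i in range(20)]
scores = permutation_importance(lambda r: 2.0 * r[0], X, y)
print(scores)
```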
Fig. 9

Variable importance plot for the RF model recalibrated for the Burkina Faso dataset

Discussion

Two prerequisites are necessary for large-scale modelling of nitrate pollution of groundwater on an operational basis (Refsgaard et al. 1999): firstly, ready access to large-scale data on nitrate pollution and associated variables, so that an appropriate nitrate pollution model can be identified and validated; and secondly, adequate scaling, so that the identified and validated model can be applied at another scale. The first prerequisite is often difficult to meet when assessing nitrate pollution of African groundwater bodies. Not all existing ‘African’ nitrate databases are available for modelling, owing to various, often institutional, restrictions (e.g. they are not publicly accessible, or they are accessible but contain poor-quality data). In addition, not all databases maintained by African institutions contain harmonised and integrated datasets; many are not harmonised in their contents or nomenclatures.

In view of these limitations, the present work tested the predictive ability of a nonlinear RF statistical model developed at the pan-African scale (Ouedraogo et al. 2016a; under review) when applied at the national level. We conducted this experiment using data collected from Burkina Faso, Senegal and South Africa. While good results were obtained from the calibration and validation runs at the pan-African scale (Ouedraogo et al. 2016a; under review), these results could not be replicated at the country scale. The country-scale validation of the continental-scale model produced an R2 of 0.003, 0.09 and 0.23 for South Africa, Senegal and Burkina Faso, respectively. These poor validation results at the country scale show that the continental-scale model is not valid for making predictions at smaller scales. When the model was recalibrated at the country scale, successful results could only be obtained for Burkina Faso. This shows that a similar model structure can be used to model nitrate pollution at the scale of Burkina Faso and of the whole African continent, but that the model parameters are scale-variant. For Senegal and South Africa, the structure of the continental-scale model cannot be used to predict pollution at the country scale. A regional model for Senegal and South Africa should therefore include other basin attributes for predicting nitrate concentrations (Schwarz et al. 2011; Dupas et al. 2013). For example, climate typically varies less within a region than across the continent (Schwarz et al. 2011). Hence, the role of climate in explaining nitrate contamination may differ between the regional and continental scales. The poor ability to model nitrate pollution in Senegal and South Africa with a model type inferred from continental data may also be due to the poor quality of the data.
Nitrate measurements collected in each country come from various climate zones, and no standard manuals or guidebooks describing the collection methods were followed across the datasets. The data may therefore exhibit some bias, as mentioned in the “Measurement of nitrate data in groundwater” section. There is no guarantee that the available nitrate point data from Senegal and South Africa are truly representative. For the Senegal nitrate dataset, for example, the author of the dataset has declared that errors exist in the recorded data. Similarly, for the South Africa dataset, the data provider states on the IGRAC (International Groundwater Resources Assessment Centre) website: ‘No additional quality checks were performed and data should be used with caution. SADC-GMI and IGRAC accept no responsibility for accuracy of the data’. Furthermore, the low predictive ability of the continental-scale model may be due to calibration issues. In a study using logistic regression to explain nonpoint source (NPS) NO3 in groundwater, Gurdak and Qi (2012) argue that poorly predictive models are probably over-trained on the calibration data.

The rather good results obtained when recalibrating the continental-scale model at the country scale for Burkina Faso show that, once the pitfalls mentioned above are addressed, the structure of the continental-scale model can be maintained, but that the model parameters are scale-variant. The importance of scale effects has long been recognised by hydrologists, water resources managers and other water practitioners (Refsgaard and Butts 1999; El-Sadek 2002; Gubler et al. 2011). Refsgaard and Butts (1999), for instance, state that model codes are generally scale-specific. According to Heuvelink and Pebesma (1999) (cited in El-Sadek 2002), there are three principal reasons for this: (i) different processes are important at different scales; (ii) input data availability is reduced at larger scales; and (iii) the model’s input and output undergo a change of ‘support’, i.e. the sample volume of field data changes between spatial scales. These general observations on scale dependency also hold for groundwater vulnerability and pollution modelling. Gurdak and Qi (2012), for instance, aimed to develop a statistical groundwater pollution model for the entire California Coastal Basin aquifer system (CCB) using explanatory variables that represent the source, transport and attenuation (STA) of nitrate contamination in groundwater. Their model was initially inspired by the factors included in the DRASTIC vulnerability model. The first iteration of the CCB model fitted the observations at the validation wells poorly (R² = 0.064). Gurdak and Qi (2012) report that many DRASTIC factors often identified as sensitive in national-scale assessments were not important when modelling contamination at the scale of the CCB. They recommended, for instance, including dissolved oxygen (DO) as a sensitive parameter in the CCB-scale assessment. This suggests that explanatory factors and vulnerability models are highly sensitive to the scale of application.
Four years later, Gurdak et al. (2016) examined these scale dependencies in more detail and further illustrated the scale dependency of the model and its explanatory variables. They identified important differences in controlling factors between the CCB-scale and national-scale models, and concluded that good management and policy decisions are best supported by models developed at the same spatial scale as the decision-making scale. Gross (2008) reached similar conclusions, noting that identifying explanatory factors may involve challenges associated with scale or spatial autocorrelation. Vulnerability assessments and scale are therefore highly intertwined (NRC 1993), not only in technical application but also in conceptualisation. Fekete et al. (2010) accordingly proposed three recommendations to address scale issues in statistical groundwater pressure models: (i) scale implications (both benefits and drawbacks) need more attention and documentation within vulnerability studies; (ii) the choice of the appropriate spatial level, which is driven mostly by data availability, policy demand and the aim of the concept, should also be supported by theoretical considerations; and (iii) identifying the appropriate types of scale (spatial, temporal) and types of nesting of phenomena (single-level, multi-level and cross-level) should precede the conceptualisation of a statistical groundwater pressure model.
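The third reason given by Heuvelink and Pebesma (1999), the change of support, can be illustrated numerically. The sketch below is a minimal illustration under assumed conditions: the point nitrate concentrations are hypothetical, and `block_average` is an illustrative helper that aggregates consecutive points into non-overlapping blocks, a crude stand-in for moving from well-scale samples to grid cells. It is not part of the study's workflow.

```python
from statistics import fmean, pvariance

def block_average(values, block_size):
    """Average consecutive, non-overlapping blocks of point values.

    Each block mimics a coarser support (e.g. a grid cell).
    Trailing values that do not fill a whole block are dropped.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    n_full = (len(values) // block_size) * block_size
    return [fmean(values[i:i + block_size]) for i in range(0, n_full, block_size)]

# Hypothetical point nitrate concentrations (mg/L) along a transect
points = [2.0, 40.0, 5.0, 35.0, 8.0, 60.0, 3.0, 45.0]
blocks = block_average(points, 2)
print(blocks)                                # [21.0, 20.0, 34.0, 24.0]
print(fmean(points), fmean(blocks))          # 24.75 24.75
print(pvariance(points), pvariance(blocks))  # 456.4375 30.6875
```

Aggregation leaves the mean unchanged but shrinks the variance by more than an order of magnitude, which is one reason why relationships calibrated on data of one support need not hold on another.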

In addition to the limitation related to the scale transition, we acknowledge that the model in our study is subject to the classical uncertainties and modelling errors. Indeed, model predictions at the regional scale are likely to be contaminated by several different modelling errors (Donigan and Rao 1986; NRC 1993). According to Mulla and Addiscott (1999), these include model structure errors, experimental measurement errors, model parameterisation errors and, as mentioned above, scale-transition errors. Model structure errors occur when the processes and assumptions represented by the model fail to represent reality. An example is a model that simulates solute transport with the convection-dispersion equation in a region where two-region or macropore transport is significant. Scale-transition errors arise from extrapolation errors caused by spatial and temporal averaging of model parameters (Destouni 1993), or from bias introduced when the calibration site is not representative of the region (Beven 1993). Mulla and Addiscott (1999) argue that scale-transition errors also arise because processes that dominate broad patterns and trends at the larger scale may be obscured by other processes that dominate at the smaller scale. For example, groundwater contamination by nitrate-nitrogen may be described at the regional scale in terms of regional patterns in precipitation, depth to groundwater and soil texture, whereas at the local scale, variations in nitrate leaching to groundwater may be controlled more strongly by management practices, such as the amount and timing of N fertiliser application, than by local variations in precipitation, depth to groundwater or soil texture. Mulla and Addiscott (1999) conclude that the rigorous validation of models at different scales is difficult for a variety of reasons.
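The averaging mechanism behind scale-transition errors (Destouni 1993) can be shown with a toy calculation. The sketch assumes a deliberately simple, hypothetical threshold response, `hypothetical_leaching`, in which only nitrogen applied above an assumed crop uptake of 100 kg N/ha leaches; it is not a process model from this study and was chosen only because it is nonlinear. For any nonlinear response, running the model once on averaged inputs differs from averaging the outputs of cell-level runs.

```python
from statistics import fmean

def hypothetical_leaching(n_applied, crop_uptake=100.0):
    """Toy threshold response in kg N/ha (hypothetical):
    only N applied above crop uptake is assumed to leach."""
    return max(0.0, n_applied - crop_uptake)

def scale_transition_error(cell_inputs, model):
    """Return (fine, coarse, fine - coarse), where 'fine' averages the model
    output over cells and 'coarse' runs the model once on the averaged input."""
    fine = fmean([model(x) for x in cell_inputs])
    coarse = model(fmean(cell_inputs))
    return fine, coarse, fine - coarse

cells = [0.0, 0.0, 200.0, 200.0]  # hypothetical N application per cell (kg N/ha)
print(scale_transition_error(cells, hypothetical_leaching))  # (50.0, 0.0, 50.0)
```

Here the fine-scale mean leaching is 50 kg N/ha, whereas the coarse-scale run on the averaged input of 100 kg N/ha predicts no leaching at all: the entire difference is an artefact of averaging before applying a nonlinear model.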

Conclusion

To develop effective groundwater protection programmes, we need to understand how natural and anthropogenic factors determine groundwater degradation. In this paper, we evaluate whether a statistical model designed to predict groundwater contamination by nitrates at the scale of the African continent can predict such contamination at the scale of individual African countries. We assess the results using datasets on groundwater contamination by nitrates from Senegal, Burkina Faso and South Africa. Performance was poor for the Senegalese and South African datasets, but good for the Burkina Faso dataset once the continental-scale model was recalibrated. Many of the difficulties and limitations of this validation study were data-related and resulted from, among other things, a lack of homogeneous nitrate data at the African scale and uncertainties related to the explanatory variables. The first limitation, for example, became apparent when the model was used to predict the distribution of nitrates at the country scale. Additionally, uncertainties or bias in the reference data (coarse explanatory variables) were found to distort the performance of the model validation at the country level. This study highlights the need for a national-level policy programme for the characterisation of groundwater vulnerability, first by constructing a good database of groundwater quality data, and second by building a robust predictive vulnerability model for each country. The continental-scale model accounts neither for local point sources of nitrate nor for features and processes that may promote focused recharge, and may therefore not be appropriate to support local-scale decisions. To improve this poor predictive ability, future modelling efforts must address the many modelling errors related to model structure, model parameterisation and, in particular, scale transition.

Besides validation and recalibration, we sought to determine the relative importance of the variables that determine nitrate contamination. Our results revealed that population density, nitrogen fertiliser application rate, groundwater depth, recharge rate, land cover and rainfall are the most important variables contributing to nitrate pollution in groundwater.
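Variable importance in RF studies is often expressed as the increase in prediction error when one predictor is randomly permuted, in the spirit of the %IncMSE measure reported by the randomForest package (Liaw and Wiener 2002). The text does not state which importance measure underlies the ranking above, so the following is only a generic sketch under stated assumptions: `permutation_importance` accepts any fitted model as a callable, permutes columns of the supplied data rather than out-of-bag samples, reports raw (unscaled) MSE increases, and is demonstrated on a hypothetical two-predictor toy model instead of the study's RF.

```python
import random
from statistics import fmean

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def permutation_importance(predict, rows, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each predictor column is shuffled.

    predict: callable mapping one row (list of floats) to a prediction.
    rows:    list of equal-length rows of predictor values.
    y:       observed responses, one per row.
    """
    rng = random.Random(seed)  # fixed seed for reproducible shuffles
    baseline = mse(y, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between predictor j and y
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(y, [predict(r) for r in permuted]) - baseline)
        importances.append(fmean(increases))
    return importances

# Hypothetical toy model: the response depends on column 0 only.
rows = [[float(i), float(10 - i)] for i in range(8)]
y = [2.0 * r[0] for r in rows]
imp = permutation_importance(lambda r: 2.0 * r[0], rows, y)
print(imp)  # column 0 positive, column 1 exactly 0.0
```

Because the toy model ignores column 1, shuffling it never changes the predictions and its importance is zero. Note that permutation importance can be biased when predictors are correlated or differ in scale (Strobl et al. 2007), which is relevant for spatial attributes such as rainfall and recharge.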

In summary, our study showed that RF regression can be an effective technique for predicting nitrate contamination at the continental and country scales. Yet attention must be paid to data quality. Better data availability and better-quality data would, of course, help make model predictions more accurate in the future. We therefore encourage national and regional agencies to strengthen their groundwater quality monitoring programmes. This echoes Goal 6 of the United Nations Sustainable Development Agenda, which recommends establishing or expanding water quality monitoring programmes at national, regional and global scales.


Acknowledgments

This study was carried out within the framework of a doctoral research programme and was supported by the Islamic Development Bank (IDB) under the Merit Scholarship Programme (MSP) for theses and by the ‘Fonds Spécial de Recherche’ (FSR) of the Université catholique de Louvain. Several people from across the world helped with data acquisition, namely T. Gleeson (McGill University), N. Moosdorf (Hamburg University) and M. Cissé (DGPRE/Senegal).

References

  1. Aljazzar TH (2010) Adjustment of DRASTIC vulnerability index to assess groundwater vulnerability for nitrate pollution using the advection-diffusion cell. PhD thesis, Fakultät für Georessourcen und Materialtechnik, Rheinisch-Westfälische Technische Hochschule Aachen, 146 pp
  2. Ateawung JN (2010) A GIS based water balance study of Africa. Master's thesis (Physical Land Resources), Universiteit Gent and Vrije Universiteit Brussel, Belgium, 55 pp
  3. Barrio I, Arostegui I, Quintana JM (2013) Use of generalised additive models to categorise continuous variables in clinical prediction. BMC Med Res Methodol 13(1):83. https://doi.org/10.1186/1471-2288-13-83
  4. Bartram J, Ballance R (eds) (1996) Water quality monitoring: a practical guide to the design and implementation of freshwater quality studies and monitoring programmes. Chapman and Hall, London. http://www.who.int/water_sanitation_health/resourcesquality/waterqualmonitor.pdf. Accessed 25 April 2017
  5. Bauder JW, Sinclair KN, Lund RE (1993) Physiographic and land use characteristics associated with nitrate-nitrogen in Montana groundwater. J Environ Qual 22(2):255–262. https://doi.org/10.2134/jeq1993.00472425002200020004x
  6. Beven KJ (1993) Estimating transport parameters at the grid scale: on the value of a single measurement. J Hydrol 143(1–2):109–123. https://doi.org/10.1016/0022-1694(93)90091-M
  7. Böhlke JK (2002) Groundwater recharge and agricultural contamination. Hydrogeol J 10(1):153–179. https://doi.org/10.1007/s10040-001-0183-3
  8. Booker DJ, Snelder TH (2012) Comparing methods for estimating flow duration curves at ungauged sites. J Hydrol 434:78–94. https://doi.org/10.1016/j.jhydrol.2012.02.031
  9. Boy-Roura M (2013) Nitrate groundwater pollution and aquifer vulnerability: the case of the Osona region. PhD thesis, Universitat de Girona, 143 pp
  10. Boy-Roura M, Nolan BT, Menció A, Mas-Pla J (2013) Regression model for aquifer vulnerability assessment of nitrate pollution in the Osona region (NE Spain). J Hydrol 505:150–162. https://doi.org/10.1016/j.jhydrol.2013.09.048
  11. Breiman L (2001b) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16(3):199–231. https://doi.org/10.1214/ss/1009213726
  12. Breiman L (2001a) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324
  13. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth International Group, Belmont, California
  14. Chapman D (1996) Water quality assessments: a guide to use of biota, sediments and water in environmental monitoring, 2nd edn. E & FN Spon on behalf of WHO, 651 pp. http://www.who.int/water_sanitation_health/resourcesquality/watqualassess.pdf. Accessed 18 March 2017
  15. Charrière S, Aumond C (2016) Managing the drinking water catchment areas: the French agricultural cooperatives feed back. Environ Sci Pollut Res 23(11):11379–11385. https://doi.org/10.1007/s11356-016-6639-8
  16. Constant T, Charrière S, Lioeddine A, Emsellem Y (2016) Use of modeling to protect, plan, and manage water resources in catchment areas. Environ Sci Pollut Res 23(16):15841–15851. https://doi.org/10.1007/s11356-015-5459-6
  17. Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88(11):2783–2792. https://doi.org/10.1890/07-0539.1
  18. Davis DB, Sylvester-Bradley R (1995) The contribution of fertiliser nitrogen to leachable nitrogen in the UK: a review. J Sci Food Agric 68(4):399–406. https://doi.org/10.1002/jsfa.2740680402
  19. De’ath G (2002) Multivariate regression trees: a new technique for modeling species–environment relationships. Ecology 83(4):1105–1117. https://doi.org/10.2307/3071917
  20. De’ath G, Fabricius KE (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81(11):3178–3192. https://doi.org/10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2
  21. Destouni G (1993) Stochastic modelling of solute flux in the unsaturated zone at the field scale. J Hydrol 143(1–2):45–61. https://doi.org/10.1016/0022-1694(93)90088-Q
  22. Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7(1):3. https://doi.org/10.1186/1471-2105-7-3
  23. Donigan AS Jr, Rao PSC (1986) Example model testing studies. In: Hern SC, Melancon SM (eds) Vadose zone modeling of organic pollutants. Lewis Publishers, Chelsea, MI, pp 103–131
  24. Dupas R, Curie F, Gascuel-Odoux C, Moatar F, Delmas M, Parnaudeau V, Durand P (2013) Assessing N emissions in surface water at the national level: comparison of country-wide vs. regionalized models. Sci Total Environ 443:152–162. https://doi.org/10.1016/j.scitotenv.2012.10.011
  25. El-Sadek AAM (2002) Engineering approach to water quantity and quality modelling at field and catchment scale. PhD thesis, Katholieke Universiteit Leuven, 251 pp
  26. Evans JS, Murphy MA, Holden ZA, Cushman SA (2011) Modelling species distribution and change using the random forest. In: Drew CA, Wiersma YF, Huettmann F (eds) Predictive species and habitat modeling in landscape ecology. Springer, New York, pp 139–159. https://doi.org/10.1007/978-1-4419-7390-0_8
  27. Fekete A, Damm M, Birkmann J (2010) Scales as a challenge for vulnerability assessment. Nat Hazards 55(3):729–747. https://doi.org/10.1007/s11069-009-9445-5
  28. Foster SSD (2000) Assessing and controlling the impacts of agriculture on groundwater: from barley barons to beef bans. Q J Eng Geol Hydrogeol 33(4):263–280. https://doi.org/10.1144/qjegh.33.4.263
  29. Foster S, Garduño H, Kemper L, Tuinhof A, Nanni M, Dumars C (2003) Groundwater quality protection: defining strategy and setting priorities. Briefing note 8, 6 pp. http://documents.worldbank.org/curated/en/434861468166483398/pdf/301000PAPER0BN8.pdf. Accessed 6 March 2017
  30. Gemitzi A, Petalas C, Pisinaras V, Tsihrintzis A (2009) Spatial prediction of nitrate pollution in groundwaters using neural networks and GIS: an application to south Rhodope aquifer (Thrace, Greece). Hydrol Process 23(3):372–383. https://doi.org/10.1002/hyp.7143
  31. Grömping U (2009) Variable importance assessment in regression: linear regression versus random forest. Am Stat 63(4):308–319. https://doi.org/10.1198/tast.2009.08199
  32. Gross EL (2008) Ground water susceptibility to elevated nitrate concentrations in South Middleton Township, Cumberland County, Pennsylvania. MSc thesis, Shippensburg University, 117 pp. http://www.ship.edu/uploadedfiles/ship/geo-ess/graduate/theses/gross_thesis_080505.pdf. Accessed 6 July 2015
  33. Gubler S, Fiddes J, Keller M, Gruber S (2011) Scale-dependent measurement and analysis of ground surface temperature variability in alpine terrain. Cryosphere 5(2):431–443. https://doi.org/10.5194/tc-5-431-2011
  34. Gurdak JJ, Qi SL (2012) Vulnerability of recently recharged groundwater in principal aquifers of the United States to nitrate contamination. Environ Sci Technol 46(11):6004–6012. https://doi.org/10.1021/es300688b
  35. Gurdak JJ, Geyer GE, Nanus L, Taniguchi M, Corona CR (2016) Scale dependence of controls on groundwater vulnerability in the water–energy–food nexus, California Coastal Basin aquifer system. J Hydrol Reg Stud 11:126–138. https://doi.org/10.1016/j.ejrh.2016.01.002
  36. Gurdak JJ (2014) Groundwater vulnerability. In: Handbook of engineering hydrology. CRC Press, Taylor & Francis Group 2014:33
  37. Haller L, McCarthy P, O'Brien T, Riehle J, Stuhldreher T (2013) Nitrate pollution of groundwater. Alpha Water Systems Inc
  38. Hamza M, Larocque D (2005) An empirical comparison of ensemble methods based on classification trees. J Stat Comput Simul 75(8):629–643. https://doi.org/10.1080/00949650410001729472
  39. Hartmann J, Moosdorf N (2012) The new global lithological map database GLiM: a representation of rock properties at the Earth surface. Geochem Geophys Geosyst 13(12):Q12004. https://doi.org/10.1029/2012GC004370
  40. Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning, 2nd edn. Springer. ISBN 0-387-95284-5
  41. Heidema AG, Boer JMA, Nagelkerke N, Mariman ECM, van der A DL, Feskens EJM (2006) The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases. BMC Genet 7(1):23. https://doi.org/10.1186/1471-2156-7-23
  42. Heuvelink GBM, Pebesma EJ (1999) Spatial aggregation and soil process modelling. Geoderma 89:47–65. https://doi.org/10.1016/S0016-7061(98)00077-9
  43. Jones MJ (1985) The weathered zone aquifers of the basement complex areas of Africa. Q J Eng Geol Hydrogeol 18:35–46. https://doi.org/10.1144/GSL.QJEG.1985.018.01.06
  44. Jung YY, Koh DC, Park WB, Ha K (2016) Evaluation of multiple regression models using spatial variables to predict nitrate concentrations in volcanic aquifers. Hydrol Process 30(5):663–675. https://doi.org/10.1002/hyp.10633
  45. Knudby A, Brenning A, LeDrew E (2010) New approaches to modelling fish-habitat relationships. Ecol Model 221(3):503–511. https://doi.org/10.1016/j.ecolmodel.2009.11.008
  46. Kulabako N, Nalubega M, Thunvik R (2007) Study of the impact of land use and hydrogeological settings on the shallow groundwater quality in a peri-urban area of Kampala, Uganda. Sci Total Environ 381(1):180–199. https://doi.org/10.1016/j.scitotenv.2007.03.035
  47. Lawler JJ, White D, Neilson RP, Blaustein AR (2006) Predicting climate-induced range shifts: model differences and model reliability. Glob Change Biol 12(8):1568–1584. https://doi.org/10.1111/j.1365-2486.2006.01191.x
  48. Li X, Zhai T, Jiao Y, Wang G (2015) Using Bayesian hierarchical models and random forest algorithm for habitat use studies: a case of nest site selection of the crested ibis at regional scales. PeerJ PrePrints 3:e871v1. https://doi.org/10.7287/peerj.preprints.871v1
  49. Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22. http://www.bios.unc.edu/~dzeng/BIOS740/randomforest.pdf. Accessed 16 April 2017
  50. MacDonald A (2010) Groundwater, health and livelihoods in Africa. Earthwise 26, British Geological Survey, 2 pp (oral presentation). http://nora.nerc.ac.uk/17329/1/29-30%5B1%5D.pdf. Accessed 28 January 2016
  51. MacDonald AM, Bonsor HC, Ó Dochartaigh BÉ, Taylor RG (2012) Quantitative maps of groundwater resources in Africa. Environ Res Lett 7(2):024009. https://doi.org/10.1088/1748-9326/7/2/024009
  52. MacDonald AM, Taylor RG, Bonsor HC (2013) Groundwater in Africa: is there sufficient water to support the intensification of agriculture from “land grabs”? In: Handbook of land and water grabs in Africa, pp 376–383
  53. MacDonald A, Davies J, Calow R (2008) African hydrogeology and rural water supply. In: Adelana SMA, MacDonald AM (eds) Applied groundwater studies in Africa. IAH Selected Papers on Hydrogeology, vol 13. CRC Press/Balkema, Leiden, The Netherlands
  54. MacDonald AM, Davies J (2000) A brief review of groundwater for rural water supply in sub-Saharan Africa. British Geological Survey Technical Report WC/00/33, Overseas Geology Series, BGS, Nottingham, UK
  55. Margat J (2010) Ressources et utilisation des eaux souterraines en Afrique [Groundwater resources and use in Africa]. In: Managing Shared Aquifer Resources in Africa, Third International Conference, Tripoli, 25–27 May 2008. International Hydrological Programme, Division of Water Sciences, IHP-VII Series on Groundwater No. 1, UNESCO, pp 26–34
  56. Mfumu KA, Ndembo LJ, Vanclooster M (2016) Modelling nitrate pollution pressure using a multivariate statistical approach: the case of Kinshasa groundwater body, Democratic Republic of Congo. Hydrogeol J 24(2):425–437. https://doi.org/10.1007/s10040-015-1337-z
  57. Mulla DJ, Addiscott TM (1999) Validation approaches for field-, basin-, and regional-scale water quality models. In: Assessment of non-point source pollution in the vadose zone. Geophysical Monograph 108, American Geophysical Union, pp 63–78. https://doi.org/10.1029/GM108p0063
  58. National Research Council (NRC) (1993) Ground water vulnerability assessment: predicting relative contamination potential under conditions of uncertainty. National Academy Press, Washington DC, 224 pp. ISBN 978-0-309-04799-9
  59. Nolan BT, Hitt KJ (2006) Vulnerability of shallow groundwater and drinking-water wells to nitrate in the United States. Environ Sci Technol 40(24):7834–7840. https://doi.org/10.1021/es060911u
  60. Nolan BT, Gronberg JM, Faunt CC, Eberts SM, Belitz K (2014) Modeling nitrate at domestic and public-supply well depths in the Central Valley, California. Environ Sci Technol 48(10):5643–5651. https://doi.org/10.1021/es405452q
  61. Oliveira S, Oehler F, San-Miguel-Ayanz J, Camia A, Pereira JM (2012) Modeling spatial patterns of fire occurrence in Mediterranean Europe using multiple regression and random forest. For Ecol Manag 275:117–129. https://doi.org/10.1016/j.foreco.2012.03.003
  62. Ouedraogo I, Vanclooster M (2016a) A meta-analysis and statistical modelling of nitrates in groundwater at the African scale. Hydrol Earth Syst Sci 20(6):2353–2381. https://doi.org/10.5194/hess-20-2353-2016
  63. Ouedraogo I, Vanclooster M (2016b) Shallow groundwater poses pollution problem for Africa. In: SciDev.Net, p 4. http://hdl.handle.net/2078.1/169630
  64. Ouedraogo I, Defourny P, Vanclooster M (2016a) Modeling groundwater nitrate concentrations at the African scale using random forest regression techniques. Accepted for review (24 April) in the special issue on groundwater in sub-Saharan Africa, Hydrogeology Journal (in progress; book expected December 2017)
  65. Ouedraogo I, Defourny P, Vanclooster M (2016b) Mapping the groundwater vulnerability for pollution at the pan-African scale. Sci Total Environ 544:939–953. https://doi.org/10.1016/j.scitotenv.2015.11.135
  66. Pearson S (2015) Identifying groundwater vulnerability from nitrate contamination: comparison of the DRASTIC model and Environment Canterbury’s method. Master of Applied Science (Environmental Management) thesis, Lincoln University, 58 pp
  67. Postnote (2011) Water adaptation in Africa. Number 373, April 2011. http://www.parliament.uk/documents/post/postpn_373-Water-Adapatation-in-Africa.pdf. Accessed 26 January 2016
  68. Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199. https://doi.org/10.1007/s10021-005-0054-1
  69. Puckett LJ, Tesoriero AJ, Dubrovsky NM (2011) Nitrogen contamination of surficial aquifers: a growing legacy. Environ Sci Technol 45(3):839–844. https://doi.org/10.1021/es1038358
  70. Rawlings JO, Pantula SG, Dickey DA (1998) Applied regression analysis: a research tool. Springer, 658 pp. https://doi.org/10.1007/b98890
  71. Refsgaard JC, Thorsen M, Jensen JB, Kleeschulte S, Hansen S (1999) Large scale modelling of groundwater contamination from nitrate leaching. J Hydrol 221(3):117–140. https://doi.org/10.1016/S0022-1694(99)00081-5
  72. Refsgaard JC, Butts MB (1999) Determination of grid scale parameters in catchment modelling by upscaling local scale parameters. In: Proceedings of the International Workshop on Modelling Transport Processes in Soils, EurAgEng’s Interest Group on Soil and Water, Leuven, Belgium, 24–26 November, pp 650–665
  73. Rodriguez-Galiano V, Mendes MP, Garcia-Soldado MJ, Chica-Olmo M, Ribeiro L (2014) Predictive modeling of groundwater nitrate pollution using random forest and multisource variables related to intrinsic and specific vulnerability: a case study in an agricultural setting (southern Spain). Sci Total Environ 476–477:189–206. https://doi.org/10.1016/j.scitotenv.2014.01.001
  74. Royal Society of Chemistry (RSC) (2010) Africa’s water quality. http://www.rsc.org/images/RSC_PACN_Water_Report_tcm18-176914.pdf. Accessed August 2016
  75. Schwarz GE, Alexander RB, Smith RA, Preston SD (2011) The regionalization of national-scale SPARROW models for stream nutrients. J Am Water Resour Assoc 47(5):1151–1172. https://doi.org/10.1111/j.1752-1688.2011.00581.x
  76. Shamsudduha M, Taylor RG, Chandler RE (2015) A generalized regression model of arsenic variations in the shallow groundwater of Bangladesh. Water Resour Res 51(1):685–703. https://doi.org/10.1002/2013WR01457
  77. Sharaky AM (2016) Geology and water resources in Africa. Institute of African Research and Studies, Cairo University, 40 pp. http://scholar.cu.edu.eg/sharaky/files/notes.pdf. Accessed 19 August 2016
  78. Spalding RF, Exner ME (1993) Occurrence of nitrate in groundwater: a review. J Environ Qual 22(3):392–402. https://doi.org/10.2134/jeq1993.00472425002200030002x
  79. Strebel O, Duynisveld WHM, Böttcher J (1989) Nitrate pollution of groundwater in western Europe. Agric Ecosyst Environ 26:189–214. https://doi.org/10.1016/0167-8809(89)90013-3
  80. Strobl C, Boulesteix AL, Zeileis A, Hothorn T (2007) Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics 8(1):25. https://doi.org/10.1186/1471-2105-8-25
  81. UNEP (United Nations Environment Programme) (2010) Africa water atlas. Division of Early Warning and Assessment (DEWA), UNEP, Nairobi. http://na.unep.net/atlas/africaWater/book.php
  82. UNEP/DEWA (2014) Sanitation and groundwater protection: a UNEP perspective. UNEP/DEWA, 18 pp. http://www.bgr.bund.de/EN/Themen/Wasser/Veranstaltungen/symp_sanitat-gwprotect/present_mmayi_pdf.pdf?__blob=publicationFile&v=2. Accessed 14 August 2014
  83. Wakida FT, Lerner DN (2005) Non-agricultural sources of groundwater nitrate: a review and case study. Water Res 39(1):3–16. https://doi.org/10.1016/j.watres.2004.07.026
  84. Ward MH, de Kok TM, Levallois P, Brender J, Gulis G, Nolan BT, VanDerslice J (2005) Workgroup report: drinking-water nitrate and health, recent findings and research needs. Environ Health Perspect 113(11):1607–1614. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1310926
  85. Wheeler DC, Nolan BT, Flory AR, DellaValle CT, Ward MH (2015) Modeling groundwater nitrate concentrations in private wells in Iowa. Sci Total Environ 536:481–488. https://doi.org/10.1016/j.scitotenv.2015.07.080
  86. WHO (1992) GEMS/WATER operational guide, 3rd edn. World Health Organization, Geneva, 121 pp. http://apps.mwho.int/iris/bitstream/10665/62446/1/GEMS_W_92.1_(part1).pdf. Accessed 18 March 2017
  87. Xu Y, Usher B (2006) Groundwater pollution in Africa. Taylor & Francis/Balkema, The Netherlands, 353 pp. https://doi.org/10.1201/9780203963548
  88. Yee TW, Mitchell ND (1991) Generalized additive models in plant ecology. J Veg Sci 2(5):587–602. https://doi.org/10.2307/3236170
  89. Zhao C, Liu C, Xia J, Zhang Y, Yu Q, Eamus D (2012) Recognition of key regions for restoration of phytoplankton communities in the Huai River basin, China. J Hydrol 420:292–300. https://doi.org/10.1016/j.jhydrol.2011.12.016

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  • Issoufou Ouedraogo (1), corresponding author
  • Pierre Defourny (1)
  • Marnik Vanclooster (1)
  1. Earth and Life Institute, Université catholique de Louvain, Louvain-la-Neuve, Belgium
