1 Introduction

Worldwide, sedimentary aquifers play an important role in drinking water supply. For this reason, aquifer architecture and the hydraulic properties of aquifer sediments have long been, and still are, the subject of many geohydrological studies (e.g. Biteman et al. 2004; Eaton 2006). Recent advances in affordable geophysical techniques for regional surveying are expected to greatly aid in delineating the physical characteristics of aquifers (see, e.g. Bersezio et al. 2007; Chalikakis et al. 2008).

Because phreatic groundwater resources in particular are vulnerable to anthropogenic pollution, the geochemical aspects of groundwater resources have also received increasing attention over the past decades. These studies focus predominantly on water chemistry and include topics such as groundwater (redox) zonation and gradients; the degree and scale of, and controls on, water quality variability and spatial distribution; source apportionment; (bio)geochemical controls on contaminant degradation; reactive transport modelling; aquifer vulnerability mapping; and the determination of background/threshold concentrations (for recent papers, see, e.g. Vissers 2005; Park et al. 2006; Hinkle et al. 2007; Robins et al. 2007; Báez-Cazull et al. 2008; Griffioen et al. 2008; Hinsby et al. 2008; McMahon and Chapelle 2007; Sochaczewski et al. 2008; Spiteri et al. 2008).

The reactivity of the subsurface, such as its sorption and degradation capacity, is key to reactive transport processes and hence to all the topics mentioned above. The EU Soil Thematic Strategy (STS; COM 2006) recognises as one of the principal soil functions the storage, filtering and transformation of substances, including water, carbon, nutrients and pollutants. This function is not restricted to the topsoil or vadose zone, but continues in subsoil sediments and even consolidated aquifers. In a completely natural situation, a quasi-steady state develops between the composition and reactivity of the subsurface and the infiltrating water, determined by both the chemical signature of the infiltrating water and the reactive compounds of the aquifer (Zhu and Burden 2001; Vissers 2005). Under the transient conditions of anthropogenic disturbance, soil and sediment reactivity may mitigate the disturbance and protect downgradient groundwater and surface water bodies. However, less desirable side effects, such as increased water hardness or mobilisation of trace metals, may also occur (Larsen and Postma 1997; van Helvoort et al. 2007; Visser et al. 2009). Under such transient conditions, knowledge of subsurface reactivity is required to predict groundwater quality and hence to manage it adequately. Climate change is also expected to influence soil–water exchange reactions, so data on previous steady-state conditions will become less predictive of future developments. This makes the collection of sediment data all the more pressing. Recent examples from the Netherlands of prognostic modelling studies at regional or national scale that require information on the reactivity of subsurface sediment as input include Tiktak et al. (2006), van den Brink et al. (2007) and van der Grift and Griffioen (2008).

Strategies for soil surveying commonly also include some aspects of soil geochemistry (e.g. organic matter, nutrient content, priority metals and organic contaminants), but this is not the case for subsoil sediments. Analyses of the geochemical parameters that characterise aquifer reactivity are laborious and expensive, all the more so because of sediment heterogeneity. Such analyses are therefore not routinely performed, and the effective reactivity of the aquifer sediment is in many cases deduced only from observed patterns in groundwater chemistry. In some studies, qualitative analysis of the aquifer sediments is performed to corroborate inferences from groundwater analyses and modelling (e.g. Lee et al. 2007). Quantitative studies of sediment reactivity are found mainly in connection with local contaminant plumes, with some recent studies addressing As mobilisation in drinking water aquifers (Swartz et al. 2004; Shamsudduha et al. 2008). In recent years, our group has explored possibilities for a more regional characterization of aquifer sediment reactivity (van Helvoort 2003; Hartog et al. 2004, 2005; van Helvoort et al. 2007). This work showed that a facies-based approach, as advocated by Allen-King (1998) and following advances in hydraulic characterization (Bierkens 1996; Bierkens and Weerts 1994), indeed offers a way to tackle the problem of aquifer sediment heterogeneity.

Two developments have set the stage for the present work:

  • a revised and integrated stratigraphic framework for both the onshore and offshore parts of the Dutch territory (Weerts et al. 2003, 2005; Rijsdijk et al. 2005);

  • the Digital Geological Model (DGM) and the revised 3D hydrogeological model REgional Groundwater Information System (REGIS) of the Netherlands (Vernes and van Doorn 2006).

With these in place, the challenge arose to carry out a nationwide, quantitative, regional-scale characterization of sediment geochemistry down to a depth of about 30 m below surface level, with a focus on reactive properties. The intention is to complete this systematic nationwide survey by 2020. In this paper, we address several issues in order to develop a protocol for efficient characterization of subsurface reactivity on a regional scale: the selection of relevant sediment properties, adequate analytical methods and borehole locations, the required detail of stratification, and the required sample size.

Statistically based sampling strategies for soil surveying have been extensively studied and described (e.g. de Gruijter et al. 2006). For aquifer sediments, the need for deep boreholes, down to 30 m or more, precludes a completely random 3D sampling approach. Instead, a priori knowledge of aquifer architecture should be used for guidance. A facies-based approach as point of departure should help address sediment heterogeneity. For this, we need to determine the required degree of detail in stratification and assess the variability within strata to determine the adequate number of samples. The latter will also depend on the statistical distribution of measured values within the strata and on the precision needed for the water quality problem addressed. With respect to analysis, we need to identify the basic set of compositional properties that adequately describes the relevant aspects of subsurface reactivity from an environmental and societal point of view. The analytical techniques chosen need to be robust and have adequate long-term precision, because regional characterization inevitably entails a long-term measurement programme. At the same time, low detection limits may be required where even small contents determine effective reactivity.
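Within-stratum variability and required precision jointly fix the number of samples. As a rough illustration, the sketch below uses the textbook normal-approximation formula \(n = (z\,s/E)^2\) for estimating a stratum mean to within a confidence half-width \(E\). This is an assumption for illustration, not the protocol's own method. It presumes simple random sampling within a stratum and a roughly normal sampling distribution, whereas sediment reactivity data are often skewed. The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist


def samples_for_mean(sd: float, margin: float, confidence: float = 0.95) -> int:
    """Hypothetical helper: samples needed so that the confidence-interval
    half-width of a stratum mean does not exceed `margin`.

    Assumes simple random sampling within the stratum and an approximately
    normal sampling distribution of the mean; `sd` is a pilot estimate of
    the within-stratum standard deviation (same units as `margin`).
    """
    if sd < 0 or margin <= 0:
        raise ValueError("sd must be >= 0 and margin > 0")
    # Two-sided critical value, e.g. about 1.96 for 95 % confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sd / margin) ** 2)


if __name__ == "__main__":
    # Hypothetical pilot SD of 1.0 wt% and a target half-width of 0.5 wt%
    print(samples_for_mean(1.0, 0.5))
```

Halving the acceptable half-width roughly quadruples the number of samples. This is why the precision needed for the water quality problem weighs so heavily on the sampling effort.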

The protocol is developed for the situation in the Netherlands, with respect to both its geology and its geo-information infrastructure. Its relevance, and that of its underpinning, for similar sedimentary regions will be discussed.

2 Background information

2.1 Geology

The Netherlands (land area about 34,000 km²) is located in northwestern Europe (Fig. 1). Geologically, the area is part of the subsiding North Sea Basin, which is bounded by the Brabant Massif to the south and the Rhenish Massif to the east. A thick sequence of unconsolidated sediments was deposited during the Cenozoic; most of the formations now at the surface are of Quaternary age (De Mulder et al. 2003; Busschers et al. 2005). An overview of the Quaternary formations, with their depositional environment and approximate age, is given in Fig. 2.

Fig. 1

Left: overview of northwestern Europe showing the Netherlands (darker grey) and surrounding countries (base map data from www.esri.com). Right: map of the Netherlands with the main lithological classification of surface sediments (NITG-TNO)

Fig. 2

Overview of relative age and depositional environment of the Quaternary Formations in the Netherlands. The legend gives the predominant lithologies for each Formation

The base of the Pleistocene sequence is formed by marine sands and local clay layers containing ubiquitous shell fragments (Maassluis Formation). These were deposited when most of the area was still covered by a shallow sea. As the sea retreated, the rivers Rhine (Waalre, Sterksel and Urk Formations) and Meuse (Beegden Formation) and the north German river system (Peize and Appelscha Formations) built a large delta during the Early and Middle Pleistocene. This delta is composed mainly of fine and coarse sands and, later, also gravel. The predominantly non-calcareous sediment supply from the north German system came to a halt in the Elsterian, when the northern part of what is now the Netherlands was covered by ice. In this period, the heterogeneous Peelo Formation was deposited, including heavy glaciolacustrine clays in subglacial erosion gullies. The southern fluviatile deposits of up to this age may have been locally reworked and sorted by water and wind (Stramproy Formation). During the Saalian, the ice also reached the central parts of the country (glacial deposits of the Drenthe Formation), and most of the ice-pushed ridges were formed. The Rhine and Meuse were diverted to the west and continued to deposit coarse sands until the end of the Pleistocene (Kreftenheye Formation). In the warm period that followed the Saalian, marine sediments were again deposited in the southwestern, northern and central parts of the Netherlands (Eem Formation). Thick peat layers developed at the margins of this sea (Woudenberg Formation). During the last (Weichselian) glaciation, the ice sheet did not reach the Netherlands. Windblown sands and loess, and local fluvial and lacustrine deposits, were formed (continuation of the Boxtel Formation, which had been deposited since the end of the Middle Pleistocene).

The retreat of the ice and the consequent sea-level rise mark the beginning of the Holocene. The Rhine–Meuse system became meandering/anastomosing and delivered, on average, finer sediments than before (Echteld Formation). With continued sea-level rise, large tidal basins and lagoons formed at the present coastal margin, where (peri)marine calcareous sands and silty clays were deposited (Naaldwijk Formation). Sedimentation by the Rhine and Meuse retreated to the central and southeastern area. Peat developed from the margins of the intertidal area towards the centre and north (Nieuwkoop Formation). A new transgression eroded the peat and covered it with marine clays and fine-grained Rhine–Meuse sediments. From this period onwards, human influence became dominant, in the form of dike building, drainage, peat excavation, etc.

2.2 Mineralogical composition of Pleistocene and Holocene sediments

Mineralogical studies of the subsurface sediments of the Netherlands have focused on the heavy mineral fraction of the sand deposits, for provenance studies, and on the clay fraction of the clay deposits. When subdivided into grain-size fractions, the various sediment types show broadly similar mineralogical compositions (Breeuwsma 1990).

The Quaternary sands are dominated by quartz grains; the Rhine–Meuse sands also contain abundant lithic fragments of sandstone and shale. Feldspar (orthoclase with lesser albite and virtually no plagioclase) is the next most important mineral. Muscovite is the common mica; biotite is not always present (van Baren 1934). Glauconite can be present in, among others, the Waalre Formation and may also occur in the local eolian sediments derived from it. The heavy mineral fraction (less than 0.5%) of the Rhine–Meuse sands is characterised by an unstable association of garnets, epidote, hornblende and augite. The typical stable heavy mineral suite of the eastern sands comprises tourmaline, andalusite, kyanite, sillimanite and staurolite. The main secondary minerals are calcite, siderite, and iron and aluminium oxides. Calcite is mostly biogenic, and the highest contents are found in the marine sands. The top of the eolian sands and especially the ice-pushed ridges have been completely decalcified. Siderite may occur in brook deposits, where vivianite and high contents of iron oxides may also be found. Gibbsite (up to 1%) is the usual form of non-silicate-bound aluminium. Pyrite frequently occurs in reduced sediments. The highest contents are encountered in peat or organic-rich layers deposited under marine influence.

The Holocene marine clays contain roughly equal amounts of illite and smectite, in addition to minor amounts of kaolinite and chlorite. The riverine clays also contain vermiculite and slightly less smectite; the heavier clays are usually non-calcareous. The proportion of free to silicate-bound iron is somewhat higher in the fluviatile clays than in the marine clays. Conversely, gibbsite is slightly more abundant in the marine clays (Breeuwsma 1990).

2.3 Societal perspective

Traditional use of the sedimentary subsurface in the Netherlands (other than mining) has focused primarily on the upper soil layer, which is important for agriculture and above-ground infrastructure. The deeper soil and sediment have mainly been used for groundwater extraction and for pile foundations driven through weak soil. Increasing population and the needs of modern society have greatly intensified our interference with the subsurface. Underground activities such as subsurface energy storage or CO2 storage provide economic benefits and should also help to preserve our living environment. Plans for and construction of underground infrastructure have highlighted the importance of preserving archaeological heritage (Malta Convention; EC 1992).

The role of sedimentary aquifers and aquitards in transporting and filtering water to receiving brooks, rivers and wells can no longer be taken for granted. In Europe, this is evidenced by, among others, the Nitrates Directive (EEC 1991), the Water Framework Directive and the related Groundwater Directive (EC 2000; 2006). Point-source pollution still poses problems. The inventory of contaminated sites is nearing completion, and various remediation technologies have been developed and demonstrated (Spira et al. 2006; Hafker 2007; Simon 2008). Large-scale groundwater contamination in particular, however, poses a threat of ongoing spreading of organic contaminants. Attention has also shifted towards diffuse agricultural pollution as a main source of excess nutrients (N, P), metals such as Cd, Cu and Zn, herbicides, pesticides, etc. A derived problem is subsurface Fe-sulphide oxidation (in particular of pyrite), caused either by the input of nitrate as an oxidant or by intensive drainage. This may lead to acidification, increased hardness and/or mobilisation of metals, notably Ni (e.g. Ritsema and Groenenberg 1993; Larsen and Postma 1997; Boman et al. 2008).

Thus, the anthropogenic impacts and inputs for which the reactivity of the subsurface is relevant are:

  1. oxidation by drainage or nitrate infiltration;

  2. influx of organic contaminants, in particular from industrial, military and agricultural activities;

  3. excess nutrient inputs to the topsoil;

  4. diffuse input and secondary mobilisation of trace metals.

The response of the subsurface to these impacts is determined by a relatively small number of processes:

  1. complexation, sorption and exchange of metal ions and phosphate, and sorption of organic substances;

  2. denitrification and decay of organic substances;

  3. dissolution and precipitation of minerals.

A regional-scale chemical characterization should therefore focus on parameters that quantify the capacity of the subsurface for these processes.

3 Materials and methods

3.1 Geo-information infrastructure

All basic borehole data on the subsurface of the Netherlands are stored in the so-called Data and Information on the Netherlands’ Subsurface (DINO) database. This database is managed by the DINO group within the Geological Survey of the Netherlands (GSN; part of TNO, the Netherlands Organisation for Applied Scientific Research). The archive holds records of borehole descriptions and measurements, cone penetration tests, vertical electrical soundings, groundwater heads, and results of borehole sampling and physical or chemical analyses (www.dinoloket.nl; Bosch 2000). The systematic borehole descriptions cover some 450,000 borings, down to a maximum depth of 4 km; most of the stored data concern the upper 500 m. The DGM and the REGIS 3D hydrogeological model are based on these data. Both are so-called stacked-layer models, in which each unit is represented by its top and bottom. In total, 106 units are distinguished within the uppermost 500 m, for which lithostratigraphy, geometry and hydraulic properties can be retrieved.

The geo-evolution of the North Sea Basin described above resulted in a sequence of geological formations that differs between parts of the Netherlands. To deal with this geographical variation in practice, the Netherlands has been divided into so-called geotop regions, each characterised by a typical sequence of geological formations. The geotop comprises the upper part of the subsurface, from surface level to a depth of 15–30 m. The precise lower boundary depends on the geological and hydrogeological structure and the geochemical characteristics of the region. From a geohydrological point of view, the geotop is the part of the subsurface where a dynamic relation exists between land use (including underground use), soil, shallow groundwater and surface water. It is also where anthropogenic influence is most noticeable. The Holocene formations and the Boxtel Formation are always considered part of the geotop. Their combined thickness is commonly less than 30 m but may be up to 80 m. In some cases, Tertiary formations are part of the geotop. The geographical stratification is based on two criteria:

  • which boundary (according to DGM and REGIS) is found at a depth between 15 and 30 m b.s.l.,

  • what depositional environment is associated with the surface formations.

In general, the first criterion takes precedence, except in the coastal zone, where the primary distinction is between the marine and fluviatile environments. In total, 27 geotop regions are distinguished, grouped into seven major regional units (Table 1; Fig. 3). From a geohydrological perspective, four main types of geotop can be found (see Table 1):

  A. a confining top layer (e.g. in the western Holocene part of the Netherlands);

  B. a phreatic sandy aquifer (e.g. in the ice-pushed ridges);

  C. a thin (<15 m) confining layer on top of a semi-confined aquifer (e.g. the riverine area);

  D. a thin (<5 m) phreatic aquifer (eolian sands and high-conductivity fluviatile sands) on top of a relatively heterogeneous layer.
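The first geotop criterion lends itself to a simple query on a stacked-layer model, since each unit is stored with its top and bottom. The sketch below scans one borehole column for unit boundaries that fall between 15 and 30 m below surface level. It is a minimal illustration under stated assumptions. The `Unit` class, the function name and the example column are hypothetical, units are assumed to be stacked without gaps, and neither the coastal-zone exception nor the second (depositional environment) criterion is modelled.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One unit of a hypothetical stacked-layer column.

    Depths are in m below surface level (positive downwards).
    """
    name: str
    top: float
    bottom: float


def boundaries_in_window(column, upper=15.0, lower=30.0):
    """Return (depth, overlying unit, underlying unit) for every unit
    boundary with upper <= depth <= lower.

    Assumes the units are stacked without gaps, so the bottom of one
    unit coincides with the top of the next.
    """
    units = sorted(column, key=lambda u: u.top)
    hits = []
    for above, below in zip(units, units[1:]):
        depth = above.bottom  # equals below.top in a gap-free stack
        if upper <= depth <= lower:
            hits.append((depth, above.name, below.name))
    return hits


if __name__ == "__main__":
    # Hypothetical column: Naaldwijk over Kreftenheye over Waalre
    column = [Unit("NA", 0.0, 12.0), Unit("KR", 12.0, 25.0), Unit("WA", 25.0, 60.0)]
    print(boundaries_in_window(column))
```

In this hypothetical column, only the KR/WA boundary at 25 m falls inside the 15–30 m window. A real assignment would repeat this query over the model grid and then apply the regional rules of Table 1.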

Table 1 Definition of geotop regions based on the geologic boundary observed between 15 and 30 m b.s.l. and the depositional environment of the surfacing deposits
Fig. 3

Division of the Netherlands into 27 geotop regions. See Table 1 for more details

3.2 Available geochemical data for exploratory statistical analysis

As discussed in the introduction, existing quantitative studies (and hence datasets) on subsurface sediment geochemistry are scarce, and the Netherlands is no exception. Geochemical characteristics have not been routinely analysed as part of the boring and sampling programmes on which the DGM and REGIS model are based. However, chemical analyses have been performed within a number of specific projects. A brief inventory of the DINO dataset showed that, as of December 2006, geochemical data were available from some 50 research projects, 80% of them in digital form. Several analyses had been performed for at least 10% of the sampled boreholes:

  • X-ray fluorescence spectrometry (XRF), for 80% of the boreholes;

  • inductively coupled plasma–atomic emission spectrometry after digestion with HF or aqua regia (ICP-AES), for 50%;

  • thermogravimetric analysis (TGA), for 40%;

  • each for less than 25%: ICP-mass spectrometry (ICP-MS), high-performance liquid chromatography, grain size analysis (Malvern), atomic absorption spectrometry and C/S analysis (LECO).

Nearly half of the borings concerned had a maximum depth of less than 1.2 m (shallow soil borings). The borehole locations were concentrated in two areas:

  • the southern sand area, dominated by Pleistocene fluvial deposits (geotop region 4);

  • the fluviatile clay area and the adjoining western marine clay area, dominated by a Holocene confining layer on top of a sequence of Pleistocene sediments (geotop regions 2b and 1a; see Table 1 and Fig. 3).

These two areas together cover the main lithostratigraphical units and were therefore selected for exploratory statistical analyses. They are referred to below as the Pleistocene and Holocene area, respectively. Samples from the Holocene area include 40% sands and about 25% peats, the latter being oversampled relative to their abundance. As expected, most samples from the Pleistocene area are sands.

None of the selected samples had been analysed for the whole range of geochemical parameters in the DINO database. This holds even for the subset obtained by the most commonly used analytical techniques listed above. Guided by data availability and by the considerations regarding required information (see Section 4.1), the exploratory phase focused on three data sources (Table 2): XRF data for total contents of Al, Ca, Fe and S; LECO C/S analysis for elemental S and organic C; and TGA for organic matter and carbonate content. From the analytical variables listed in Table 2, the following five primary reactivity variables were calculated for use in the statistical analysis:

Table 2 Number of samples analysed for the variables considered relevant for each of the two areas selected for exploratory statistical analysis (note that none of the samples is analysed for all variables)
Pyrite content:

\( \mathrm{Pyr} = \frac{S \times M_{\mathrm{FeS_2}}}{2\,M_{\mathrm{S}}} \)

Reactive iron:

\( \mathrm{Fe_{react}} = \left[\mathrm{Fe_2O_3} - \left(0.225\,\mathrm{Al_2O_3} - 0.91\%\right)\right] \times \frac{2\,M_{\mathrm{Fe}}}{M_{\mathrm{Fe_2O_3}}} \)

Clay content:

\( \mathrm{Clay} = \alpha\,\mathrm{Al_2O_3} - \beta \) (α, β dependent on geographic region)

Organic matter:

\( \mathrm{OM} = 2\,\mathrm{TOC} \) or \( \mathrm{OM} = \mathrm{TGA}_{550} - 0.07\,\mathrm{Clay} \)

Carbonate:

\( \mathrm{Carb} = \mathrm{TGA}_{850} \times M_{\mathrm{CaCO_3}} / M_{\mathrm{CO_2}} \) or \( \mathrm{Carb} = \mathrm{CaO} - \left(0.0488\,\mathrm{Al_2O_3} - 0.1147\%\right) \)

where \(M_i\) is the molecular mass of compound \(i\), TOC is total organic carbon, and \(\mathrm{TGA}_j\) is the incremental mass loss between temperature \(j\) and the previous temperature step. The \((\gamma\,\mathrm{Al_2O_3} - \delta)\) terms in the expressions for reactive iron and carbonate are empirical relations that correct for silicate-bound iron and calcium, respectively. The clay correction factor of 0.07 in the OM–TGA550 relation is likewise derived empirically.
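The five derived variables follow directly from the analytical results. The sketch below transcribes the formulas above into small functions, under several assumptions:

  • all contents are in wt%, so the 0.91% and 0.1147% offsets enter as 0.91 and 0.1147;

  • standard atomic masses are used;

  • all measured S is taken as pyrite-bound, as the pyrite formula implies.

The function names are hypothetical. The region-dependent clay coefficients α and β must be supplied by the user, because their values are not given here.

```python
"""Derived reactivity variables from bulk analyses (all contents in wt%)."""

# Standard atomic masses (g/mol); compound masses are built from them
M_FE, M_S, M_O, M_C, M_CA = 55.845, 32.06, 15.999, 12.011, 40.078
M_FES2 = M_FE + 2 * M_S          # pyrite, FeS2
M_FE2O3 = 2 * M_FE + 3 * M_O     # Fe2O3
M_CACO3 = M_CA + M_C + 3 * M_O   # CaCO3
M_CO2 = M_C + 2 * M_O            # CO2


def pyrite(s_total: float) -> float:
    """Pyr = S * M_FeS2 / (2 M_S); treats all measured S as pyrite-bound."""
    return s_total * M_FES2 / (2 * M_S)


def reactive_iron(fe2o3: float, al2o3: float) -> float:
    """Fe2O3 minus the empirical silicate-bound part, converted to Fe."""
    silicate_fe2o3 = 0.225 * al2o3 - 0.91
    return (fe2o3 - silicate_fe2o3) * 2 * M_FE / M_FE2O3


def clay(al2o3: float, alpha: float, beta: float) -> float:
    """Clay = alpha * Al2O3 - beta; alpha and beta are region-specific inputs."""
    return alpha * al2o3 - beta


def organic_matter_from_toc(toc: float) -> float:
    """OM = 2 * TOC."""
    return 2 * toc


def organic_matter_from_tga(tga550: float, clay_content: float) -> float:
    """OM = TGA550 minus the empirical clay correction (factor 0.07)."""
    return tga550 - 0.07 * clay_content


def carbonate_from_tga(tga850: float) -> float:
    """TGA850 mass loss, taken as CO2, converted to CaCO3 equivalent."""
    return tga850 * M_CACO3 / M_CO2


def carbonate_from_cao(cao: float, al2o3: float) -> float:
    """CaO minus the empirical silicate-bound part, as in the second Carb formula."""
    return cao - (0.0488 * al2o3 - 0.1147)
```

The functions deliberately do not clip negative results. A negative reactive iron or carbonate value simply signals a sample for which the empirical silicate correction exceeds the measured total.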

3.3 General approach

Statistical analysis of the existing geochemical data is combined with theoretical and practical considerations to assess the variability of subsurface reactivity, the relevance of different DGM-based stratifications, and the efficiency and possible redundancy of analytical parameters.

4 Results

4.1 Analytical protocol

Given the societal relevance discussed above, the regional-scale chemical information required to characterise subsurface reactivity should include:

  • state and stability of pH and redox (relevant for mobility of trace elements, speciation of redox-sensitive elements, preservation of archaeological heritage, corrosivity to underground infrastructure);

  • reduction capacity (nitrate reduction, reductive degradation of organic contaminants, but also in response to drainage or injection of oxygenated water);

  • contribution of pyrite to reduction capacity, in view of release of potentially harmful trace metals;

  • acid buffering capacity (atmospheric input, but especially in response to pyrite oxidation);

  • retardation potential (to predict groundwater transport time scales and concentration levels of metals and organic contaminants).

Based on knowledge of the lithology and mineralogy of the subsurface sediments, a limited number of relevant geochemical variables could be identified (Table 3). They are currently used, or considered desirable, as input in models for reactive transport of contaminants, nutrient availability to crops or nature areas, acidification buffering, mobilisation of metals under a changed hydrogeochemical regime, etc. The Dutch effort on regional-scale characterization of sediment geochemistry focuses on the mineral subsoil and deeper sediment rather than the organic topsoil. This is reflected in the variables selected as being of primary importance (see Table 3). Table 4 provides an overview of the commonly available analytical techniques for these variables.

Table 3 List of geochemical variables that are commonly used as input in, or would improve, various reactivity models
Table 4 Commonly available analytical techniques for priority reactivity variables as identified in Table 3

For most of the relevant reactivity variables, more than one analytical technique is available, but some of these are quite laborious because they involve several dissolution/extraction steps before final analysis. In addition, weak selective extractions are known to be highly sensitive to the exact laboratory conditions and procedures (temperature, solid-to-liquid ratio, extraction time, shaking frequency, etc.). They are also sensitive to overall sample composition (e.g. carbonate content for non-buffered extractions). These methods are therefore less suitable for routine environmental analysis, from both an economic and a quality assurance point of view. An ideal standard analytical package would thus use a small number of well-developed and preferably automated techniques. For these techniques, detection limits, duplicate precision (both within batches and in the long term) and trueness against standards should be known and acceptable for the intended purpose. Detection limits, precision and trueness of a number of GSN laboratory standard methods applied to natural soil/sediment samples were recently assessed by van der Veer (2006), within the framework of a geochemical soil survey of the Netherlands. Partly based on these results, the standard package for geochemical characterization of soil and subsoil aquifer sediments was set as follows:

  • XRF, for total element content,

  • TGA, for sedimentary organic matter and carbonate content,

  • CS elemental analyser, for total S and total and organic C;

  • Laser diffraction particle size, for grain size distribution including clay content;

  • extraction by 0.43 M HNO3 of so-called geo-available elements (hydroxides, carbonates, exchangeable metals; see Smith and Huyck 1999), with analysis by ICP-AES/MS;

  • and for topsoil only: extraction by 0.01 M CaCl2 for parameters such as pH and DOC.
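The duplicate precision required of these techniques is commonly quantified from pairs of duplicate analyses. A standard estimator is \(s = \sqrt{\sum d_i^2 / (2n)}\), where \(d_i\) is the difference within pair \(i\) and \(n\) is the number of pairs. The sketch below is a generic illustration of that estimator, not the GSN laboratory procedure; the function names and example pairs are hypothetical. Applying it separately to pairs analysed within one batch and to pairs analysed in different batches separates within-batch from long-term precision.

```python
import math


def duplicate_sd(pairs):
    """Standard deviation estimated from duplicate analyses.

    pairs: iterable of (first, second) results for the same sample.
    Uses s = sqrt(sum(d_i**2) / (2 n)) with d_i the within-pair difference.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one duplicate pair")
    ss = sum((a - b) ** 2 for a, b in pairs)
    return math.sqrt(ss / (2 * len(pairs)))


def duplicate_rsd(pairs):
    """Same precision expressed as % of the grand mean of all results."""
    pairs = list(pairs)
    grand_mean = sum(a + b for a, b in pairs) / (2 * len(pairs))
    return 100 * duplicate_sd(pairs) / grand_mean


if __name__ == "__main__":
    # Hypothetical duplicate results (wt%) from one analytical batch
    within_batch = [(1.0, 1.2), (2.0, 1.8), (3.0, 3.2)]
    print(duplicate_sd(within_batch), duplicate_rsd(within_batch))
```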

Information on pH and Eh for the deeper subsoil is considered to be sufficiently available from regional and national groundwater monitoring networks; adequate collection of predominantly anaerobic groundwater is not easily incorporated into regular drilling campaigns.

This basic analytical package is expected to cost about €250 ($315) per sediment sample when performed on a routine basis. Additional analyses can be performed locally for specific purposes. All chemical analyses are to be executed according to the laboratory protocols in force. In the Netherlands, these are the protocols of TNO Geological Survey of the Netherlands and, for the weak extractions, those of the Chemical Biological Laboratory for Soil of Wageningen University and Research Centre.

4.2 Boring and sampling protocol

4.2.1 Stratification

The aim of stratification in sampling design is to reduce uncertainty in the estimation of relevant statistics such as the mean, median or 90th percentile (P90). The overall data variance is split into between-stratum variance and within-stratum variance, the latter ideally being much smaller than the initial overall variance. The number of samples collected and analysed is usually fixed by financial and logistic constraints. Given that number, using the occurrence of each stratum as additional information may greatly improve the precision of statistical predictions. Prerequisites for adequate stratification in this case are thus:

  • The spatial distribution of the strata to be distinguished within the geotop must be known. For the case of the Netherlands, this means that the strata should be well-defined (hydro)geological/lithostratigraphical entities within the DGM or REGIS model.

  • The within-stratum variance for the reactivity properties measured should be small compared to the overall variance. If not, there is only limited gain in precision, and apparent differences in reactivity may be perceived that are not statistically significant.

  • Each stratum distinguished should be significantly different from all other strata. Otherwise, similar strata should be grouped, as together they then could be characterised by a smaller number of samples or to a better precision. Of course, a significant difference may already be determined by only one of the reactivity properties.
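The first two prerequisites can be checked with a classical one-way decomposition of the total sum of squares into between-stratum and within-stratum parts. The sketch below is a generic illustration of that decomposition, not the non-parametric test used in the reconnaissance studies. The stratum labels and values are hypothetical. A small within-stratum share of the total indicates that the stratification captures most of the variability.

```python
from statistics import fmean


def variance_components(groups):
    """One-way decomposition of the total sum of squares.

    groups: mapping of stratum label -> list of measured values.
    Returns (total, between, within), with total == between + within.
    """
    values = [v for vs in groups.values() for v in vs]
    grand = fmean(values)
    total = sum((v - grand) ** 2 for v in values)
    # Spread of the values around their own stratum mean
    within = sum(sum((v - fmean(vs)) ** 2 for v in vs) for vs in groups.values())
    # Spread of the stratum means around the grand mean, weighted by stratum size
    between = sum(len(vs) * (fmean(vs) - grand) ** 2 for vs in groups.values())
    return total, between, within


if __name__ == "__main__":
    # Hypothetical pyrite contents (wt%) for two lithological classes
    strata = {"sand": [0.1, 0.2, 0.3], "clay": [1.0, 1.1, 1.2]}
    total, between, within = variance_components(strata)
    print(f"within-stratum share of total: {within / total:.1%}")
```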

Based on the first prerequisite, geotop region, geological formation and lithological class were considered for stratification. Using the available data (see Section 3.2; Table 2), the relevance of these classifiers with respect to the other prerequisites was tested in two reconnaissance studies, for the Pleistocene and Holocene areas separately. Figure 4 shows an illustrative example of the box–whisker plots used in the exploratory statistical analysis. The statistical significance of differences between combinations of geotop region, formation and lithology was assessed based on the 95% two-sided non-parametric confidence interval of the median (Helsel and Hirsch 1992). The overall conclusion is that lithology provides the main differentiation in subsurface sediment reactivity. Within a single lithology class, however, geological formations and geotop regions also show significant differences in reaction capacity. Apparently, primary geological differences in lithological composition dominate the differences in reaction capacity. The time- and space-dependent depositional environment and, for secondary phases, diagenetic processes also make a difference. A further subdivision based on geological formation and geotop region thus aids in the geochemical characterization of aquifer sediment strata for environmental purposes.
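One common way to obtain a distribution-free confidence interval for the median, in the spirit of Helsel and Hirsch (1992), takes two order statistics whose ranks follow from the binomial distribution with p = 0.5. The sketch below is a minimal illustration of that construction; the function name is hypothetical. It returns the actual coverage, which for small samples is somewhat higher than the nominal 95%. A simple, conservative criterion is to treat strata whose intervals do not overlap as having different medians.

```python
from math import comb


def median_ci(values, alpha=0.05):
    """Distribution-free confidence interval for the median.

    Uses the order statistics with ranks l and n - l + 1 (1-based), where
    l is the largest rank with P(X <= l - 1) <= alpha / 2 for
    X ~ Binomial(n, 0.5). Returns (lower, upper, actual_coverage).
    """
    x = sorted(values)
    n = len(x)
    # cdf[k] = P(X <= k) for X ~ Binomial(n, 0.5)
    cdf, acc = [], 0.0
    for k in range(n + 1):
        acc += comb(n, k) / 2 ** n
        cdf.append(acc)
    lower_rank = 0
    for cand in range(1, n // 2 + 1):
        if cdf[cand - 1] <= alpha / 2:
            lower_rank = cand
    if lower_rank == 0:
        raise ValueError("too few samples for the requested confidence level")
    coverage = 1 - 2 * cdf[lower_rank - 1]
    return x[lower_rank - 1], x[n - lower_rank], coverage


if __name__ == "__main__":
    # Ten hypothetical carbonate contents (wt%) from one stratum
    print(median_ci([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

With fewer than six samples, no 95% interval can be formed this way. This is one reason why small categories, such as those flagged in Fig. 4, should be interpreted with care.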

Fig. 4

Box–whisker plot of the pyrite content in geotop regions 1b and 2a of the Holocene area as a function of lithological class and geological formation (BX Boxtel Formation, EC Echteld Formation, KR Kreftenheye Formation, NA Naaldwijk Formation, NI Nieuwkoop Formation). The median is shown as a horizontal line, the box represents all values between the percentiles P25 and P75. Outliers (empty circle) and extremes (asterisk) are also shown. Exclamation marks indicate that the sample size for the category concerned was less than 30. N total is 1649

4.2.2 Statistics per stratum

The primary aim of the planned nationwide characterization of sediment geochemistry in the Netherlands is to provide information at a regional scale, in support of groundwater and surface water management (e.g. for a Water Board management area or the infiltration area of a drinking water supply). The statistics required then depend on the type of question:

  1. questions about the total or average mass flux, e.g. from soil surface to groundwater or from groundwater to surface water;

  2. questions about the percentage of cases, or the fraction of the total area considered, where a certain critical concentration will be exceeded.

In the first case, essentially the average reaction capacity should be determined with adequate accuracy (combined precision and trueness). In the second case, insight is needed into the entire statistical distribution. This translates into determining with adequate accuracy the median (P50) as well as some lower and higher percentiles (e.g. P10, P25, P75, P90). This is a multi-parameter stochastic approach, based on variability and extremes in addition to central tendency. It nevertheless remains at a regional scale, since no assertions are made as to where exactly a certain situation will arise.
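For a single stratum, both types of question can be served by one small summary. The mean serves flux-type questions; selected percentiles plus the fraction of samples above a threshold serve exceedance-type questions. The sketch below is illustrative only. It uses Python's `statistics.quantiles` (inclusive method) for the percentiles and a Wilson score interval, a swapped-in standard choice, for the uncertainty of the exceedance fraction. The function name, threshold and data are hypothetical.

```python
import math
from statistics import NormalDist, quantiles


def summarise(values, threshold, confidence=0.95):
    """Mean, selected percentiles and exceedance fraction for one stratum."""
    n = len(values)
    # 19 cut points at 5 %, 10 %, ..., 95 %; cut point k is the 5(k+1) % percentile
    cuts = quantiles(values, n=20, method="inclusive")
    pct = {p: cuts[p // 5 - 1] for p in (10, 25, 50, 75, 90)}
    # Fraction of samples above the threshold
    k = sum(v > threshold for v in values)
    p_hat = k / n
    # Wilson score interval for that fraction
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return {
        "mean": sum(values) / n,
        "percentiles": pct,
        "exceedance": p_hat,
        "exceedance_ci": (centre - half, centre + half),
    }


if __name__ == "__main__":
    # Hypothetical contents (arbitrary units) and threshold
    print(summarise(list(range(1, 22)), threshold=15))
```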

With respect to the type of statistical distribution in relation to the nature of the reaction property to be assessed, four typical end-members are distinguished. They are worked out in the examples below.

  1. The presence of a compound, rather than its content, determines reactivity

    For carbonate-bearing lithostratigraphic units, the exact carbonate content is of little or no relevance above a certain minimum content. As long as carbonate minerals are present, acid buffering capacity is guaranteed and, given their relatively fast dissolution, the passing groundwater will become saturated with respect to carbonate. Similar considerations may apply to redox-relevant parameters. Essential in these situations are:

    • what is the required minimum content,

    • what is the probability of (not) exceeding this minimum (% of samples above/below the threshold),

    • to what precision can this probability be estimated.

    The average reactivity in this case can be equated to the percentage of samples above the threshold times the (maximum) reactivity at this content. In the multi-parameter stochastic approach, it will probably suffice to assert that there is maximum reactivity in x% of the cases and barely any or no reactivity in 100–x% of the cases.

  2. Predominantly low reactivity determines the effective reactivity

    Sandy lithostratigraphic units generally have low reaction capacities. A large part of the measured values may be close to or below the detection limit. The frequency of occurrence of higher contents of the pertinent compound then largely defines overall reactivity, where again its exact content may be less relevant. An example is pyrite content: to remove nitrate from infiltrating groundwater through pyrite oxidation, pyrite need not be ubiquitous, and a small amount of pyrite may suffice for complete denitrification of the passing groundwater. The disperse occurrence of pyrite in an otherwise pyrite-poor sandy matrix may thus bring about denitrification throughout the aquifer. This implies a non-linear relation between content and effective reactivity, whereby the average content may actually overestimate the effective reactivity. In a multi-parameter stochastic approach, the statistical distribution of the low content values will be highly important. Essential here are:

    • the threshold content that differentiates marked reactivity from low reactivity,

    • the probability of being above or below this threshold and the accuracy of its estimate,

    • on the basis of this, an estimate of the effective, weighted average reactivity,

    • possibly, for the multi-parameter stochastic approach, the value and accuracy of low percentiles describing the low reactivity part of the distribution.

    Cases 2 and 1 are to a certain extent mirror images. In case 1, reactivity is the norm, and given the presence of the reactive compound, reactivity is fully expressed. While the available reaction capacity will ultimately determine its depletion, it does not determine current reactivity. In case 2, the reverse holds: lack of reactivity is the norm, and effective current reactivity depends on the infrequent presence of the reactive compound.

  3. The relatively homogeneous content of the reactive compound determines reactivity

    This may occur, for instance, for the CEC in clayey lithostratigraphic units. In such cases, the focus should be on:

    • the average (or the median, for the multi-parameter stochastic approach) and its accuracy (confidence interval).

    Because of the homogeneity, possible non-linearity between content and effective reactivity is not an issue here.

  4. The strongly inhomogeneous content of the reactive compound determines reactivity

    This situation exists, for example, for sedimentary organic matter in heterogeneous deposits. The sorption capacity for organic contaminants is governed by the average OM content, on which the degree of retardation during groundwater transport directly depends. To assess the average effect, we need to estimate:

    • the average (in case of non-linearity the weighted average) and its accuracy.

    For the multi-parameter stochastic approach, the entire statistical distribution needs to be characterised through:

    • the median and its accuracy,

    • more extreme percentiles and their accuracy.

    Case 4 is more or less intermediate between case 3 on the one hand and case 1 or 2 on the other. Depending on whether there is a tendency towards case 1 or case 2, the multi-parameter stochastic approach will focus on the high or the low extreme percentiles. Case 4 implies a skewed distribution, for which extreme percentiles are always estimated less accurately (with wider confidence intervals) than the median. Assuming a lognormal distribution may improve the accuracy of the percentile estimates, but introduces model uncertainty in return. A further subdivision, for example by sediment facies, might aid in the spatial localization of subunits with higher reaction capacity.
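For cases 1 and 2, the key quantities are the probability of exceeding a threshold content and the precision of that estimate. The minimal Python sketch below assumes a hypothetical threshold and, for case 1, a step-shaped response (full reactivity above the threshold, none below). It uses the Wilson score interval for a binomial proportion, a standard choice swapped in here rather than one prescribed by the text. The `capped_mean` helper illustrates the case 2 argument that, when reactivity no longer increases above some content, the plain average content overestimates effective reactivity; the saturation content is likewise a hypothetical parameter.

```python
from math import sqrt
from statistics import NormalDist, fmean


def exceedance(values, threshold: float, conf: float = 0.95):
    """Fraction of samples above `threshold` with a Wilson score interval."""
    values = list(values)
    n = len(values)
    p = sum(v > threshold for v in values) / n
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96 for 95%
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


def step_reactivity(values, threshold: float, max_reactivity: float):
    """Case 1 (hypothetical step response): average reactivity equals the
    exceedance fraction times the maximum reactivity, with its interval."""
    p, lo, hi = exceedance(values, threshold)
    return p * max_reactivity, lo * max_reactivity, hi * max_reactivity


def capped_mean(values, saturation: float) -> float:
    """Case 2 illustration: contents above a hypothetical saturation level add
    no further reactivity, so they are capped before averaging."""
    return fmean(min(v, saturation) for v in values)
```

With 9 of 45 samples above the threshold, the exceedance probability is 0.2, with a 95% Wilson interval of roughly 0.11 to 0.34, which gives an impression of the precision attainable with the sample sizes discussed below.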

4.2.3 Sample size

In the exploratory studies based on previously available data, the number of samples per statistical stratum differed widely. For the Holocene area, peat had been deliberately oversampled. For the Pleistocene area, sample size was generally larger for the more ubiquitous sandy units than for the clay, loam or peat units. Because sands combine overall low reactivity with large variance, this did not always result in a more accurate estimate of their median contents, all the more so as these medians were often below the detection limit themselves. Based on the results of the exploratory studies and general experience, a sample size of 45 collected samples per stratum was explored. This implies a relative oversampling of the less abundant lithologies in a boring. Given an average loss of around 10% of samples (samples that cannot be adequately classified, show obvious analytical errors, etc.), about 40 samples remain, which leaves at least one sample for each 2.5% percentile step. This should be sufficient to describe most types of statistical distribution adequately.
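The sample-size arithmetic above can be written out explicitly. The sketch assumes, as stated in the text, a 10% loss and rounds the number of usable samples down; the function names are hypothetical.

```python
import math


def usable_samples(collected: int, loss_fraction: float = 0.10) -> int:
    """Samples expected to remain after losses (unclassifiable, analytical errors)."""
    return math.floor(collected * (1 - loss_fraction))


def percentile_step(n_usable: int) -> float:
    """Percentile spacing (in %) represented by one sample."""
    return 100 / n_usable


def samples_to_collect(step_percent: float, loss_fraction: float = 0.10) -> int:
    """Samples to collect so that each percentile step holds at least one sample."""
    return math.ceil((100 / step_percent) / (1 - loss_fraction))
```

Run forwards, 45 collected samples give 40 usable samples and a 2.5% step; run in reverse, a 2.5% step with 10% loss gives 40 / 0.9 ≈ 44.4, rounded up to 45.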

Figure 5 shows the cumulative frequency distributions obtained with this sample size for a clay lithological unit. The typical end-members are a normal-type distribution, as often found for clay content, and a lognormal-type distribution, as frequently observed for organic matter. A normal distribution gives a near-linear relation when sorted values are plotted against a linear probability scale (Fig. 5a) and, by definition, a linear relation when plotted against the normal probability scale (Fig. 5b). A concave relation in Fig. 5a may indicate a lognormal distribution, which gives a (near-)linear relation when the logarithms of the sorted values are plotted against cumulative frequency (Fig. 5c, d). Distributions with outlying values, as is the case for organic matter in Fig. 5, also result in highly concave relations when plotted linearly. They are best recognised in plots against the normal probability scale; see the dotted lines in Fig. 5, which show the effect of removing the outlying organic matter values. The clear bends in the curves for organic matter and calcite suggest a bimodal statistical distribution.

Fig. 5

Frequency distributions obtained for the clays of the Naaldwijk Formation (n = 46). The dashed line for organic matter shows the distribution with one outlying value removed.

4.2.4 Protocol

Boring campaigns will be carried out in two rounds. Upon sampling and analysis of the first series of borings, a first interpretation is to be made. Based on these results, it is decided for which lithostratigraphical (sub)units additional samples need to be collected to obtain a sufficiently accurate characterization of the reactivity of the aquifer sediments.

The first level of stratification is a division into geographical units with a more or less homogeneous geohydrological build-up of the subsurface (the geotop regions in the Netherlands). Preferably, the planned boring locations are spread geographically within these units. Classes or strata within the geographical units are defined as combinations of lithology and lithostratigraphical unit (formation, member or layer). The choice of the latter is based on a priori geological information (in the Netherlands, contained in the DGM and REGIS models) and the resulting expectations of geochemical contrast.
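The stratification described above amounts to grouping samples by a composite key of geotop region, lithostratigraphical unit and lithology. A minimal Python sketch, assuming hypothetical record fields `geotop`, `formation`, `lithology` and `value` (the formation codes in the test follow the abbreviations of the Fig. 4 caption):

```python
from collections import defaultdict
from statistics import median

MIN_SAMPLES = 45  # target number of samples per stratum, as used in this protocol


def stratify(samples):
    """Group measured values by stratum key (geotop region, formation, lithology)."""
    strata = defaultdict(list)
    for s in samples:
        key = (s["geotop"], s["formation"], s["lithology"])
        strata[key].append(s["value"])
    return dict(strata)


def stratum_summary(samples):
    """Per stratum: sample count, median, and whether the 45-sample target is met."""
    return {
        key: {"n": len(v), "median": median(v), "target_met": len(v) >= MIN_SAMPLES}
        for key, v in stratify(samples).items()
    }
```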

Borings are executed and described according to the boring protocol in force (see Bosch 2000 for the GSN boring protocol). The equipment used should allow the collection of undisturbed samples down to the depth required for the characterization study. As a rule of thumb, two samples per metre are collected down to a depth of 5 m below surface level, and one sample per metre below this depth. However, sparse strata are oversampled (and large units undersampled) to obtain a sufficiently accurate characterization of their reactivity. The aim is to obtain at least 45 samples per stratum, provided the stratum occupies at least 1% of the sediment volume in the subsurface of the geotop region concerned.
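The rule of thumb for sampling density can be expressed as a list of planned sample depths. In this sketch, samples are placed at the midpoints of 0.5 m intervals down to 5 m and of 1 m intervals below; placing them at midpoints and giving a trailing partial interval its own sample are assumptions, and the oversampling of sparse strata is not modelled. The 1% volume criterion is included as a separate hypothetical helper.

```python
def sample_depths(total_depth_m: float, shallow_limit_m: float = 5.0) -> list[float]:
    """Planned sampling depths (m below surface): midpoints of 0.5 m intervals
    down to `shallow_limit_m`, then of 1 m intervals. A trailing partial
    interval also receives one sample (an assumption)."""
    depths = []
    top = 0.0
    while top < total_depth_m:
        step = 0.5 if top < shallow_limit_m else 1.0  # two per metre, then one per metre
        bottom = min(top + step, total_depth_m)
        depths.append(round((top + bottom) / 2, 3))
        top = bottom
    return depths


def requires_full_characterization(stratum_volume: float, region_volume: float,
                                   min_fraction: float = 0.01) -> bool:
    """Hypothetical helper: does the stratum reach the 1% volume criterion
    that triggers the 45-sample target?"""
    return stratum_volume / region_volume >= min_fraction
```

A 10 m boring, for example, yields 10 samples in the upper 5 m and 5 below.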

Samples are then analysed according to the agreed analytical protocol. Sample classification (especially lithology class, but potentially also lithostratigraphy) is adjusted where necessary, based on the combined results of the laboratory analyses and the original borehole descriptions. The required statistics (mean, median and other percentiles of the reactivity variables) and their confidence intervals are calculated for each stratum distinguished, and the confidence intervals are compared with the desired accuracy. Additional sampling and analysis, from a second series of borings, is performed for:

  • strata for which, for some reason (e.g. too many adjusted classifications), fewer than 45 samples were obtained,

  • inhomogeneous strata for which the accuracy (for the statistics calculated) is less than required,

  • inhomogeneous strata for which it is concluded that a further subdivision offers adequate differentiation in sediment reactivity and is also practically possible (i.e. the spatial location of the subunits is known through a subsurface model such as DGM or REGIS).

In the second case, a multiple of the initial 45 samples may be needed depending on the required increase in accuracy, while in the last case the number of samples is multiplied by the required number of subunits.
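The decision rules for the second round can be sketched as a planning helper that returns the number of additional samples for a stratum. The scaling for the accuracy case assumes that confidence interval width decreases with the square root of sample size at unchanged variance; the function and parameter names are hypothetical.

```python
import math
from typing import Optional

MIN_PER_STRATUM = 45  # target sample size per stratum from the protocol


def additional_samples(n_obtained: int, ci_width: Optional[float] = None,
                       target_width: Optional[float] = None, subunits: int = 1) -> int:
    """Additional samples needed for one stratum in the second boring round."""
    required = MIN_PER_STRATUM  # first bullet: top up to 45 samples
    if ci_width is not None and target_width is not None and ci_width > target_width:
        # second bullet: width ~ 1/sqrt(n), so n must grow by (ci_width / target_width)**2
        required = max(required, math.ceil(n_obtained * (ci_width / target_width) ** 2))
    if subunits > 1:
        # third bullet: each subunit needs its own 45 samples
        required = max(required, MIN_PER_STRATUM * subunits)
    return max(0, required - n_obtained)
```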

Final interpretation is based on the combined results of the first and second round of sampling and analysis.

5 Discussion and conclusions

The case of the Netherlands shows that, with state-of-the-art techniques for characterizing aquifer architecture and routine chemical analysis, assessment of subsurface reactivity on a regional scale has become feasible. Because a priori stratification is necessary, an adequate model of the subsurface geology needs to be available. In this respect, the Netherlands is probably at the well-informed end of the spectrum. Lithological stratification, for which information is the easiest to obtain, proved essential, but lithostratigraphical and regional variability could not be ignored in our case. Owing to the relative complexity of the geology of the Netherlands, with a west–east gradient in marine influence and a north–south division with regard to glacial impact, the number of regional strata is relatively large. In other sedimentary basins, lithostratigraphy as the only additional stratifier might suffice.

A concern from the exploratory studies was the geographical clustering of borehole locations. The non-random, selective location of the boreholes might not have captured lateral gradients in sediment composition, reflecting, for example, proximity to the coast or distance from the main river channel. For subregions 4b and 4d1, however, a comparison could be made with the results of a first pilot study in which, using the protocol developed, the spread of locations was chosen more deliberately. In most cases, the frequency distributions of the old exploratory dataset and of the new pilot study data corresponded well. Most of the few differences could be explained by the presence (old data) versus absence (new data) of detection-limit values, or by a better discrimination between lithological classes in the new dataset. The comparison thus confirmed that a division into regions with a more or less uniform hydrogeological build-up of the subsurface adequately accounts for spatial diversity.
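The comparison of frequency distributions between the old exploratory dataset and the new pilot data could be quantified, for example, with the two-sample Kolmogorov–Smirnov statistic. The text does not state which test, if any, was used, so this is an illustrative choice; values reported below the detection limit would first need consistent treatment in both datasets, which the sketch does not handle. The critical value uses the standard large-sample approximation c(α)·√((n + m)/(nm)) with c(α) = 1.358 for α = 0.05.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(a, b) -> float:
    """Maximum vertical distance between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in set(a) | set(b):  # the empirical CDFs only change at observed values
        d = max(d, abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)))
    return d


def ks_critical(n: int, m: int, c_alpha: float = 1.358) -> float:
    """Large-sample critical value of D (c_alpha = 1.358 for alpha = 0.05)."""
    return c_alpha * sqrt((n + m) / (n * m))


def distributions_differ(old, new) -> bool:
    """True if D exceeds the approximate 5% critical value."""
    old, new = list(old), list(new)
    return ks_statistic(old, new) > ks_critical(len(old), len(new))
```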

Given sufficiently detailed stratification, a sample size of 45 per stratum was found to describe the type of statistical distribution adequately. This implies that in most instances the required statistic (mean or percentile) would be determined with sufficient accuracy. The results of the additional pilot study indicated that a smaller sample size would not suffice for most units. To halve confidence intervals, the sample size needs to be quadrupled, because interval width scales with 1/√n. This would mean 180–200 samples per stratum for a twofold better estimate, so additional subclassifiers could then be the better alternative.
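The √n argument can be written out as follows, assuming the confidence interval width w scales with the inverse square root of the sample size n at unchanged variance. Reading the upper value of 200 as an allowance for the roughly 10% sample loss mentioned in Section 4.2.3 is an interpretation, not something stated in the text.

```latex
w \propto \frac{1}{\sqrt{n}}
\quad\Longrightarrow\quad
\frac{w_2}{w_1} = \sqrt{\frac{n_1}{n_2}}
\quad\Longrightarrow\quad
n_2 = n_1 \left(\frac{w_1}{w_2}\right)^{2} = 45 \times 2^{2} = 180,
\qquad \frac{180}{0.9} = 200 .
```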

The benefit of additional classifiers was explored only for the Pleistocene area. Geohydrological layering and glauconite content sometimes provided additional separation, but stratification based on the sand median (median grain size of the sand fraction), sediment colour or position within the geohydrological framework (recharge, discharge or intermediate area; Broers 2004) proved to be of no added value. A subdivision into sediment facies was also considered, using data from the pilot study in which this information was recorded during boring. The facies subdivision provided specific insight into which spatial unit contains the higher reaction capacity within an otherwise low-reactivity sand group. In other words, the occurrences of high-percentile values can be isolated on the basis of their sedimentological setting. For the nationwide survey, however, this would require a more detailed geological model with reliable information on the spatial occurrence of facies, which is presently not available. This will generally not be the case in other sedimentary basins either.

The sampling protocol allows samples to be reclassified based on the results of the laboratory analysis, because the classification from field observation and visual inspection of the boring is expected to be less precise than a stratification based on the measured grain size distribution and organic matter content. This was indeed confirmed by the additional pilot study data. From a purely statistical point of view, it can be argued that the larger spread in reactivity per lithology class obtained from the field classification is a better measure of the spread within the units of models such as DGM or REGIS, since the spatial configuration of these units is based on borehole descriptions that in most cases lack additional geochemical analysis. However, in view of knowledge development, it seems more pragmatic to optimise the accuracy of the geochemical characterization of the targeted stratum, and to assess the uncertainty in the correct field and model classification separately.

The design of the standard analytical package was based on the principle that the geochemical reactivity of the subsurface in the Netherlands is dominated by a limited number of soil properties and processes. If some key analytes could serve as proxies for several reactive properties, a still smaller number of analytical techniques would of course be beneficial from an economic perspective. However, within the strata defined, with their relatively homogeneous mineralogical and chemical composition, the key analytes appear to be statistically independent: preliminary results from the pilot study, in which all analytes were available for all samples, showed that the mutually explained variance was usually less than 50%. Hence, a further reduction in analytical techniques would result in a serious loss of information.