Introduction

Understanding how much water can be abstracted from a particular location in an aquifer (i.e., aquifer productivity) is important not only for determining groundwater resources but also for assessing renewable energy resources (e.g., those available for use by groundwater heat pump systems). Mapping aquifer productivity at the regional or national scale can provide valuable information to support planning and decision-making.

Various approaches are available for mapping and evaluating aquifer yields. The choice of method usually depends on the purpose and scale of the study as well as on the type and quality of the available data. Where the evaluation has to rely on existing data sets, aquifer productivity is often inferred from aquifer property data (transmissivity, hydraulic conductivity; Bezelgues et al. 2010; Martin et al. 2006; Schomburgk et al. 2005) or from proxy measures such as borehole yields (Banks et al. 2005) or specific capacity (Bezelgues et al. 2010; Macdonald et al. 2012a). These data are usually point measurements, and geostatistical interpolation methods are often applied to allow prediction at unmeasured locations. Among these interpolation methods, kriging (Delhomme 1978) is one of the most popular because it allows the uncertainty associated with these predictions to be characterised. It assumes that the field of interest is a realisation of a random field and that the same pattern of variation can be observed at all locations within this field. It also requires that the spatial structure of the field can be adequately quantified from the available data (Huijbregts 1975). While this can usually be achieved at the aquifer or regional scale, where kriging is widely applied (Bezelgues et al. 2010; Delhomme 1978; Marsily and Ahmed 1987; Papritz et al. 2012; Pucci and Murashige 1987), application at the national scale is more difficult because the inclusion of several different geological (aquifer and aquiclude/aquitard) units may lead to different patterns of variation in the observed properties. In addition, a non-uniform pattern of variation at different spatial scales and the presence of local anomalies can make it difficult to fit a valid model of spatial variation to the data. A small number of studies have applied spatial analysis at the national scale (e.g., Marchant et al. 2010), but they rely on relatively complex and computationally intensive algorithms to account for processes operating over disparate scales. While these methods provide good parameter predictions and estimates of the associated uncertainty, they are not readily applicable within the framework of national-scale mapping projects.

Alternative methods include GIS-based techniques for generating predictions from spatial databases. These are widely applied in landslide hazard mapping (Carrara et al. 1995; Chung and Fabbri 1999; Pistocchi et al. 2001; Van Westen et al. 2003) but also have applications in hydrogeological studies, e.g., to map the distribution of hydrogeological properties (Lewis et al. 2006; Macdonald et al. 2004; Martin et al. 2007) or the ‘groundwater potential’ of an area or region (Chowdhury et al. 2008; Gupta and Srivastava 2010; Saha et al. 2010; Solomon and Quiel 2006). These methods make use of existing thematic maps and spatial data sets. They often involve the assignment of (hydrogeological) attributes and/or weights and ratings to the variables mapped in each thematic layer and their subsequent integration into a single map of index/indicator-based classes of hazard or resource potential (indicator-based maps). Attributes and/or weights are assigned by experts on the basis of field knowledge of the area and the expert’s best process understanding and judgement; hence, the method is, by design, subjective. In the context of environmental modelling, this is often considered a weakness and outputs from models are generally preferred. However, as Kirchner et al. (1996) point out, expert judgements are usually less specific than model predictions, and hence, more accurately reflect the uncertainties inherent in predicting behaviour of complex systems.
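To make the generic index-overlay idea concrete, the following Python sketch (an illustration of the approach described above, not the method used in this study) combines expert ratings from several thematic layers into a weighted index and bins it into classes. The layer names, ratings, weights and class breaks are all hypothetical.

```python
# Hypothetical weighted index overlay: each thematic layer assigns an expert
# rating to every map cell; ratings are combined with expert weights and the
# resulting index is binned into potential classes.


def overlay_index(layers, weights):
    """Return the weighted-sum index for each cell.

    layers  -- dict: layer name -> list of ratings (one per cell)
    weights -- dict: layer name -> weight (expected to sum to 1)
    """
    names = list(layers)
    n_cells = len(layers[names[0]])
    if any(len(layers[name]) != n_cells for name in names):
        raise ValueError("all layers must cover the same cells")
    return [sum(weights[name] * layers[name][i] for name in names)
            for i in range(n_cells)]


def classify(index, breaks):
    """Assign class 1..len(breaks)+1 using ascending class breaks."""
    return [1 + sum(value >= b for b in breaks) for value in index]


# Hypothetical example: three layers rated 1 (low) to 5 (high) for four cells
layers = {
    "lithology": [5, 3, 1, 4],
    "slope": [4, 4, 2, 1],
    "drainage_density": [3, 5, 1, 2],
}
weights = {"lithology": 0.5, "slope": 0.3, "drainage_density": 0.2}
index = overlay_index(layers, weights)
classes = classify(index, breaks=[2.0, 3.5])
print(classes)  # [3, 3, 1, 2]
```

The subjectivity discussed above enters through the choice of ratings, weights and breaks; the arithmetic itself is trivial.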

Methods have been developed to minimise subjectivity (see review in Kanungo et al. 2006), but there is still no systematic, best-practice methodology for mapping hazards or resources (Faulkner et al. 2010). In any case, the methodology underlying the development of such index/indicator-based maps should meet scientific standards and, hence, must include a validation procedure (Girardin et al. 1999), i.e., a process of establishing confidence in the adequacy and usefulness of the output for its specified purpose (Chung and Fabbri 2003; Kirchner et al. 1996; Forrester and Senge 1980 in Nguyen et al. 2007). While model validation is an integral part of the development of numerical models (see review by Nguyen et al. 2007), formal validation is less common for indicator-based maps and is by no means a standard procedure. Arguably, such maps are less easily tested against data or facts (Lewis and Bardon 1998). Nonetheless, Bockstaller and Girardin (2003) propose a framework for validating environmental indicators. They distinguish three types of validation: design validation, output validation and end-use validation, relating to conceptual validity, adequacy of the outputs and usefulness to users, respectively. While output validation based on expert judgement is acceptable, the authors emphasise the importance of validating against measured data, e.g., by using formalised tests (Bockstaller and Girardin 2003). Alternatively, Chung and Fabbri (2003) suggest a set of simple procedures for the validation of outputs from hazard mapping. Such validation is important for a number of reasons: (1) it convinces map creators and users of the degree of predictive success, (2) it demonstrates that the predictions have operational/pragmatic validity (Faulkner et al. 2010), and (3) it communicates the significance of the predictions to map users.

The primary objective of this study was to develop and validate a methodology for mapping potential aquifer productivity at the national scale based on available geological and hydrogeological data and expert knowledge. Drilling costs can be an important consideration when evaluating the feasibility of developing a groundwater source. Thus, a second objective was to map the approximate depth to the water source. The maps were developed following a strategy similar to that outlined by Chung et al. (2000) for favourability function (i.e., resource potential) models, with the steps amended to fit the purpose of this study, its outputs and the available data. The thematic accuracy of the output maps was assessed, and their predictive power was tested and compared against outcomes from random and uniform attribution scenarios.

Outputs from this study were specifically developed for, and have provided the basis for, assessing the suitability of the subsurface for the installation and operation of open-loop groundwater heat pump (GWHP) systems with peak loads of 100 kW and more (Abesser et al. 2014). Accordingly, aquifer productivity classes were selected to reflect the flow rate requirements of these commercial-scale GWHP systems. The data and maps presented in this paper are therefore particularly relevant for identifying suitable locations for commercial-scale GWHP applications. However, the methodology can easily be adjusted for other applications, e.g., mapping aquifer productivity for private and public water supplies, as discussed in this paper.

Materials and methods

Study area: geology and hydrogeology of England and Wales

England and Wales, which are part of the British Isles and the United Kingdom (UK), have a very variable geology consisting of a complex mix of older metamorphic rocks overlain by varying sequences of sedimentary rocks, into which igneous rocks have been intruded at various times in the geological past. The rocks have been subject to a variety of tectonic processes over an extended period of time. As a result, the Permian and younger succession, which contains the main aquifers in eastern and southern England, broadly dips at low angles to the south-east. These aquifers, ranging in age from Permian to Cretaceous, comprise the Magnesian Limestone, Permo-Triassic sandstones, Great and Inferior Oolites, Corallian, Lower Greensand and Chalk (Fig. 1). Within this broad structural pattern, subsidiary basins have formed (e.g., the London Basin and the Hampshire Basin) that are significant in a hydrogeological context. Elsewhere, the geology is more complicated; the Permo-Triassic sandstones are also present under large parts of western England and in North Wales. The Chalk is the principal aquifer of the UK and underlies much of eastern and southern England. It has a high porosity, but its matrix permeability is low due to the very small pore throat sizes (Allen et al. 1997). The Chalk is a major aquifer because of a network of secondary fractures (frequently enlarged by solution) that impart a high permeability. It is referred to as possessing dual porosity (Price et al. 1993), although the effective groundwater storage is primarily within the fracture network and larger pores, not in the matrix as in classic dual-porosity aquifers. There is considerable variability in transmissivity within the Chalk associated with topography as well as with depth (Allen et al. 1997).

Fig. 1

Map of the study area showing major (colour) and relevant minor (shaded) aquifer units in England and Wales (Digital geological data, British Geological Survey ©NERC. Contains Ordnance Survey data © Crown Copyright and database rights 2015)

The Jurassic limestones are also prominent aquifers in parts of southern and eastern England. They are represented by the Great and Inferior Oolites and the Corallian limestones. They include subordinate sandstones and mudstones, but mostly consist of oolitic limestones with low specific yields. As in the Chalk, an extensive fracture network (enlarged by solution) provides high permeabilities.

The Lower Greensand flanks the Chalk in eastern England and is also present in the south-east of the country. It comprises a series of sands and sandstones of varying degrees of cementation, with silts and clays. The aquifer has some fractures but groundwater movement is dominated by intergranular flow. It is not as productive as the Permo-Triassic sandstones but its high specific yields and moderate permeability make it an important groundwater source.

The Permo-Triassic sandstones are found in a series of deep sedimentary basins in western and north-western England and North Wales; they also outcrop from south-west to north-east England and occur at depth below younger rocks in eastern England and in the Hampshire Basin. The sandstones include the Sherwood Sandstone and the Bridgnorth Sandstone. They generally have a high porosity and permeability, and their unconfined storage tends to be higher than that of the Chalk. Their aquifer properties are controlled by lithology and the degree of cementation, as well as by fracturing. These controls, and therefore the aquifer properties, are often complex and difficult to predict both laterally and with depth (Allen et al. 1997).

The Magnesian Limestone aquifer comprises dolomitic limestones and occupies a narrow north–south-trending outcrop across north-east England. It is separated into two aquifer units by intervening mudstones and evaporites. Groundwater movement is via secondary fractures.

Devonian and Carboniferous age strata can also form aquifers. They are found in south-east Wales and central and northern parts of England and include the Old Red Sandstone, Carboniferous Limestone, Millstone Grit and Coal Measures. They are much harder and more compact rocks and are generally of secondary importance in terms of water supply, although individual formations can provide considerable yields.

The older Silurian to Precambrian age rocks present over large parts of Wales and north-west England, and the Devonian and Carboniferous rocks of south-west England have low primary permeabilities; however, where they occur at the surface, small groundwater yields can be obtained from fractures mainly in the upper 50 m of the indurated sandstones. Alluvial sands and gravels are not major aquifers in England and Wales, but can be important locally with wells and boreholes sited in these deposits in many parts of the country.

Two studies, carried out between 1993 and 1999, collected extensive data for the hydrogeological characterisation of the major and minor aquifers in England and Wales (Allen et al. 1997; Jones et al. 2000), including pumping test data for more than 3,000 locations. The data are available from the British Geological Survey’s (BGS) Aquifer property database (AP) and include yield, transmissivity and specific capacity data. Aquifer yields for selected aquifer groups/formations are illustrated in Fig. 2, in which the aquifers are arranged by yield, from high to low.

Fig. 2

Cumulative frequency distributions for selected aquifer groups from different geological ages, arranged according to: a high yield, b moderate yield, and c low yield. D Devonian, C Carboniferous, P Permian, TR Triassic, K Cretaceous, J Jurassic, Q Quaternary

Construction of thematic maps and data layers

Scale of map development and data availability

The map was developed at the 1:250,000 scale for use at the 1:500,000 scale. The scale was selected to reflect (1) the purpose of the map (i.e., for use at the screening level in regional assessments) and (2) the availability, scale and accuracy of the geological, aquifer property/hydrogeological and geophysical data required for mapping the extent and productivity of aquifers in England and Wales.

Map scales of 1:200,000 are generally considered adequate for use in regional assessments (Struckmeier and Margat 1995). This presupposes, however, that the accuracy of the available map data is commensurate with the chosen scale. The key data sets used for the map development and their respective scales are summarised in Table 1.

Table 1 Data sets used in the tool development

Digital geological maps (DiGMapGB) of the UK are available at the scales of 1:50,000 (DiGMapGB_50), 1:250,000 (DiGMapGB_250) and 1:625,000 (DiGMapGB_625). In these maps, geological units are represented by polygons attributed with a Lexicon (LEX) code, describing the lithostratigraphical, chronostratigraphical or lithodemic nomenclature, and with a rock classification scheme (RCS/ROCK) code relating to the lithology or composition of the unit. The cartographic accuracy of the maps (a measure of how faithfully the lines are captured, not of the accuracy of the geological interpretation) is nominally 1 mm (Smith 2013), which, at the 1:250,000 scale, equates to 250 m on the ground. The level of geological detail and, hence, the number of polygons and LEX_ROCK codes included in the maps increases with map scale. The DiGMapGB_250 map is considered to be the most suitable product for regional-scale use (British Geological Survey 2015) and is therefore used in this study. For England and Wales, it comprises a total of 24,231 polygons and 593 LEX_ROCK codes. For comparison, DiGMapGB_50, recommended for local-scale assessments, comprises 240,526 polygons and 3,902 LEX_ROCK codes for the same area.
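The conversion from nominal cartographic accuracy to ground distance used above is a simple scale calculation, sketched here in Python:

```python
def ground_distance_m(map_distance_mm, scale_denominator):
    """Ground distance (m) represented by a distance on the map (mm) at 1:scale."""
    return map_distance_mm * scale_denominator / 1000.0


# Nominal 1 mm cartographic accuracy at the map scales mentioned in the text
for scale in (50_000, 250_000, 625_000, 1_000_000):
    print(f"1 mm at 1:{scale:,} = {ground_distance_m(1.0, scale):,.0f} m")
```

At 1:50,000 the same 1 mm corresponds to 50 m, and at 1:1,000,000 to 1,000 m.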

Mapping aquifer productivity based on the bedrock geology (as described in section “Bedrock-aquifer productivity map”) requires a good understanding of the subsurface flow properties and the transmissive and storage behaviour of the different geological units/formations. Such data are available from the BGS Aquifer property database. The two data sets were linked by matching aquifer descriptions from the AP database to the appropriate LEX_ROCK codes of the DiGMapGB_250 geological map. For England and Wales, aquifer yield data were available for 127 LEX_ROCK codes.

Other data sets required for the map development (see Table 1) include BGS’ River Head Space model (RHSM) data (Bloomfield et al. 2007), BGS’ Superficial Deposits Thickness model (SDTM) data (Lawley and Garcia-Bajo 2010) and the ATLAS GIS contour data. The RHSM data provide an estimate of the (shallow) regional water table under natural flow conditions (i.e., without abstractions), as inferred from river locations and river-base-level data, digital terrain model (DTM) data and borehole water level data. The SDTM provides the thickness of superficial deposits in the UK, derived by interpolating borehole geology data and manually corrected by Quaternary geologists in areas where borehole data were unavailable. Both models have been developed at the 1:50,000 scale (with a cartographic accuracy of 1 mm on the map = 50 m on the ground) and, hence, are compatible for use at the 1:250,000 scale. The ATLAS GIS map data are derived from seismic surveys and include contour maps of the base and the top of the main geological formations that form significant concealed aquifers in the UK (Table 2). The nominal map scale is given as 1:1,000,000; hence, the concealed aquifers are represented in less detail in the bedrock aquifer map than where the aquifers are at outcrop. The (horizontal) cartographic error of the ATLAS maps is not known but is estimated to be around 1,000 m.

Table 2 Geological units included in the mapping of concealed aquifers and maximum depths of aquifer delineation, after UKTAG (2011)

Bedrock-aquifer productivity map

A key objective was to produce a map that shows the distribution of bedrock aquifers (at outcrop and concealed) that can provide sustainable yields of < 1 L/s, 1–6 L/s or > 6 L/s. The term ‘bedrock aquifer’ is used by BGS to refer to geological formations of Pliocene age or older that meet the definition of an aquifer (i.e., they are able to store and release water in quantities sufficient to supply useful amounts to boreholes). The term ‘outcrop’ is used to indicate where the aquifer is at the surface (or covered by superficial deposits), as shown on the bedrock geological map. The term ‘concealed aquifer’ is used to refer to aquifers that are present in the subsurface beneath other, generally less permeable bedrock. The distinction between ‘aquifer at outcrop’ and ‘concealed aquifer’ is made (1) to indicate to the user where multiple aquifers are present and (2) because aquifer behaviour (e.g., response to pumping, impact on nearby water features) differs between outcrop aquifers, with mostly unconfined water levels, and concealed aquifers, where water levels are often confined.

The map was developed by assigning productivity classes to the geological units represented by the different lithostratigraphical and lithological (LEX_ROCK) codes in the 1:250,000 digital geological map (DiGMapGB_250). Attributions were based on expert judgement and on aquifer property data, estimated from pumping tests, which were available for 127 of the 593 LEX_ROCK codes. Although these make up only about one fifth of the total number of LEX_ROCK codes, they represent about 70 % of the surface area of England and Wales. The data included borehole yield, transmissivity and specific capacity data. Transmissivity, estimated from well pumping tests, is generally the preferred measure of aquifer productivity as it is largely independent of the induced drawdown, the amount of aquifer penetrated and the borehole diameter. However, borehole yield data were more widely available in the Aquifer property database and were therefore used in this study. The use of borehole yields for estimating aquifer productivity is considered valid at the regional and national scale, where borehole yields have been found to be directly related to transmissivity and, hence, to aquifer productivity (Acheampong and Hess 1998; Graham et al. 2009; Jetel and Krasny 1968). Data from 2,862 locations were included in the analysis. Where results from multiple pumping tests were available for the same location, the median borehole yield was used in order to prevent bias towards more frequently tested sites. Empirical cumulative frequency distributions of borehole yields were drawn for the different aquifer groups (Fig. 2). The data display near log-normal (i.e., strongly right-skewed) distributions; hence, non-parametric statistics (median and inter-quartile range) were used to characterise aquifer productivity (Banks et al. 2005). Most plots in Fig. 2 include several stratigraphic formations and lithologies (represented by different LEX_ROCK codes), as illustrated for the Chalk (Fig. 3a), the Sherwood Sandstone (Fig. 3b) and the Magnesian Limestone (Fig. 3c) aquifers. Figures 2 and 3 show that borehole yields can vary by two to three orders of magnitude, both within aquifer groups and within individual formations; however, a general distinction can be made between high-yielding formations (median > 10 L/s, Q25–Q75: 6–100 L/s), moderately yielding formations (median 1–6 L/s, Q25–Q75: 0.5–50 L/s) and low-yielding formations (median < 1 L/s, Q25–Q75: 0.5–1 L/s). Accordingly, three yield categories were defined for the mapping of aquifer productivity, representing formations that can typically provide: (1) large yields (> 6 L/s), (2) moderate yields (1–6 L/s) and (3) no/small yields (< 1 L/s).
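A minimal Python sketch of the summary statistics described above, assuming the pumping-test records are available as (site, yield) pairs; the record layout, site identifiers and values are hypothetical. Quartiles are computed with the standard-library `statistics.quantiles` using its `"inclusive"` method; other quartile definitions give slightly different values for small samples.

```python
from collections import defaultdict
from statistics import median, quantiles


def site_medians(tests):
    """One median yield per site, so repeatedly tested sites are not over-weighted.

    tests -- iterable of (site_id, yield_l_per_s) pairs (hypothetical layout)
    """
    by_site = defaultdict(list)
    for site_id, q in tests:
        by_site[site_id].append(q)
    return {site_id: median(qs) for site_id, qs in by_site.items()}


def summarise(yields):
    """Return (Q25, median, Q75) of site yields; needs at least two values."""
    q25, q50, q75 = quantiles(yields, n=4, method="inclusive")
    return q25, q50, q75


def ecdf(yields):
    """Points for an empirical cumulative frequency plot: (sorted yield, i/n)."""
    xs = sorted(yields)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


tests = [("S1", 10.0), ("S1", 14.0), ("S2", 3.0),
         ("S3", 0.5), ("S4", 8.0), ("S5", 20.0)]
per_site = site_medians(tests)               # S1 -> 12.0
print(summarise(list(per_site.values())))    # (3.0, 8.0, 12.0)
```

Working on per-site medians first means that a site tested ten times contributes one value to the distribution, as in the procedure described above.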

Fig. 3

Cumulative frequency distributions for the units (described by their LEX_ROCK code) of a the Chalk aquifer, b the Sherwood Sandstone aquifer, and c the Magnesian Limestone aquifer

Each LEX_ROCK code, and subsequently each polygon of the DiGMapGB_250 map, was assigned to a productivity class to create the aquifer outcrop map. Attributions were made on the basis of expert hydrogeological judgement, ensuring that the inter-quartile range of yield values for each LEX_ROCK unit approximated the yield range of the attributed category. Where the observed inter-quartile range for a geological unit was larger than the category range, the median was used as a guide value, and the unit was attributed to the category whose range included it.

In some formations, aquifer yields show considerable regional variation. The Millstone Grit sandstones (MG-SDST), for example, can provide large yields in the Pennines (northern England) but are only moderately yielding in South Wales. This was accounted for by adding a geographical condition/qualifier to the attribution of the associated LEX_ROCK code.
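The data-driven part of the attribution rule and the regional qualifier can be sketched in Python as follows. This is a simplification under stated assumptions: expert judgement is represented only as an optional override, the category boundaries follow the three yield categories defined above, and the lookup structure and function names are hypothetical.

```python
def category_of(q):
    """Yield category containing a single yield value q (L/s)."""
    if q < 1.0:
        return "small"      # no/small yields, < 1 L/s
    if q <= 6.0:
        return "moderate"   # 1-6 L/s
    return "large"          # > 6 L/s


def attribute(q25, q50, q75, expert_override=None):
    """Suggest a category for one LEX_ROCK unit from its yield quartiles.

    If the inter-quartile range lies within a single category, that category
    is returned; otherwise the category containing the median is used.
    An expert override, where given, takes precedence.
    """
    if expert_override is not None:
        return expert_override
    if category_of(q25) == category_of(q75):
        return category_of(q25)
    return category_of(q50)


# Hypothetical regional qualifier keyed by (LEX_ROCK code, region), using
# the Millstone Grit example from the text.
REGIONAL_QUALIFIER = {
    ("MG-SDST", "Pennines"): "large",
    ("MG-SDST", "South Wales"): "moderate",
}


def attribute_unit(code, region, q25, q50, q75):
    """Apply the regional qualifier (if any) before the quartile rule."""
    return attribute(q25, q50, q75, REGIONAL_QUALIFIER.get((code, region)))


print(attribute(2.0, 3.0, 5.0))    # moderate
print(attribute(0.5, 4.0, 50.0))   # moderate (IQR spans categories; median used)
print(attribute_unit("MG-SDST", "Pennines", 0.5, 3.0, 9.0))  # large
```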

Data within the BGS Aquifer property database were specifically collected for the characterisation of major and minor aquifers; thus, they are biased towards medium-to-high-yielding formations, with little or no data for less productive units. To ensure the correct attribution of these lower-yielding units, the Environment Agency’s National Abstraction Licence Database (NALD) was used to assess what, if any, abstraction volumes have been licensed for these formations. In England and Wales, abstraction licences are required for all groundwater abstractions of 20 m³/day or more (∼0.23 L/s, if pumped continuously), and licences are granted based on site-specific hydrogeological assessments. Hence, in the absence of actual borehole yield data, this data set was used to constrain the productivity attribution of units not represented in the Aquifer property database.
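The continuous-pumping equivalent of the licensing threshold follows from a unit conversion (1 m³ = 1,000 L; 1 day = 86,400 s), sketched here in Python:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s
LITRES_PER_M3 = 1000.0


def m3_per_day_to_l_per_s(q_m3_per_day):
    """Equivalent continuous pumping rate (L/s) for a daily volume (m³/day)."""
    return q_m3_per_day * LITRES_PER_M3 / SECONDS_PER_DAY


print(round(m3_per_day_to_l_per_s(20), 2))  # 0.23
```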

The down-dip (i.e., concealed) extent of the main aquifers (Table 2) was mapped using the ATLAS GIS data, producing the concealed aquifer map. Table 2 lists the geological units included in this map and gives the depths to which they are considered to form aquifers according to guidance from the UK Technical Advisory Group on the Water Framework Directive (UKTAG 2011). The aquifers included in the concealed aquifer map are major aquifers in England and Wales and can provide yields of at least 1 L/s, although most of them typically provide yields of > 6 L/s (Fig. 2).

The rules used to combine the aquifer outcrop map and the concealed aquifer map are shown in Fig. 4; their application produced the bedrock-aquifer productivity map (Fig. 5). This map has six categories: (1) no suitable aquifer (yields < 1 L/s); (2) moderate aquifer at outcrop (yields 1–6 L/s); (3) good aquifer at outcrop (yields > 6 L/s); (4) moderate–good aquifer concealed at depth (yields > 6 L/s); (5) moderate aquifer at outcrop and another (moderate–good) aquifer concealed at depth; (6) good aquifer at outcrop and another (moderate–good) aquifer concealed at depth. In this application, aquifer yields were mapped in terms of their potential to support the operation of commercial-size open-loop groundwater heat pump systems with peak load requirements of > 100 kW. Such loads require minimum flow rates of 2–3 L/s; hence, aquifers with yields < 1 L/s are mapped as unsuitable in this application. The aquifer productivity classes and their respective yield ranges are summarised in Table 3.
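Reading the six categories listed above as a lookup on (outcrop productivity, presence of a concealed aquifer), the combination step might be sketched in Python as below. This is an interpretation of the category list rather than a transcription of the rules in Fig. 4, and the category labels and function name are hypothetical.

```python
# Lookup from (outcrop productivity, concealed aquifer present?) to the six
# map classes. Outcrop labels (hypothetical): "none" (< 1 L/s),
# "moderate" (1-6 L/s), "good" (> 6 L/s).
PRODUCTIVITY_CLASS = {
    ("none", False): 1,      # no suitable aquifer
    ("moderate", False): 2,  # moderate aquifer at outcrop
    ("good", False): 3,      # good aquifer at outcrop
    ("none", True): 4,       # moderate-good aquifer concealed at depth
    ("moderate", True): 5,   # moderate at outcrop + concealed aquifer
    ("good", True): 6,       # good at outcrop + concealed aquifer
}


def productivity_class(outcrop, concealed):
    """Return map class 1-6 for one polygon/cell."""
    try:
        return PRODUCTIVITY_CLASS[(outcrop, bool(concealed))]
    except KeyError:
        raise ValueError(f"unknown outcrop category: {outcrop!r}") from None


cells = [("good", False), ("none", True), ("moderate", True)]
print([productivity_class(o, c) for o, c in cells])  # [3, 4, 5]
```

An explicit table keeps every combination visible and makes any unmapped case fail loudly rather than silently defaulting to a class.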

Fig. 4

Rules for the development of the bedrock aquifer map

Fig. 5

Final map product: Bedrock-aquifer productivity for England and Wales (see also BGS website, British Geological Survey 2014b) (Digital geological data, British Geological Survey © NERC. Contains Ordnance Survey data © Crown Copyright and database rights 2015)

Table 3 Bedrock-aquifer productivity classes and their yield ranges

Map validation

The accuracy of the aquifer productivity map was tested by comparing map predictions against pumping test data from an independent data set (i.e., data not used for aquifer productivity characterisation and attribution). The data were taken from the BGS Wellmaster database, which contains borehole yield data from well completion and (initial) test pumping. The data are provided to BGS by drilling contractors under the Water Resources Act 1991 and its predecessors, which require the reporting of all water boreholes of 15 m depth or more. To ensure that the yield values are representative of long-term sustainable yields, only yield data from pumping tests with a duration of at least 1 h (generally > 6 h) were included. Sites for which aquifer property data exist were removed from the data set to ensure that the verification data were independent of the data used for attribution. The verification data set included data from 3,282 sites. For each site, the recorded borehole yield was compared with the yield range of the productivity class in which the site falls on the map. The distribution of observed yields within each class is shown in Fig. 6, and the class yield ranges are given in Table 3.
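The two screening criteria applied to the verification data (minimum test duration and exclusion of sites held in the Aquifer property database) amount to a simple filter, sketched here in Python. The record layout below is hypothetical and does not reflect the actual Wellmaster schema.

```python
from dataclasses import dataclass


@dataclass
class WellRecord:
    """Hypothetical verification record (not the Wellmaster schema)."""
    site_id: str
    yield_l_per_s: float
    test_duration_h: float


def verification_set(records, ap_site_ids, min_duration_h=1.0):
    """Keep tests of at least min_duration_h hours at sites absent from the AP database."""
    excluded = set(ap_site_ids)
    return [
        r for r in records
        if r.test_duration_h >= min_duration_h and r.site_id not in excluded
    ]


records = [
    WellRecord("W1", 5.0, 8.0),
    WellRecord("W2", 2.0, 0.5),    # test too short
    WellRecord("W3", 12.0, 24.0),  # site also in the AP database
]
print([r.site_id for r in verification_set(records, {"W3"})])  # ['W1']
```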

Fig. 6

Distribution of observed yields (from the Wellmaster database) within the predicted productivity classes, illustrated in the form of a box plots and b empirical cumulative frequency distribution (ECFD) plots. Yield ranges for classes 1–6 in plot b are the same as in plot a

To convince users of the utility of the map (i.e., its effectiveness in predicting potential borehole productivity), it was important to demonstrate that the proposed methodology improves prediction success over, for example, randomly assigning classes to the map (random attribution) or assigning one class to the entire map area (uniform attribution). To assess the utility of the map, its predictive power (i.e., its ability to predict potential aquifer productivity correctly) was compared against the predictive power of (1) random attribution and (2) uniform attribution. The term ‘predictive power’, as used in this study, refers to the proportion of sites for which the yields were predicted correctly by the map or alternative attribution scenarios.

The assessment focussed on the ability of the map to correctly predict the yield category, i.e., (A) > 6 L/s, (B) 1–6 L/s or (C) < 1 L/s, of the yield obtained at a given site (irrespective of concealment conditions or number of aquifers). A reference data set was created by assigning each site from the verification data set to one of the yield categories (A–C) based on the borehole yield recorded for the site. This represented the reference (‘correct’) attribution against which predictions (e.g., the class predicted by the aquifer productivity map) were compared. Since concealment conditions were not considered in this assessment, each of the six aquifer productivity classes could be directly linked to one (or more) of the yield categories. For example, aquifer productivity classes 3, 4 and 6, and deeper boreholes in class 5, cover the yield range of yield category A. Hence, where the map predicted productivity class 3, 4 or 6, the prediction was considered correct if the observed yield lay within the yield range of category A (> 6 L/s). Table 3 shows the yield ranges for the different aquifer productivity classes and how they relate to the yield categories (A–C) used in this assessment. Using this approach, the proportion of sites for which productivity ranges were correctly predicted was calculated.

Using a random number generator, random class numbers (uniformly distributed between 1 and 6) were generated for each location for which measured yields were available (random attribution) and compared against the observed yields (i.e., the reference attribution) at each location. A total of 1,000 realisations were run and for each of these, the proportion of sites for which productivity ranges were correctly predicted was calculated. Uniform attribution means that the entire map area is attributed to the same productivity class. This was tested for all six aquifer productivity classes and in each case, the proportion of sites for which yield ranges were correctly predicted was calculated.
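The scoring and the two reference scenarios can be sketched in Python as follows. The treatment of class 5 (category A only where the borehole reaches the concealed aquifer, otherwise category B) is an interpretation of the text, and the `reaches_concealed` flags, function names and example values are hypothetical.

```python
import random
from statistics import mean


def yield_category(q):
    """Reference category from an observed yield q (L/s)."""
    if q > 6.0:
        return "A"
    if q >= 1.0:
        return "B"
    return "C"


def predicted_category(cls, reaches_concealed=False):
    """Yield category implied by productivity class 1-6 (concealment ignored).

    Class 5 is read as category A only where the borehole reaches the
    concealed aquifer (hypothetical flag), otherwise as category B.
    """
    if cls in (3, 4, 6):
        return "A"
    if cls == 5:
        return "A" if reaches_concealed else "B"
    if cls == 2:
        return "B"
    if cls == 1:
        return "C"
    raise ValueError(f"unknown productivity class: {cls}")


def predictive_power(classes, yields, deep):
    """Proportion of sites whose observed yield category was predicted correctly."""
    hits = sum(
        predicted_category(c, d) == yield_category(q)
        for c, q, d in zip(classes, yields, deep)
    )
    return hits / len(yields)


def random_attribution(yields, deep, n_realisations=1000, n_classes=6, seed=42):
    """Predictive power of uniformly random classes, one value per realisation."""
    rng = random.Random(seed)
    return [
        predictive_power([rng.randint(1, n_classes) for _ in yields], yields, deep)
        for _ in range(n_realisations)
    ]


def uniform_attribution(yields, deep, n_classes=6):
    """Predictive power when the whole map is assigned a single class."""
    return {
        c: predictive_power([c] * len(yields), yields, deep)
        for c in range(1, n_classes + 1)
    }


# Hypothetical example with four verification sites
yields = [10.0, 3.0, 0.5, 8.0]
deep = [False] * 4
map_classes = [3, 2, 1, 2]
print(predictive_power(map_classes, yields, deep))  # 0.75
print(uniform_attribution(yields, deep))
print(mean(random_attribution(yields, deep)))
```

Fixing the random seed makes the 1,000 random realisations reproducible, which is convenient when comparing the map against the baseline.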

Depth to source map

Another objective was to produce a map estimating the depth required to reach the uppermost (i.e., the shallowest) aquifer. This depth can coincide with the depth to the piezometric surface/water table (where the aquifer is not concealed) or it can represent the thickness of superficial sediments or less permeable rock formations that overlie the aquifer (where the aquifer is covered by superficial deposits/ concealed). It was derived by combining (1) BGS’ River Head Space model data (RHSM) which provides an estimate of the regional water table under natural flow conditions (i.e., not depressed by abstraction) (Bloomfield et al. 2007), (2) the BGS Superficial Deposit Thickness model (SDTM; Lawley and Garcia-Bajo 2010), which provides the thickness of superficial deposits overlying bedrock formations and (3) ATLAS GIS Map contours of the top of the main aquifers (Table 2). These layers were combined according to the rules in 6 and grouped at 50-m intervals into eight depth classes ranging from less than 50 m below topographic surface (class 1) to 350–400 m below topographic surface (class 8).
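The grouping into 50-m depth classes can be sketched in Python as below. The combination rule in `depth_to_source` is an assumption made for illustration (the deeper of the water table and the base of superficial deposits where the aquifer is at outcrop, the top of the concealed aquifer otherwise), not the published rules, and the function names are hypothetical.

```python
CLASS_WIDTH_M = 50.0
N_CLASSES = 8  # class 1: < 50 m ... class 8: 350-400 m below ground


def depth_class(depth_m):
    """Return the 50-m depth class (1-8), or None for depths of 400 m or more."""
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    cls = int(depth_m // CLASS_WIDTH_M) + 1
    return cls if cls <= N_CLASSES else None


def depth_to_source(water_table_m, superficial_m, concealed_top_m=None):
    """Depth (m below ground) to the uppermost aquifer; ASSUMED rule.

    Pass concealed_top_m only where the uppermost aquifer is concealed;
    otherwise the deeper of the water table and the base of superficial
    deposits is used.
    """
    if concealed_top_m is not None:
        return concealed_top_m
    return max(water_table_m, superficial_m)


print(depth_class(depth_to_source(12.0, 30.0)))        # 1
print(depth_class(depth_to_source(12.0, 5.0, 180.0)))  # 4
```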

Map validation

Verification of the accuracy of the map was difficult due to the lack of suitable data. The best available data for assessing map performance were pumping water level data and borehole completion depth data from the BGS Wellmaster database. By definition, these are influenced, respectively, by drawdown due to pumping and by the depth to which a borehole penetrates the aquifer, neither of which is considered by this map.

Pumping-related drawdown is usually localised (except in confined aquifers and in some urban areas). Typical drawdown values for the aquifers in this study were between 3 and 20 m (inter-quartile range of drawdown measured at 2,781 locations), i.e., typical drawdown is considerably smaller than the 50-m mapping interval. Thus, map predictions were considered acceptable where the recorded pumping water level fell within the range predicted by the map. Comparison of predicted depths against pumping water levels was only carried out for areas where the aquifer is considered to be unconfined (i.e., where the water table lies below the top of the aquifer). Where the aquifer is confined, hydraulic heads (and hence borehole water levels) are, by definition, above the top of the aquifer and, therefore, not useful for this verification.

To permit comparison of map predictions against borehole completion depth, some allowance had to be made for average borehole depths. Typical borehole depths in the Wellmaster database range between 35 and 95 m (inter-quartile range from 24,439 boreholes, median = 60 m). Thus, map predictions were considered to be acceptable where the recorded borehole completion depth was within the predicted depth range or up to 100 m (2 classes) deeper.
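The two acceptance criteria described above can be written as simple comparisons between a predicted class and the reference class of an observation. The function names are hypothetical and the sketch assumes that both values have already been converted to depth classes (1–8).

```python
def pumping_level_match(predicted: int, observed: int) -> bool:
    """Pumping water level: acceptable only if it falls in the predicted class."""
    return predicted == observed

def completion_depth_match(predicted: int, observed: int, tolerance: int = 2) -> bool:
    """Completion depth: acceptable if in the predicted class or up to
    `tolerance` classes deeper (2 classes x 50 m = 100 m)."""
    return 0 <= observed - predicted <= tolerance
```

A borehole completed in class 3 where the map predicts class 1 is therefore accepted, whereas one completed in class 4, or one shallower than predicted, is not.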

As for the aquifer productivity map, the utility of the depth-to-source map was assessed by comparing its predictive power (i.e., its ability to predict depth ranges correctly) against the predictive power of (1) random attribution and (2) uniform attribution. A depth class (1–8) was assigned to each site to be verified based on the available pumping water level data/borehole completion depths. This represented the reference (‘correct’) attribution against which predictions were compared.

Using a random number generator, random class numbers (uniformly distributed between 1 and 8) were generated for each location for which depth data were available (random attribution) and compared against the observed pumping water levels/borehole completion depth (i.e., the reference attribution). A total of 1,000 realisations were generated and for each of these, the proportion of correctly predicted depth ranges was calculated.
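The random-attribution benchmark can be sketched as a small Monte Carlo loop. This is an illustrative reconstruction, not the code used in the study; the function name and the optional `match` argument (which allows the completion-depth tolerance to be applied instead of an exact class match) are assumptions.

```python
import random

def random_attribution_success(reference, n_classes=8, n_realisations=1000,
                               match=lambda pred, obs: pred == obs, seed=None):
    """Proportion of correctly predicted sites for each random realisation.

    reference: reference ('correct') depth class of each validation site.
    """
    rng = random.Random(seed)
    n = len(reference)
    scores = []
    for _ in range(n_realisations):
        # Draw a class uniformly from 1..n_classes for every site.
        hits = sum(1 for obs in reference if match(rng.randint(1, n_classes), obs))
        scores.append(hits / n)
    return scores
```

With exact matching and eight equally likely classes, the expected success of random attribution is 1/8 (12.5 %), whatever the distribution of the reference classes.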

Prediction power of uniform attribution was tested by assigning the entire map area to one of the eight depth classes and calculating the proportion of sites for which the ranges were correctly predicted. This was tested for depth classes 1–3, which are the most prevalent.
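Uniform attribution reduces to counting the sites whose reference class agrees with the single class assigned everywhere. A minimal sketch (the function name is hypothetical):

```python
def uniform_attribution_success(reference, cls, match=lambda pred, obs: pred == obs):
    """Proportion of sites correctly predicted when every site is assigned class `cls`."""
    hits = sum(1 for obs in reference if match(cls, obs))
    return hits / len(reference)

# Success for the three most prevalent depth classes, as tested in the study:
# {c: uniform_attribution_success(reference, c) for c in (1, 2, 3)}
```

With exact matching, the success of uniform attribution simply equals the share of validation sites in the chosen class, which is why it performs well when one class dominates the validation data.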

Results and discussion

Bedrock-aquifer productivity map

Figure 5 shows the bedrock-aquifer map for England and Wales, indicating the aquifer productivity class that can reasonably be expected. The mapped yield ranges correspond to the approximate inter-quartile ranges (i.e., all values between the 25th and the 75th percentiles of the cumulative frequency curves in Figs. 2 and 3) of borehole yields from the different geological units, as estimated from aquifer property studies. As such, the map represents only the most frequently observed (central 50 %) yields and, hence, predicts the productivity range most likely to be encountered at a given locality (as inferred from the underlying geology).
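The inter-quartile range of the yields recorded for one geological unit can be computed directly from its sample of borehole yields. This sketch uses Python’s `statistics.quantiles` (default "exclusive" method); the percentile convention actually applied to the cumulative frequency curves in the study is not stated, so other interpolation methods may give slightly different bounds.

```python
import statistics

def interquartile_yield_range(yields_l_per_s):
    """Return the 25th and 75th percentiles of a sample of borehole yields (L/s)."""
    q1, _median, q3 = statistics.quantiles(yields_l_per_s, n=4)
    return q1, q3
```

Because quartiles ignore the extreme tails, a unit whose yields span several orders of magnitude is still summarised by the range in which half of its boreholes fall.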

Comparing map predictions against actual borehole yields (Fig. 6a,b) shows that each class contains a wide range of yields. There is considerable overlap between the inter-quartile ranges (represented by the base and top of the boxes in Fig. 6a) of the different classes. This can be expected when taking into consideration that the aquifer yields span several orders of magnitude for most geological formations included in this study (Fig. 2). The inherent variability in aquifer properties tends to increase with increasing degree of fracturing and with average fracture size (Banks et al. 2005; Grey et al. 1995). This is illustrated in Fig. 7 for some of the main aquifers in England and Wales, which shows an increasing trend in uncertainty in predicting the aquifer properties from relatively predictable, intergranular-flow dominated aquifers, such as the Lower Greensand, to fracture-flow dominated aquifers such as the Chalk.

Fig. 7

Flow characteristics in major UK aquifers (after Grey et al. 1995)

Median values (central line in boxes in Fig. 6a) for all map productivity classes (1–6) fall within the attributed yield ranges (Table 3), i.e., median(class 1) < 1 L/s; 1 L/s < median(class 2) < 6 L/s; median(classes 3, 4 and 6) > 6 L/s; and median(class 5) > 1 L/s. This implies that at least 50 % of the boreholes in each class provided the predicted yields. The proportion is higher in classes 5 and 6, which comprise two aquifers, either with different yield ranges, i.e., class 5 (moderately yielding aquifers overlying a good aquifer), or with similar ranges, i.e., class 6 (highly productive aquifers such as the Chalk or the Sherwood Sandstone overlying another very productive aquifer, such as the Lower Greensand or the Magnesian Limestone). The relationship between the distribution of observed yields within each class and the class boundaries (marked as vertical dashed lines) is shown in Fig. 6b.

Table 4 shows that map predictions were correct in 56 % of the cases, compared to an average of 38 % when randomly attributing a class. Uniform attribution of one of the six aquifer productivity classes to the entire map area resulted in correct predictions of between 6 % (class 4) and 29 % (class 3; Table 4). The prediction success of 56 % indicates that in 44 % of the cases the potential productivity was either underestimated (5 %) or overestimated (39 %). This degree of error is to be expected, considering that the map predicts the most likely productivity range for a given location/area, and it reflects the high degree of heterogeneity within the different aquifers.
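The split into correct, underestimated and overestimated predictions follows from comparing each observed yield with the yield range of the class predicted at that borehole. In the sketch below, the bounds for classes 1–3 are those implied by the median comparison above (class 1: <1 L/s; class 2: 1–6 L/s; class 3: >6 L/s); the function names and the inclusive treatment of range boundaries are assumptions.

```python
import math
from collections import Counter

# Yield ranges (L/s) for classes 1-3 as implied in the text; other classes omitted.
CLASS_RANGES = {1: (0.0, 1.0), 2: (1.0, 6.0), 3: (6.0, math.inf)}

def classify_prediction(observed, lower, upper):
    """Compare one observed yield with the predicted range (bounds inclusive)."""
    if observed < lower:
        return "overestimated"   # map promised more than the borehole delivered
    if observed > upper:
        return "underestimated"  # borehole delivered more than the map predicted
    return "correct"

def summarise(records):
    """records: iterable of (observed_yield, lower_bound, upper_bound)."""
    records = list(records)
    counts = Counter(classify_prediction(y, lo, hi) for y, lo, hi in records)
    return {k: counts[k] / len(records)
            for k in ("correct", "underestimated", "overestimated")}
```

Applied to all validation boreholes, the three proportions add up to one, which is how 56 % correct predictions split into 5 % underestimates and 39 % overestimates.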

Table 4 Comparison of observed borehole yields (Wellmaster data, n = 3282) against yield ranges predicted by the aquifer productivity map, random attribution or uniform attribution

The tendency to overestimate yields is likely to be due to bias in the input data on which the aquifer attribution is based. Both data sets, BGS’ Aquifer Property data and the Environment Agency’s (EA) NALD data, are by their nature biased towards higher-yielding boreholes and contain considerably fewer data on low-yielding boreholes, in particular where yields are non-licensable (i.e., <20 m³/day or 0.23 L/s, although in reality the instantaneous yield obtained from these boreholes could be as much as double this amount (0.5 L/s), as they will rarely be pumped for more than 10–12 h/day). Some bias is also expected in the validation data set (BGS’ Wellmaster data). In theory, all water boreholes of 15 m depth or more should be reported to the BGS and hence should be considered in the validation. In practice, however, drillers tend not to report dry or failed wells to the relevant authorities (Banks et al. 2005).
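The conversion between a daily volume and an instantaneous pumping rate explains why the licensing threshold of 20 m³/day corresponds to roughly 0.23 L/s over 24 h but to about 0.5 L/s for 10–12 h of pumping. A minimal sketch (the function name is illustrative):

```python
SECONDS_PER_HOUR = 3600
LITRES_PER_M3 = 1000

def m3_per_day_to_l_per_s(volume_m3_per_day: float, pumping_hours: float = 24.0) -> float:
    """Pumping rate (L/s) needed to deliver a daily volume in `pumping_hours` per day."""
    return volume_m3_per_day * LITRES_PER_M3 / (pumping_hours * SECONDS_PER_HOUR)
```

Evaluated for 20 m³/day, this gives about 0.23 L/s for continuous pumping, 0.46 L/s for 12 h and 0.56 L/s for 10 h of pumping per day.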

Some overestimates could also arise where the yield required from a borehole was less than the volume of water the aquifer is capable of supplying, since the recorded yield then reflects demand rather than aquifer capacity. In any case, the prediction power of the aquifer productivity map was significantly higher (at the 99 % confidence level) than that of the random and uniform attribution scenarios.
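The text does not state which significance test was applied. One common way to compare a single observed success rate against a set of random realisations is a one-sided Monte Carlo (empirical) p-value; the sketch below shows that technique under this assumption and is not necessarily the test used in the study.

```python
def empirical_p_value(observed_success, random_successes):
    """One-sided Monte Carlo p-value: how often random attribution does at least
    as well as the map, with the usual +1 correction in numerator and denominator."""
    exceed = sum(1 for s in random_successes if s >= observed_success)
    return (exceed + 1) / (len(random_successes) + 1)
```

With 1,000 realisations, the smallest attainable p-value is 1/1,001 (about 0.001), so a map that outperforms every realisation is significant at the 1 % level.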

The ability of the map to correctly predict concealment conditions or the number of aquifers was not assessed in this study, as this would have required independent data on the location and extent of concealed aquifers, which were not available. Only the main geological formations (Table 2) that both form important aquifers at depth and whose subsurface distribution has been mapped were identified as concealed aquifers. Less important concealed aquifers are not considered in this layer, either because their subsurface extent is not known or because they do not form important aquifers at depth. A more detailed representation of the subsurface geology (including geological volumes and units) is currently being developed as part of a three-dimensional (3D) national geological model (NGM; British Geological Survey 2014a). This will provide the data required for more detailed mapping of the concealed aquifers in England and Wales and/or for validating concealment conditions in the current map.

Superficial deposits are not considered in this map, even though, locally, they can form moderately productive aquifers, capable of supplying sustainable borehole yields of more than 1 L/s (Ó Dochartaigh et al. 2011; Birks et al. 2013). However, the inherent heterogeneity of superficial deposits means their properties as aquifers (e.g., permeability, thickness and lateral extent) can change significantly over short distances even within the same lithological unit. BGS maps of the surface distribution of superficial deposits and their thickness are available but these are often classified on their mode of origin (e.g., glacial, fluvial, marine or aeolian) rather than lithology. Permeability within these deposits can vary hugely (Macdonald et al. 2012b), and productivity will also depend on the lithology of the deposit, area of outcrop and saturated thickness, making it difficult to distinguish between deposits that can yield significant volumes of water and those that cannot at the 1:500,000 scale.

Depth-to-source map

The rules used to create the depth-to-source map from the regional water table, the thickness of superficial deposits and contours on the base of overlying formations are shown in Fig. 8. The resulting depth-to-source map (Fig. 9) shows the depth to which it is necessary to drill to reach the water source. This depth is the distance from the ground surface to the top of the aquifer (or to the water table where the aquifer is exposed at rockhead) and, hence, the minimum length of borehole required.

Fig. 8

Rules for the development of the depth-to-source map. Maximum/minimum values indicate where the highest/lowest value of the combined data sets is used in the assignment

Fig. 9

Final map product: Depth-to-source for England and Wales (see also BGS website, British Geological Survey 2014b) (Digital geological data, British Geological Survey © NERC. Contains Ordnance Survey data © Crown Copyright and database rights 2015)

Boreholes are usually completed within the aquifer they abstract from; hence, in areas where the aquifer is present at rockhead, the depth represents the distance from the surface to the water table, unless the aquifer is covered by superficial deposits. Where an aquifer is covered by low-permeability superficial deposits or concealed by less permeable formations, these must usually be completely penetrated to reach the underlying aquifer (even when the piezometric surface lies within these overlying formations). Hence, in concealed areas and where superficial deposits are present, the depth shown on the map represents the thickness of the overlying formation that needs to be penetrated to reach the top of the aquifer. This also applies where the aquifer is concealed but not confined (i.e., the water level is below the top of the aquifer). In these cases, the depth required to reach the source will be somewhat greater but is unlikely to fall outside the 50-m mapping interval.
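The logic described in this paragraph can be condensed into a simple rule. The sketch below is a deliberate simplification of the text, not a reproduction of the Fig. 8 rules (whose maximum/minimum assignments are not given here); the function and parameter names are hypothetical.

```python
def depth_to_source(water_table_depth_m: float, cover_thickness_m: float = 0.0) -> float:
    """Simplified depth-to-source rule (hypothetical, not the published Fig. 8 rules).

    cover_thickness_m: thickness of low-permeability superficial deposits or of the
    concealing formation above the aquifer (0 where the aquifer is at rockhead
    without superficial cover).
    """
    if cover_thickness_m > 0:
        # The cover must be fully penetrated; the mapped depth is its thickness,
        # even if the water table lies deeper (concealed but unconfined case).
        return cover_thickness_m
    # Aquifer at rockhead without cover: drill to the water table.
    return water_table_depth_m
```

Under this rule, a site with a 40-m-deep water table beneath 5 m of cover is mapped at 5 m, illustrating why the true depth to the source can be somewhat greater than the mapped depth in concealed but unconfined settings.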

Normally, a significant saturated thickness of aquifer needs to be penetrated to achieve a sustainable yield and accommodate any drawdown that may result from pumping. Pumping-induced drawdown and depth of penetration into an aquifer were not accounted for in this map. This needs to be kept in mind when using the map for estimating total drilling depth/cost. This also has implications for the validation of the map which is based on pumping water levels and borehole completion depths, i.e., data sets that are not directly comparable to the map output.

Table 5 shows the results from comparing predicted depth ranges in outcrop areas against pumping water levels from the Wellmaster database. It shows that for 73 % of the sites, the depth range predicted by the map agrees with the recorded pumping water level. For 14 % (13 %) of the sites, pumping water levels were lower (higher) than the predicted depths. Comparison against the borehole completion depth (outcrop + concealed areas; Table 6) shows that for 41 % of the sites, the recorded borehole completion depth was within the predicted depth range. For 35 and 12 % of the sites, the actual borehole depth was under-predicted by 1 class (50 m) and 2 classes (100 m), respectively. Typical borehole depths in the study area range between 35 and 95 m (median 60 m), suggesting that under-prediction by 1–2 classes is acceptable. This suggests that, for most locations, the depth to source was modelled correctly; however, it is not possible to quantify model success based on the available data.

Table 5 Comparison of observed pumping water levels (Wellmaster data, n = 8,980) against depth ranges predicted by the depth-to-source map, random attribution or uniform attribution for outcrop (i.e., excluding concealed) areas
Table 6 Comparison of observed borehole completion depths (Wellmaster data, n = 24,448) against depth ranges predicted by the depth-to-source map, random attribution or uniform attribution for outcrop and concealed areas

To assess the utility of the depth-to-source map relative to alternative attribution methods, map performance was compared against the prediction success of the random and uniform attribution scenarios. Prediction success was assessed by comparing the predictions against reference attributions based on observed pumping water levels (Table 5) and borehole completion depths (Table 6). The results show that the prediction success of the depth-to-source map is significantly higher than that of random attribution. However, there is no significant difference between the performance of the depth-to-source map and that of assigning depth class 1 to the entire map area, implying that there is no advantage in using the depth-to-source map over assuming a depth range of <50 m for the entire map area. The similarity in prediction success is largely due to the nature of the map and of the validation data set. Uniform attribution will always perform well where one or two classes dominate the mapped area and where the observations used for the validation are biased towards these classes. Class 1 is by far the most dominant class of the depth-to-source map (Fig. 9), as most water levels and confining features lie within 50 m of the surface. This implies that the vertical resolution at which the depth to the water source is mapped is too coarse, particularly in the shallow subsurface, and could be refined to improve the usefulness of the map. The vertical mapping intervals were set to match the nominal vertical resolution of the AtlasGIS data, which are mainly important for mapping the depth to concealed aquifers in the deeper subsurface. In the shallow subsurface (<50 m), depth to source is predominantly mapped using regional water table and superficial thickness data, both of which are available at 1:50,000 (horizontal) resolution and would support a vertical discretisation into 10-m depth intervals.
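A finer vertical discretisation of the kind suggested above could, for example, use 10-m classes in the upper 50 m and keep 50-m classes below. The scheme and function name are hypothetical illustrations of that suggestion, not a classification used in the study.

```python
import math

def refined_depth_class(depth_m: float, shallow_limit_m: float = 50.0,
                        shallow_step_m: float = 10.0, deep_step_m: float = 50.0) -> int:
    """Hypothetical mixed scheme: 10-m classes above 50 m, 50-m classes below."""
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    if depth_m < shallow_limit_m:
        return math.floor(depth_m / shallow_step_m) + 1
    n_shallow = int(shallow_limit_m // shallow_step_m)  # 5 shallow classes
    return n_shallow + math.floor((depth_m - shallow_limit_m) / deep_step_m) + 1
```

The current class 1 (<50 m) would be split into five classes, while the deeper classes keep their 50-m width, giving twelve classes down to 400 m.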

Furthermore, using borehole completion depth and pumping water level data (which tend to be <100 m and <50 m, respectively) for map validation means that the validation data set is also strongly biased towards the shallow depth classes. Classes 1 and 2, for example, include 77 % of all observed borehole completion depths and 98 % of all pumping water levels used in the validation. This means that map performance for predicting depth to water source at depths >100 m remained largely untested.

Applicability of the approach

The approach is applicable beyond the study area, wherever key data on surface geology and on the subsurface extent of key hydrogeological units exist and where an adequate density of yield and depth-to-groundwater data is available to enable credible validation of the modelled results. This limits its applicability to relatively well-parameterised systems; it is most likely to be of use where an existing groundwater abstraction regime is to be expanded, e.g., for private/public water supply or as a source of renewable energy (e.g., GWHP), or where conflicting demands on the groundwater resource have to be managed.

The approach is applicable at different scales. In this study, maps were developed for use at the 1:500,000 scale and are not intended for use at the local scale or for site-specific investigations. More detailed desk studies and site investigations by qualified professionals will always be required to check more detailed datasets (geological and hydrogeological maps and records) and to define the conceptual model that supports the operational and technical boundary conditions for the proposed abstraction.

The methodology can be applied to larger-scale maps, e.g., 1:50,000 geological maps. However, this requires that sufficient depth and yield data are available for (a proportion of) the different formations to characterise their hydrogeological properties/yield characteristics and also to validate the outputs. In this application, yield data were available for 20 % of all mapped formations, covering 70 % of the study area. These included borehole yield data for all major and most of the minor aquifer formations and were sufficient to validate the original heuristic attribution of potential yields and to provide confidence in the approach by illustrating that, in the absence of detailed data, acceptable results can be achieved based on good hydrogeological knowledge.

From comparison with measured yields (at 3,283 locations), it is estimated that the overall confidence in the map estimations is 56 % for the aquifer productivity map, with a 39 % chance of yields being overestimated. Mapping at larger scales may reduce some of the uncertainty resulting from the aggregation of smaller geological formations into larger units, but some level of uncertainty will almost always remain due to the heterogeneous nature of the geological formations and the inherent variability in aquifer properties (Fig. 7). The uncertainty associated with the depth predictions could not be quantitatively assessed, but comparison against proxy data suggests that confidence in these predictions is high (73–88 %).

Quantification of prediction uncertainty for individual formations or areas was not undertaken in this study, but could be applied to identify formations/areas for which prediction uncertainty is high and where additional data are required to improve the yield/depth predictions. As such, the proposed approach also has utility for application in less well-characterised systems, for example as a tool for identifying priorities for targeted data acquisition programmes to improve understanding of the distribution of aquifer productivity and depths to source.

National-scale (England and Wales) application of the proposed methodology at larger scales, i.e., using BGS’ 1:50,000 DiGMapGB_50 product, was not within the scope of this study. It would require the attribution of 240,526 polygons and 3,902 LEX_RCS codes (compared to 24,231 and 593 in the current application) and, hence, a considerably larger amount of pumping test data. Due to the high data demand, aquifer productivity mapping at the 1:50,000 scale is more suitable for regional-scale applications. Abesser (2012), for example, successfully applied the methodology to the West Midlands area as part of a study to map subsurface suitability for open-loop GWHP installations.

Yield categories and aquifer productivity classes presented in this paper were selected to reflect the flow rate requirements of commercial-scale GWHP systems. The map is, therefore, specific to this application. Yield requirements for other applications, e.g., private/public water supply, vary. A typical dairy farm in the UK (i.e., a private water supply), for example, requires 25 m³ of water per day (≈0.3 L/s if pumping constantly; DairyCo 2009), while a moderately productive public water supply borehole abstracts a few ML/day (>50 L/s); hence, if the methodology is applied elsewhere, the productivity ranges presented in the map need to be adjusted to suit the intended use of the map.

Conclusions

A methodology has been presented to map aquifer productivity and depth to source at the national (England and Wales) scale. It makes use of the close association between the geological and hydrogeological properties of rock formations and uses expert knowledge and pumping test data to assign hydrogeological properties to geological map units (whose distribution at the surface and in the subsurface is known) in order to estimate aquifer productivity. It draws on widely available data sets and, with the increasing availability of 3D geological models, will be easy to apply at a range of scales, from regional to national, to map aquifer productivity and/or depth to source.

Aquifer yields can vary greatly within individual geological formations, and this is accounted for by using inter-quartile ranges of observed yields for defining and attributing aquifer productivity classes. Hence, the uncertainty associated with predicting productivity for different geological formations/aquifers is built into this map. The utility of the proposed methodology strongly depends on the quality of the data sets used for aquifer characterisation and attribution. The data sets employed in this application are, by their nature, biased towards higher yields; hence, while the overall prediction success of the aquifer productivity map is satisfactory, it has a tendency to over-predict yields for some of the formations.

An essential component in making predictions is the validation of the results as this determines their reliability and demonstrates their utility over using other prediction methods. The validation methods employed in this study consist of an assessment of the accuracy of the map themes and of their utility compared to random and uniform class assignment methods. The validation confirmed that the maps provide acceptable accuracy and that use of the bedrock-aquifer productivity map considerably improves prediction success (compared to random or uniform attribution). The original heuristic approach for the attribution of the aquifers was validated for 127 geological units based on more detailed information, providing considerable confidence that the same methodology can produce acceptable results for the remaining 466 units for which these supporting data were not available.

However, validation also revealed that the utility of the depth-to-source map is limited by insufficient vertical resolution in the shallow (<50 m) subsurface. The vertical resolution of this map is currently constrained by the resolution of the AtlasGIS data, which are mostly used for mapping aquifer distribution at depths >50 m. Other data layers used for developing the map have a finer resolution and could be used to refine the vertical discretisation of the upper 50 m of the map (e.g., into 10-m depth intervals), thereby increasing its utility.

Pumping water levels and borehole depths provided tolerable proxy data for validation of the depth-to-source map and for testing the map performance, although some assumptions had to be made about typical pumping-induced drawdown levels and borehole depths. However, the bias in the validation data set towards shallower depth classes meant that map performance for predicting depth to water source at >100 m depth could not be sufficiently tested. This highlights the importance of identifying and sourcing suitable validation data sets, e.g., from drilling projects unrelated to groundwater exploration or from sources outside of BGS, in order to gain confidence in the outcomes of the validation.

Overall, the proposed methodology provides a suitable alternative to more time- and data-intensive (geostatistical) methods. It produces maps that predict potential aquifer productivity/depth to source with acceptable accuracy, within the uncertainty range observed for individual aquifer formations and associated with the heterogeneous distribution of permeability and fracture development.