Opportunities have never been greater for sharing hydrogeological data across national and international borders, and across governmental, industry and academic sectors. These opportunities are driven primarily by a pressing need to better understand our natural environment at all scales, especially continental and global (e.g. Boyd and Crawford 2012; Mayer-Schonberger and Cukier 2013). Major challenges include meeting the expanding demand for water resources associated with human consumption, food production and energy resources (Gorelick and Zheng 2015; Scanlon et al. 2012), the potential impact of unconventional energy exploitation (e.g. AWA 2012; Vidic et al. 2013; Vengosh et al. 2014), the predicted impacts of changing climate (e.g. Taylor et al. 2013), as well as other ecological and environmental impacts (e.g. Nevill et al. 2010).

The growth of hydrogeological data is another significant driver, magnified by much greater data availability owing to technological advances and governmental open data policies (Zuiderwijk and Janssen 2014). However, collecting more data and increasing its availability is only part of the solution to these major challenges. There are limits to how readily such data can be transformed into improved understanding of hydrogeological science and better water management (e.g. Loch et al. 2014). These limits stem largely from the distributed custodianship and heterogeneity of groundwater data, which make it difficult for end users to discover, access and harmonize the data. Although an impressive volume of global groundwater data may be available, few users have the capacity or desire to overcome these difficulties.

An additional part of the solution is making disparate data interoperable, that is, usable in a seamless manner regardless of its original heterogeneity. Data interoperability typically requires the transformation of data into a common representation readily understood by both the data supplier and consumer (Brodaric et al. 2016). Often this representation takes the form of a data standard developed and maintained by key stakeholders. Such a standard is typically used within two main technological environments: (1) within data networks that supply heterogeneous data to users, or (2) within workflow systems that chain together diverse software for modularized processing of data. While online groundwater workflow systems are somewhat nascent (e.g. Klug and Kmoch 2014), web-based groundwater data networks have become more prevalent. Such networks consist of distributed data sources that are managed autonomously and that possess heterogeneous data structure and content. Government-initiated examples of such groundwater data networks include the Canadian ‘Groundwater Information Network’ (Boisvert and Brodaric 2012; Brodaric et al. 2016); the United States ‘National Ground-Water Monitoring Network’ (Blodgett et al. 2016; Brodaric et al. 2016); groundwater aspects of the French ‘Water Information System’ (BDLISA 2013, 2014); the New Zealand SMART system (Klug and Kmoch 2014; Kmoch et al. 2015); and the Australian ‘National Groundwater Information System’ (BOM 2015; Carrara et al. 2015). Outside of government, the Australian ‘Visualizing Victoria’s Groundwater’ portal (Dahlhaus et al. 2012, 2016) is an interoperable data network developed by university researchers. Notably, the growth of such data networks is not limited to groundwater science. It echoes a larger trend in the sciences generally and in hydrology particularly (Brodaric and Piasecki 2016).

A consequence of the rise of water data networks is a growing imperative for their connection, which recognizes the need to build networks of networks to supply data for large-area science problems (e.g. Nativi et al. 2014). As this need becomes more evident, particularly across political boundaries, the absence of an international groundwater data representation becomes more acute. In this paper, this gap is addressed through the establishment of GroundWaterML2 (GWML2), a new global standard for groundwater data. GWML2 has been developed by a group of international groundwater data providers collaborating within the Groundwater Standards Working Group of the Open Geospatial Consortium (OGC). It comprises the groundwater component of the WaterML2 suite of hydrologic data standards and best practices (WaterML2 2017), and consists of data structures and encoding guidelines for groundwater data. GWML2 has been successfully implemented and tested by a variety of major groundwater data providers from North America, Europe, and Australasia, and it is expected to complement emerging data standards for both surface water and atmospheric water to facilitate data exchange for the complete water cycle.

This paper describes the motivation behind the development of GWML2, the development methods, the representation itself, its testing and evaluation, its relation to other standards and relevant work, and a summary including possible future directions. Overall, it is shown that GWML2 is an effective data transfer mechanism and a useful interoperability aid for groundwater data.

Usage scenarios for GWML2

Five usage scenarios motivate GWML2. They represent common and important data delivery situations, categorized as commercial, policy-oriented, environmental, scientific, or technological. They were selected to guide the design of GWML2 and to provide evaluation criteria. The data required by each scenario should be representable in GWML2 and transmittable from major groundwater data sources so that the scenario can be carried out.

The commercial scenario involves finding water wells and springs within an area to estimate the cost of completing a new water supply well. For example, a consultant or water well driller could use a web tool to explore the local geology and inspect wells located near the target area. The consultant could investigate the surficial sediment and rock materials, water level, yield and total depth at each well. From these, the consultant could infer the depth to the water table, the materials overlying it and the expected yield, and the driller could estimate the cost and timeline for drilling. Members of the public can also access online water well records and make independent estimates. This not only aids evaluation of drilling potential, but might also influence property purchases. Key entities in this scenario include water wells, related measurements such as water levels, and various hydrogeological units such as aquifers or confining beds.
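At its core, the commercial scenario is a spatial query followed by simple summary statistics. The Python sketch below illustrates this under stated assumptions. The `WaterWell` record, its field names, the projected metre coordinates and the bounding-box filter are all hypothetical simplifications. They are not GWML2 element names or the API of any real web tool, and the well values are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class WaterWell:
    """Hypothetical, simplified well record (field names are illustrative, not GWML2 element names)."""
    well_id: str
    x_m: float               # easting in metres (assumed projected coordinates)
    y_m: float               # northing in metres
    depth_to_water_m: float  # static water level below ground surface
    yield_l_s: float         # reported well yield, litres per second
    total_depth_m: float     # drilled depth of the well


def wells_in_area(wells, xmin, ymin, xmax, ymax):
    """Keep wells whose location falls inside an axis-aligned bounding box (inclusive)."""
    return [w for w in wells if xmin <= w.x_m <= xmax and ymin <= w.y_m <= ymax]


def summarize(wells):
    """Rough local indicators a consultant or driller might derive from nearby wells."""
    if not wells:
        return None
    return {
        "n_wells": len(wells),
        "mean_depth_to_water_m": mean(w.depth_to_water_m for w in wells),
        "min_yield_l_s": min(w.yield_l_s for w in wells),
        "max_total_depth_m": max(w.total_depth_m for w in wells),
    }


wells = [
    WaterWell("W1", 100.0, 200.0, 12.0, 2.5, 40.0),
    WaterWell("W2", 150.0, 250.0, 14.0, 1.5, 55.0),
    WaterWell("W3", 900.0, 900.0, 30.0, 0.5, 80.0),  # outside the target area
]
nearby = wells_in_area(wells, 0.0, 0.0, 500.0, 500.0)
print(summarize(nearby))
```

A real tool would more likely query a groundwater data service than an in-memory list. It would also probably use a proper spatial predicate rather than a bounding box.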

The policy scenario involves reporting on the state of groundwater in specific administrative districts, motivated in large part by European requirements. This involves collecting and evaluating geological and hydrogeological characteristics, as well as quantitative and qualitative monitoring of chemical and physical indicators. Such reporting aids the overall assessment of a management area, especially one that crosses political borders. It typically requires synchronizing information collected by multiple state water authorities. Key entities include management areas, related hydrogeological units, and monitored information.
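Synchronizing monitored information from several authorities is straightforward once their records share one representation. The minimal sketch below assumes hypothetical record dictionaries whose indicator names and units have already been harmonized. The area code `MA-1`, the indicator names and the values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def synchronize(*authority_feeds):
    """Pool monitored values from several authorities by (management area, indicator).

    Assumes every feed already uses the same indicator names and units, which is
    what a shared exchange format is meant to make possible.
    """
    pooled = defaultdict(list)
    for feed in authority_feeds:
        for rec in feed:
            pooled[(rec["area"], rec["indicator"])].append(rec["value"])
    return {key: mean(values) for key, values in pooled.items()}


# Hypothetical feeds from two authorities sharing one cross-border management area.
authority_1 = [{"area": "MA-1", "indicator": "nitrate_mg_l", "value": 40.0}]
authority_2 = [
    {"area": "MA-1", "indicator": "nitrate_mg_l", "value": 50.0},
    {"area": "MA-1", "indicator": "water_level_m", "value": 3.5},
]
summary = synchronize(authority_1, authority_2)
print(summary)
```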

The environmental scenario enables environmental managers, water managers, and legislators to assess threats to groundwater dependent ecosystems. The role of groundwater in sustaining environmental values is of growing importance globally, particularly in arid countries such as Australia. Key items include depth to water table, monitored information on groundwater chemistry and biology, and flow between groundwater and surface water.

The scientific scenario focuses on the delivery of data for use in groundwater flow modeling and soil-water balance modeling, as one among a myriad of possible scientific activities. Key entities for such modeling include the hydrogeological and geophysical properties of aquifers and related measurements, as well as characteristics of water wells, water bodies, and water use.

The technological scenario determines compatibility with other hydrogeological data representations, such as database schemas and exchange formats, via conversion to and from GWML2. This is particularly important to enable data interoperability (1) within a groundwater data network, by converting between local databases and GWML2, or (2) between different data networks, by converting between GWML2 and other data formats such as the European-wide INSPIRE (INSPIRE 2013a,b) or North American GWML1 (Boisvert and Brodaric 2012). This scenario involves all entities from the previous scenarios.
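The two conversion paths can be sketched as field mappings routed through a common representation. In this minimal Python sketch, the schemas `SCHEMA_A` and `SCHEMA_B`, their column names and the common terms are hypothetical stand-ins. They are not real GWML2, INSPIRE or GWML1 element names. Real conversions would also have to handle units, nesting and code lists.

```python
# Hypothetical field mappings: local column name -> common (GWML2-like) term.
SCHEMA_A = {"WELL_NO": "wellId", "WL_DEPTH": "waterLevelDepth", "TD": "totalDepth"}
SCHEMA_B = {"site_code": "wellId", "dtw": "waterLevelDepth", "depth": "totalDepth"}


def to_common(row, schema):
    """Rename local fields to common terms, refusing fields that would be silently lost."""
    unmapped = set(row) - set(schema)
    if unmapped:
        raise KeyError(f"no common equivalent for fields: {sorted(unmapped)}")
    return {schema[key]: value for key, value in row.items()}


def from_common(record, schema):
    """Rename common terms back to the local fields of a target schema."""
    inverse = {common: local for local, common in schema.items()}
    return {inverse[term]: value for term, value in record.items()}


def convert(row, source, target):
    """Network-to-network conversion routed through the common representation."""
    return from_common(to_common(row, source), target)


row_a = {"WELL_NO": "W-17", "WL_DEPTH": 12.3, "TD": 48.0}
print(convert(row_a, SCHEMA_A, SCHEMA_B))
```

Routing every conversion through one common representation has a practical benefit. Each local format needs only a mapping to and from that representation, rather than a separate converter for every other format. Raising an error on unmapped fields makes information loss visible instead of silent.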

Methods for developing GWML2

GWML2 developers include hydrogeologists and information scientists from a cross-section of major governmental and academic groundwater data providers. The development of GWML2 follows five steps, as shown in Fig. 1, beginning with a needs assessment and concluding with a tested design published as an international standard. Step 1 involves the establishment of prototypical usage situations, as described in section ‘Usage scenarios for GWML2’, which form the requirements and testing criteria for GWML2. In step 2, key terms are identified from the usage scenarios and authoritative definitions are attached to ensure a common conceptual foundation as well as to frame GWML2’s scope (GWML2 Vocabulary, 2017). Step 3 involves the development of a technical design (Brodaric 2017) expressed as three related information models that culminate in a specification for encoding data in XML (Extensible Markup Language; W3C 2008). In step 4, example encodings are developed for each component of GWML2, and these are made permanently available online alongside the design specification (GWML2 XML Examples 2017; GWML2 XML Schema 2017). Step 5 tests the encodings in implementations across North America, Europe and Australasia to ensure satisfaction of the usage scenarios, thereby providing proof of feasibility (see section ‘Implementation and evaluation of GWML2’).

Fig. 1 GWML2 development method

Results: the GWML2 design

As sketched in Fig. 2, GWML2 can represent a wide variety of entities associated with hydrogeological units, subsurface bodies of water, and man-made artifacts such as wells. A novel aspect of its design is the refinement of a water container pattern (from Boisvert and Brodaric 2012). This pattern conceptually distinguishes (1) a container such as a rock body or sand unit, (2) the spaces (voids) hosted by the container, and (3) the fluids occupying those spaces. These distinctions enable properties to be attributed to each of the containers, fluids, or voids individually, as well as to their relations. For example, an aquifer vulnerability estimate is a property of the unit or a part of it, and the volume of a fluid is a property of the fluid body. Likewise, the volume occupied by voids such as fractures or pores is a property of the voids. In contrast, porosity, understood as the proportion of void volume to total unit volume, is a property of the relation between a container and its hosted voids.
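The water container pattern can be pictured as three kinds of entity plus a relation that carries its own properties. The Python sketch below is a hypothetical, simplified illustration of that idea, not the GWML2 schema, and its class and field names are invented. Porosity is computed as void volume divided by total unit volume, following the definition given above.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """A container such as a rock body or sand unit (hypothetical, simplified)."""
    name: str
    total_volume_m3: float
    vulnerability: str  # a property of the container itself


@dataclass
class Voids:
    """The spaces hosted by a container, e.g. pores or fractures."""
    kind: str
    volume_m3: float  # a property of the voids


@dataclass
class FluidBody:
    """The fluid occupying the voids."""
    name: str
    volume_m3: float  # a property of the fluid body


@dataclass
class HostsVoids:
    """Relation between a container and its voids; it carries its own properties."""
    container: Container
    voids: Voids

    @property
    def porosity(self) -> float:
        # Proportion of void volume to total unit volume.
        return self.voids.volume_m3 / self.container.total_volume_m3


sand = Container("sand unit", total_volume_m3=1.0e6, vulnerability="high")
pores = Voids("pores", volume_m3=2.5e5)
water = FluidBody("groundwater", volume_m3=2.0e5)
print(HostsVoids(sand, pores).porosity)  # 0.25
```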

Fig. 2 Main entities represented by GWML2, labeled with GW

Types of hydrogeological units in GWML2 are aquifers, aquifer systems, confining beds, and groundwater basins. Such units are understood to be subsurface volumes of earth material that serve as containers for fluids. Their physical boundaries are demarcated by hydrogeological characteristics or, in the case of basins, by groundwater flow divides. As fluid containers, units variously host voids and are associated with a wide variety of properties such as porosity, transmissivity, storativity, hydraulic conductivity, and yield. GWML2 allows hydrogeological unit properties, and many others, to be further categorized by the data provider. For example, the porosity property can be additionally categorized as effective porosity, total porosity, or any other type of porosity. Similarly, the yield property can be categorized as safe yield, sustainable yield, aquifer yield, or any other relevant notion of yield. Importantly, these categorizations do not replace the standard GWML2 property; rather, they serve as an optional, supplementary description of it. In addition to such augmented categorization, GWML2 also allows multiple occurrences of property values. For instance, both total and effective porosity can be specified for a hydrogeological unit, possibly as summary values for the unit as a whole. Multiple property values scattered throughout the unit can also be specified to capture their spatial distribution, as the scientific modeling usage scenario especially requires. This flexible mode of representing property values results from re-use of two existing standards. The ISO/OGC Observations and Measurements standard (O&M; Cox 2011, 2013) represents measured values. The OGC GeoSciML standard (GeoSciML, 2017) represents a full geological description of a hydrogeological unit, including its materials, age, stratigraphic relations, and associated properties.
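The text describes two mechanisms: an optional, provider-supplied category layered on a standard property, and multiple property values that may or may not carry a location. Both can be sketched as follows. The classes, the `get` and `summary` helpers and the example values are hypothetical illustrations of the idea, not O&M or GWML2 types.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PropertyValue:
    prop: str                       # standard property name, e.g. "porosity"
    value: float
    category: Optional[str] = None  # optional provider-supplied refinement
    location: Optional[Tuple[float, float, float]] = None  # None = summary for the whole unit


@dataclass
class HydrogeoUnit:
    name: str
    values: List[PropertyValue] = field(default_factory=list)

    def get(self, prop, category=None):
        """All values of a standard property, optionally narrowed to one category."""
        return [v for v in self.values
                if v.prop == prop and (category is None or v.category == category)]

    def summary(self, prop, category=None):
        """Values that describe the unit as a whole (no location attached)."""
        return [v for v in self.get(prop, category) if v.location is None]


unit = HydrogeoUnit("Example sand aquifer", [
    PropertyValue("porosity", 0.30, "total porosity"),
    PropertyValue("porosity", 0.22, "effective porosity"),
    PropertyValue("porosity", 0.18, "effective porosity", (500.0, 1200.0, -15.0)),
    PropertyValue("porosity", 0.25, "effective porosity", (800.0, 1500.0, -22.0)),
])
print(len(unit.get("porosity")), len(unit.get("porosity", "effective porosity")))
```

Querying by the standard property alone still returns every value, whatever its category. This mirrors the point that categorizations supplement, rather than replace, the standard property.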

Figure 3 illustrates the use of the GeoSciML and O&M standards in a partial GWML2 encoding for a Canadian aquifer system, namely the Appalachian External Zone. Encoded are multiple material compositions from GeoSciML (i.e. gsml:composition), prefixed with “gsml”, and with the “Shale” instance fully open. Also shown are some key characteristics of the hydrogeological unit, prefixed by “gwml2” such as recharge (i.e. gwml2:gwUnitRecharge), discharge (i.e. gwml2:gwUnitDischarge), and the fully open hydraulic conductivity description (i.e. gwHydraulicConductivity). The value for this latter property is represented as a complex Observation, prefixed with “om”, and includes supplemental categorization of the property as a statistical median via the assignment of the “Hydraulic Conductivity – Median” qualifier (i.e. for om:observedProperty). Note the omission of several sections in Fig. 3 for reasons of space and readability, as indicated by “[…snip…]”.
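The sketch below shows how a consumer might read such an encoding. It parses a heavily simplified fragment in the spirit of Fig. 3 with Python's standard `xml.etree.ElementTree`. It rests on several assumptions. The `gwml2` and `om` namespace URIs are placeholders rather than the official ones, and the nesting is simplified and not schema-valid. The qualifier is assumed to be carried in an `xlink:title` attribute, and the numeric value is invented rather than taken from Fig. 3.

```python
import xml.etree.ElementTree as ET

NS = {
    "gwml2": "urn:example:gwml2",  # placeholder, not the official namespace URI
    "om": "urn:example:om",        # placeholder, not the official namespace URI
    "xlink": "http://www.w3.org/1999/xlink",
}

# Simplified, illustrative fragment (not schema-valid GWML2).
FRAGMENT = """<gwml2:GW_HydrogeoUnit xmlns:gwml2="urn:example:gwml2"
    xmlns:om="urn:example:om" xmlns:xlink="http://www.w3.org/1999/xlink">
  <gwml2:gwHydraulicConductivity>
    <om:OM_Observation>
      <om:observedProperty xlink:title="Hydraulic Conductivity - Median"/>
      <om:result uom="m/s">1.0E-6</om:result>
    </om:OM_Observation>
  </gwml2:gwHydraulicConductivity>
</gwml2:GW_HydrogeoUnit>"""


def read_hydraulic_conductivity(xml_text):
    """Return qualifier, value and unit for each hydraulic conductivity observation."""
    root = ET.fromstring(xml_text)
    out = []
    for obs in root.iterfind("gwml2:gwHydraulicConductivity/om:OM_Observation", NS):
        prop = obs.find("om:observedProperty", NS)
        result = obs.find("om:result", NS)
        out.append({
            # Namespaced attributes are keyed as "{uri}local" in ElementTree.
            "qualifier": prop.get(f"{{{NS['xlink']}}}title"),
            "value": float(result.text),
            "uom": result.get("uom"),
        })
    return out


print(read_hydraulic_conductivity(FRAGMENT))
```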

Fig. 3 Partial GWML2 encoding for an aquifer system and its median hydraulic conductivity. “[…snip…]” denotes sections deleted for reasons of readability and space

Hydrogeological units in GWML2 can also be associated with water budgets, discharge and recharge estimates and their geographical zones, as well as management areas. Fluid bodies in GWML2 refer to the fluid contained in the voids of hydrogeological units, including their biologic, chemical and material constituents. Fluid bodies can also be subdivided into nested parts such as plumes, and can host various surfaces such as a water table, piezometric or potentiometric surface. These surfaces can in turn host divides—that is, lines projected onto the surface to denote the divergence of flow direction within the fluid body. Types of flow represented by GWML2 include those within containers, as well as between containers. Intra-container flow is exemplified by the movement of fluid within a unit, and inter-container flow is exemplified by recharge and discharge.
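The distinction between intra-container and inter-container flow can be expressed as a classification relative to one unit. In this hypothetical sketch, each flow is reduced to a source name and a target name, and the container and water body names are invented. Real GWML2 flow descriptions carry much more information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str  # container or water body the water leaves
    target: str  # container or water body the water enters


def classify(flow: Flow, unit: str) -> str:
    """Classify a flow from the point of view of one hydrogeological unit."""
    if flow.source == unit and flow.target == unit:
        return "intra-container"  # movement of fluid within the unit
    if flow.target == unit:
        return "recharge"         # inter-container flow into the unit
    if flow.source == unit:
        return "discharge"        # inter-container flow out of the unit
    return "unrelated"


print(classify(Flow("river", "aquifer A"), "aquifer A"))  # recharge
```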

Man-made artifacts in GWML2 consist of water wells and monitoring sites, as well as related measurements and calculations recorded as single values or multi-valued time series. Water wells can be described geologically, via reference to their host hydrogeological units, as well as by well logs that describe the lithology of the different units or various measurements along the length of the borehole such as those from borehole geophysics logging. Water wells can, of course, also be described by various hydrogeological characteristics such as water level depth and yield, and by construction characteristics associated with casings, screens, and seals. GWML2 monitoring sites are places where sensing devices periodically measure significant hydrogeological properties such as water level, flow rate, water temperature, and chemical composition, or where samples can be taken. Such sites are closely related to other entities in GWML2, for example water wells or springs, which can host the devices or the entity being monitored or sampled.
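A monitoring site with its time series can be sketched as below. The `MonitoringSite` class, the `hosted_by` link to a well identifier and the property name are hypothetical simplifications, not GWML2 element names, and the measurements are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class MonitoringSite:
    site_id: str
    hosted_by: str  # id of the water well or spring hosting the sensing device
    series: Dict[str, List[Tuple[date, float]]] = field(default_factory=dict)

    def record(self, prop: str, when: date, value: float) -> None:
        """Append one measurement of a property (e.g. water level) to its time series."""
        self.series.setdefault(prop, []).append((when, value))

    def latest(self, prop: str) -> Tuple[date, float]:
        """Most recent measurement; tuples compare by date first."""
        return max(self.series[prop])

    def change(self, prop: str) -> float:
        """Last value minus first value, ordered by date."""
        ordered = sorted(self.series[prop])
        return ordered[-1][1] - ordered[0][1]


site = MonitoringSite("MS-01", hosted_by="W-17")
site.record("depth_to_water_m", date(2016, 3, 1), 12.5)
site.record("depth_to_water_m", date(2016, 9, 1), 13.25)
site.record("depth_to_water_m", date(2016, 6, 1), 12.75)
print(site.latest("depth_to_water_m"), site.change("depth_to_water_m"))
```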

GWML2 also provides a framework for representing a wide variety of aquifer tests, from which hydrogeological properties can be determined. A test might involve pumping or injecting fluid at known rates and then observing changes in the water table over time. Alternatively, it might involve injecting a tracer at some location and following its progression at observation points. The GWML2 representation of such tests interrelates several kinds of data. These include initial conditions, test parameters and procedures, sampling sites, and pertinent entities such as wells and the targeted hydrogeological unit, together with the measured and calculated values for various GWML2 properties. If a specialized property is missing from GWML2, it can be added as a supplementary categorization to an observation description. In this way, a wide variety of aquifer tests can be represented (see Brodaric 2017, pp. 106–115). A full description of the entities represented by GWML2, as well as their technical design, can be found in the standards specification (Brodaric 2017).
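The sketch below illustrates how a calculated value might be derived from test data and stored alongside the test record. It uses the Cooper-Jacob straight-line approximation for a constant-rate pumping test, T = 2.303 Q / (4π Δs), where Δs is the drawdown per log cycle of time. This method is chosen here only as a familiar example and is not prescribed by GWML2. It assumes late-time data from a confined aquifer, and the record structure, identifiers and data are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PumpingTest:
    """Hypothetical aquifer-test record linking wells, target unit, parameters and data."""
    pumped_well: str
    target_unit: str
    pumping_rate_m3_s: float  # test parameter: constant discharge Q
    observation_well: str
    drawdown: List[Tuple[float, float]] = field(default_factory=list)  # (time s, drawdown m)

    def drawdown_per_log_cycle(self) -> float:
        """Least-squares slope of drawdown against log10(time)."""
        xs = [math.log10(t) for t, _ in self.drawdown]
        ys = [s for _, s in self.drawdown]
        n = len(xs)
        x_bar, y_bar = sum(xs) / n, sum(ys) / n
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        sxx = sum((x - x_bar) ** 2 for x in xs)
        return sxy / sxx

    def transmissivity_m2_s(self) -> float:
        """Cooper-Jacob estimate T = ln(10) Q / (4 pi delta_s), with ln(10) ~ 2.303."""
        return (math.log(10) * self.pumping_rate_m3_s
                / (4 * math.pi * self.drawdown_per_log_cycle()))


pump_test = PumpingTest("W-17", "Example sand aquifer", 0.01, "W-18",
                        [(60.0, 0.9), (600.0, 1.4), (6000.0, 1.9)])
print(pump_test.drawdown_per_log_cycle(), pump_test.transmissivity_m2_s())
```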

Implementation and evaluation of GWML2

GWML2 has been successfully implemented by nine major providers of groundwater data, with the implementations collectively delivering a wide variety of data from diverse systems. Each component of GWML2 is served online by at least one data source, and many components are served by several sources (Brodaric 2016), enabling GWML2 to meet the following criteria:

  • Deployability: implementation with existing open standards and technologies was relatively uncomplicated across a broad range of data systems, and the design was shown to be compatible with diverse software environments.

  • Completeness: GWML2 was shown to cover the wide variety of key data types available in the data sources with minimal information loss. Moreover, successful encoding of each GWML2 component demonstrated conformance with the intended design.

  • Usability: the information delivered by GWML2 satisfied the primary needs of the usage scenarios, as demonstrated by the successful delivery of all key entities from disparate databases. While efficiency was not measured quantitatively, in all cases the implementations returned results in times acceptable to human users, as indicated in ad-hoc reports.

Overall usability, including satisfaction of several usage scenarios, is further demonstrated by the consumption of GWML2 data by various web applications. Figure 4 illustrates satisfaction of the commercial scenario as GWML2 water well information is visualized from the Canadian (Fig. 4a) and US (Fig. 4b) groundwater data networks (Brodaric et al. 2016) via the Canadian web portal. This visualization also demonstrates satisfaction of an important aspect of the technological scenario, namely the conversion of GWML1 to GWML2. Satisfaction of the environmental scenario is demonstrated in Fig. 5. It shows GWML2 groundwater chemistry in support of groundwater-dependent ecosystems, displayed in the Australian VVG (Visualizing Victoria’s Groundwater) web portal and originating from a data network of federal, state and academic groundwater data providers (Dahlhaus et al. 2016). Figure 6 illustrates development of a groundwater flow model in MODFLOW using GWML2 input data from a management area in New Zealand. Figure 7 shows a 3D visualization of the management area and model (Kmoch 2017). Taken together, Figs. 6 and 7 indicate that the scientific and policy scenarios are satisfied, inasmuch as a flow model was successfully constructed from GWML2 data, and from this several hydrogeological characteristics were attributed to the management area. These uses of GWML2 also highlight a significant additional benefit, namely that it facilitates the development of sharable hydrogeological software that can be built over a single normative data format.

Fig. 4 a GWML2 water wells from the Canadian Groundwater Information Network (GIN), displayed in the GIN web portal (Brodaric et al. 2016). b GWML2 water wells from the US National Groundwater Monitoring Network (NGWMN), displayed in the GIN web portal (Brodaric et al. 2016)

Fig. 5 GWML2 groundwater chemistry shown in the Australian VVG web portal (Dahlhaus et al. 2016)

Fig. 6 USGS MODFLOW single layer volumetric grid generated from GWML2 input data for the Horowhenua district management area, New Zealand. White cells denote the hydrogeologically active area, and blue cells denote sites for input observations from water wells (from Kmoch 2017). Cells are volumes with variable elevation and depth, possessing many hydrogeological properties from GWML2 such as hydraulic conductivity. The total area shown is approx. 12 km × 16 km, with distance in metres indicated along each axis

Fig. 7 3D visualization of GWML2 data for New Zealand management area, including related water wells from Fig. 6 (from Kmoch 2017)

Related work on groundwater data representation

Although data representations in hydrology are numerous, those focusing on groundwater are less common and can be categorized as (1) database schemas, (2) ontologies, and (3) data exchange formats. Groundwater database schemas provide a structure for organizing and managing groundwater information in data repositories, either for restricted scope (e.g. Chesnaux et al. 2011; Gogu et al. 2001; Steward and Bernard, 2006; Wojda et al. 2010; Oulidi et al. 2009), or for broad use (Kisters 2017; Nešetřil and Šembera 2014; Strassberg et al. 2007). Ontologies are formally arranged vocabularies used by data networks to help find and use distributed data, with a focus on hydrology in general (Atkinson and Dornblut 2012; Buttigieg et al. 2013; Raskin and Pan 2005), or groundwater in particular (Brodaric and Probst 2009; Tripathi and Babaie 2007; Yang et al. 2010). Groundwater data exchange formats are used to transfer data between a data supplier and consumer, and include the INSPIRE hydro/geology specification (INSPIRE 2013a), GWML1 (Boisvert and Brodaric 2012) and Hg2O (Wojda and Brouyère 2013). GWML2 clearly falls into this third category of data transfer format, but it is conceptually differentiated from representations in all categories by its adaptation of the water container pattern (from Boisvert and Brodaric 2012; see section ‘Results: the GWML2 design’), which distinguishes containers (e.g. aquifers), water bodies, voids (e.g. pore spaces), and their properties and relations, enabling distinct description of each, and thus a more precise expression of the data.

Summary and future directions

GWML2 is an international standard for exchanging a wide variety of groundwater data, to meet increasing demands for such data in many scientific and societal endeavors. Testing has shown that GWML2 can be used successfully to deliver groundwater information from significant data providers to meet key usage scenarios. In this process, GWML2 has already impacted the design of groundwater databases maintained by these providers, and is being incorporated into their operational data delivery mechanisms. It is further informing the nascent development of ontology-based knowledge structures for the hydrological domain (Hahmann et al. 2016).

Next steps for GWML2 involve its possible submission to the World Meteorological Organization for endorsement as a global hydrological standard, in concert with its WaterML2 siblings, to complement its current standing as a global geospatial standard. Possible future technical directions include the development of additional GWML2 encodings in languages other than XML, as well as the enhancement of standard content. GWML2 provides a rigorous standard structure for groundwater data. In many cases, however, the items that populate that structure, in the form of controlled vocabularies, are not constrained. For example, GWML2 captures porosity or yield values associated with an aquifer and provides a means for specifying the particular type of porosity or yield. It does not, however, enumerate a controlled vocabulary of acceptable types of porosity or yield, leaving it to the data provider to determine these. As a result, GWML2 documents from different data sources will be structurally identical, but much of their content is likely to be heterogeneous. In essence, GWML2 assures structural interoperability, but only partial semantic interoperability of content. Advancement of standard content vocabularies for GWML2 is a future challenge.
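The gap between structural and semantic interoperability can be made concrete with two structurally identical records whose categories differ. The sketch below assumes hypothetical records and a hypothetical shared vocabulary, `SHARED_VOCAB`. GWML2 itself does not define such a vocabulary, which is precisely the challenge noted above.

```python
# Two providers deliver records with exactly the same fields (structural interoperability)...
RECORDS = [
    {"provider": "A", "property": "porosity", "category": "effective porosity", "value": 0.21},
    {"provider": "B", "property": "porosity", "category": "n_eff", "value": 0.19},
]

# ...but their category terms differ; a hypothetical shared vocabulary maps both to one term.
SHARED_VOCAB = {"effective porosity": "effectivePorosity", "n_eff": "effectivePorosity"}


def structurally_identical(records):
    """True when every record uses exactly the same set of fields."""
    return len({frozenset(r) for r in records}) == 1


def categories(records, vocab=None):
    """Distinct categories, optionally translated through a shared vocabulary."""
    vocab = vocab or {}
    return {vocab.get(r["category"], r["category"]) for r in records}


print(structurally_identical(RECORDS), categories(RECORDS), categories(RECORDS, SHARED_VOCAB))
```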

Other potential work involves a “lite” version, to lower the entry barrier to GWML2, as well as advancement of the scientific modeling usage scenario to further test the efficacy of GWML2 in supplying data to, and between, groundwater modeling tools. Augmentation of the technological usage scenario is also a possibility. This would determine alignment with important standards such as the ArcHydro Groundwater standard database schema (Strassberg et al. 2007), and with emerging surface water schemas such as HY_Features (Atkinson and Dornblut 2012). It is anticipated that these and other future directions for GWML2 will continue to be coordinated by the OGC Groundwater Standards Working Group in collaboration with related activities in industry, academia, and government. This coordination aims to promote GWML2 adoption and further the overall aim of enabling groundwater data exchange and interoperability.