Similarity-based approaches in hydrogeology: proposal of a new concept for data-scarce groundwater resource characterization and prediction

A new concept is proposed for describing, analysing and predicting the dynamic behaviour of groundwater resources based on classification and similarity. The concept makes use of the ideas put forward by the “PUB” (predictions in ungauged basins) initiative in surface-water hydrology. One of the approaches developed in PUB uses the principle that similar catchments, exposed to similar weather conditions, will generate a similar discharge response at the catchment outlet. This way, models developed for well-observed catchments can be used to make predictions for ungauged catchments with similar properties (topography, land use, etc.). The concept proposed here applies the same idea to groundwater systems, with the goal of making predictions of the dynamic behaviour of groundwater in poorly observed systems using similarities to well-observed and well-understood systems. This paper gives an overview of the main ideas, the methodological background, the progress so far, and the challenges that the authors regard as most crucial for further development. One of the main goals of this article is thus to raise interest in this new concept within the groundwater community. There is a multitude of highly interesting aspects to investigate, and a community effort, as with PUB, is required. A second goal is to foster the exchange of ideas between the groundwater and surface-water research communities, which, while often working on similar problems, have frequently missed the opportunity to learn from each other.


Background and objectives
The central objective of this paper is to introduce the main ideas of a new concept that can be used to describe, analyse and predict the dynamic behaviour of groundwater resources based on time series classification, under data-scarce conditions. The idea for this concept was driven by three considerations:
1. Numerical (here meaning physics-based) models of groundwater flow provide all the information needed to assess the dynamic behaviour of groundwater resources, namely changes in head and storage over time in response to changes in recharge. The physics is well understood and numerical approaches have proven effective in a huge number of cases. However, numerical models are data-hungry and often fail to provide reliable and meaningful results where data for parameterization, calibration and verification/validation are scarce and unevenly distributed (Refsgaard et al. 2010; Zektser and Dzyuba 2014; Zhou and Li 2011). Conceptual hydrological models such as HBV (Bergstrom and Forsman 1973) are sometimes used to describe groundwater dynamics under conditions where data are scarce, but have the disadvantage that they largely neglect the heterogeneous, complex three-dimensional (3D) setup of the subsurface. Thus, satisfactory methods to describe groundwater dynamics in data-scarce areas are often difficult to establish.
2. Surface-water hydrologists have proposed, developed and successfully applied concepts that are specifically targeted at tackling data scarcity using the concept of "catchment classification and similarity". This concept is based on the hypothesis that similar systems (catchments) will respond similarly (river discharge) to similar inputs (precipitation/weather). The foundations of the catchment classification and similarity approach were developed within the framework of the PUB (predictions in ungauged basins) initiative associated with the International Association of Hydrological Sciences (Hrachowitz et al. 2013; IAHS 2021). PUB did not specifically include groundwater, nor have groundwater scientists engaged much with PUB (Barthel 2014b).
3. Time series analysis and time series-based models are generally less frequently used in groundwater hydrology than in surface-water hydrology (Bakker 2019; Bakker and Schaars 2019). In climate change research, the groundwater community has a very strong focus on physically based numerical models, where time series are used for calibration, but much less of a focus on gaining an understanding of system responses. However, time series records of groundwater levels contain a large amount of valuable information that cannot be fully used if time series are only used for calibration of numerical models.
Based on these three considerations, the authors first argue that there is a need to develop alternative and complementary approaches to deal with groundwater dynamics that can cope with data scarcity but still take geological complexity into account. Second, the authors are looking for tools to extract the maximum available information from groundwater hydrographs. Finally, the authors see an opportunity to bring the surface-water community and groundwater community closer together by fostering the exchange of similar concepts and ideas (see Barthel (2014b) and Staudinger et al. (2019) for detailed discussions on this matter).
To summarize, the authors ask whether classification and similarity-based approaches, as developed in surface-water hydrology ("PUB"), can be adopted for use in groundwater hydrology. To answer this, the following hypothesis was tested: similar groundwater systems (aquifers) show a similar response (groundwater levels) to similar inputs (groundwater recharge). Testing this hypothesis requires answering three fundamental questions:
1. How can similarity between groundwater hydrographs be determined and quantified?
2. How can similarity between groundwater systems be determined and quantified?
3. Can the dynamic behaviour of groundwater systems be systematically linked to groundwater system properties?
This paper summarizes the authors' achievements in answering the questions listed in the preceding. The summary avoids details and focusses on the overarching ideas, principles and challenges. Readers interested in more details are referred to Nygren et al. (2020), Giese et al. (2020), Haaf et al. (2020), Haaf and Barthel (2018), as well as the PhD dissertations of Ezra Haaf (Haaf 2020) and Benedikt Heudorfer (Heudorfer 2019).

Similarity-based approaches in groundwater: state of the art
The main issues in regional-scale (domains of 10³-10⁵ km², see Barthel (2014a) for a discussion of the term regional scale) groundwater resource assessment are scarcity and uneven distribution of data, large variability and heterogeneity of natural conditions, and uncertainty of structures and processes (Candela et al. 2013; Refsgaard et al. 2010; Zhou and Li 2011). One approach to deal with large, complex, heterogeneous systems widely applied in other disciplines is classification. Classification is used to bring order into unstructured and incomplete information, to reveal patterns and similarities, to better understand dominating factors and processes on the respective scales, to separate factors of influence, and to reveal dependencies, as well as to fill gaps in space, time and observed parameters (Cormack 1971; Sokal 1974). In the field of surface hydrology, Lyon and Troch (2010), Woods (2004), and Wagener et al. (2007), among others, have proposed catchment classification as a new concept. Fundamentally, their concept of classification is based on hydrological similarity and forms an essential element of the predictions in ungauged basins (PUB) concept (Sivapalan et al. 2003); however, PUB was mainly concerned with, and thus limited to, surface hydrology.
While the catchment classification approach is now established in surface hydrology (see e.g. HESS special issue, Castellarin et al. 2012), subsurface hydrology (hydrogeology, groundwater hydrology) has not yet embraced the concept. This is surprising, since classification as such is very common in (hydro-)geology. Typical hydrogeological classification schemes, however, are usually made for specific regional conditions and principally deal with static properties and conditions (see e.g., Anderson 1989; Dahl et al. 2007; Güler and Thyne 2004; Klingbeil et al. 1999; Winter 2001). Similarity-based concepts are very often applied implicitly: groundwater models, for instance, use zonation concepts to aggregate areas of assumed similar response (Carrera et al. 2005) and groundwater vulnerability mapping approaches (e.g., Gogu and Dassargues 2000; Vrba and Zaporozec 1994) are based on the classification of hydrogeological properties to estimate vulnerability to pollution.
So far, very few attempts have been made to use the similarity of groundwater time series to group, classify or characterize groundwater systems. There are some exceptions, however. Schürch et al. (2010) classified Swiss groundwater time series by their annual maxima and shape of annual variation. Allen et al. (2010) classified mountain valley groundwater systems using several features including the seasonal reversals of recharge and discharge from snowmelt and rivers. Most recently, Rinderer et al. (2017, 2019) applied time series classification to groundwater hydrographs from a small headwater catchment in Switzerland and then used the result to upscale and model groundwater dynamics from point to catchment scale.
The state of the art with respect to similarity and classification in groundwater hydrology can be summarized as follows:
- Hydrogeological classification schemes are widespread, but usually at a very low level of formalization, mainly descriptive and only applicable in specific regional contexts.
- With the exception of hydrochemistry, classification has generally received little attention in hydrogeology as a scientific concept.
- Classification in groundwater hydrology is usually based on static properties and conditions, and rarely includes similarities in the dynamic responses of groundwater systems. Similarity analysis and classification of groundwater time series are hardly ever used.
The authors would like to conclude this section with a quotation by Cliff Voss, executive editor of Hydrogeology Journal, from the editorial to the special issue "The Future of Hydrogeology" (Voss 2005): "There is some hope by the Editor (albeit intuitive!) that certain geologic settings have typical characteristics regarding the spatial variability of their hydrogeologic properties so that some important characteristics of measurements in one place can be transferred to understanding of a similar geologic setting in another place." It is fair to assume that many hydrogeologists have had similar thoughts, yet it seems surprising that this way of thinking has not received much (if any) scientific attention, while surface hydrology scientists have dedicated an entire decade to it.

Locations and general description of the data
The examples presented in this article use data from two regions: one in central Europe (Southern Germany, Austria, France, Switzerland) and one in northern Europe (Sweden, Finland). Much of the data used from central Europe stems from the GLOWA (Mauser and Prasch 2016) and Rivertwin (Barthel et al. 2008) projects. These projects, and the problems encountered when developing the large-scale numerical models therein (for explanations see Barthel and Banzhaf 2016), also formed the background and motivation to develop the concept presented here. The raw datasets comprise groundwater level time series from around 5,500 groundwater observation wells (~4,000 from central Europe). Supporting datasets, which are important in this context, include a large variety of geographic, socio-economic, geological, hydrological and climate data, and borehole information. Data were gathered from different agencies in different countries (see Acknowledgements section) and are thus very heterogeneous in terms of time series length and resolution, metadata, etc. An overview characterizing a large part of the dataset is provided in Römer et al. (2016) and some data are characterized in the studies mentioned in section 'Background and objectives'. Figure 1 shows the locations of the observations in southern Germany as well as the outline of the Fennoscandian study area.
With respect to the raw data available for the studies summarized here, it is important to note that most of the observation wells were:
Two main issues are associated with the limitations listed in the preceding. First, there is a strong bias towards shallow alluvial aquifers in the selected observations. Deeper aquifers and hard rock aquifers are not well represented and there were too few to provide significant results. Second, many of the time series are not suitable for testing the hypothesis and for method development, as many of the currently available approaches to compare and classify time series require them to have equal length and equal measurement interval, and to be free of gaps and outliers. Observation wells with incomplete geological descriptions of well design can generally not be used to evaluate the dependency of groundwater dynamics on geological factors. It is also important to avoid mixing observations that have been impacted by anthropogenic sources with those that have not. It should also be mentioned that data preprocessing took a long time because of the typical "messy" character of groundwater data (data quality issues included missing metadata, inconsistent temporal resolution, gaps, changing influences of abstractions, etc.). It is also difficult and time-consuming to establish consistent, homogeneous datasets to characterize well locations and well properties. For a further discussion on outliers and errors in groundwater hydrographs, Peterson et al. (2017) is recommended.
It is important to note that, for a large number of observation wells, metadata and site-specific data on borehole and well design, as well as stratigraphy, are partially or entirely missing. Plots of a selection of time series used in the various studies can be found in the electronic supplementary material (ESM).

Anthropogenic influence
Anthropogenic influence on groundwater dynamics is a special concern in the context of the suggested concept. In the first stages of development of the approaches, it would be ideal if dynamics could be attributed to natural changes only. However, anthropogenic influences are likely to affect almost all observations in the dataset. For example, the Central European study area is densely populated, and groundwater observation wells were usually built relatively near to human activity, typically to monitor the consequences of such activity. Natural causes of change were of lesser concern 50 or 100 years ago. Anthropogenic influence is manifested in different ways: (1) sudden, short-term changes, e.g. during a construction process, (2) slowly developing changes, e.g. due to changes of land use or growing settlements, and (3) a more or less continuous background signal, e.g. pumping, as well as combinations of these. Changes, in this context, may include changes of the amplitude of fluctuations, changes of the frequency of fluctuations or changes of the shape of peaks.
In addition, outliers such as spikes and offsets, which can have technical or unknown causes, are typical for groundwater measurements (Peterson et al. 2017). The ESM includes a selection of groundwater hydrographs that the authors have characterized as being "irregular" to illustrate what irregular may mean in this context.
Fig. 1 Detailed map with observation well locations of the study area in southern Germany based on the International Geological Map of Europe IGME5000 (Asch 2007). Points are the locations of 751 out of ~5,000 observation wells in Central Europe used in the study described by Heudorfer et al. (2019). Note that Quaternary alluvial sediments, from which the majority of the observations were made, are often not explicitly distinguishable at this scale due to their small spatial extents. Quaternary alluvial sediments are typically located in narrow stretches alongside rivers (from Heudorfer et al. 2019). For the Fennoscandian study area (Sweden and Finland, highlighted in grey on the country map), observation well locations are not shown. Country codes from ISO (2021)
Anthropogenic influences were dealt with in a pragmatic way: some of the hydrographs exhibit strong and obvious anthropogenic influences (i.e. obvious upon visual inspection). Those, as well as hydrographs with outliers, offsets and peculiar-looking sections were removed from the working dataset.
Whilst anthropogenic influence is likely to be present, even in most of the remaining time series, its potential relevance must not be overstated. For the international reader, it is important to note that both study areas are very water rich. Only about 1-3% of effective precipitation is used for human consumption in southern Germany (Nickel et al. 2005), so overpumping and depletion are rare. Groundwater extraction, therefore, usually has only very limited spatial impact.

Preprocessing and working dataset
The working dataset used in the studies described here contains about 900 time series from southern Germany and 264 from Sweden and Finland; different studies used subsets of the larger dataset. In the various studies this paper is based on, different preprocessing steps such as gap-filling, de-trending, normalization, homogenization and adjustments of intervals between data points have been applied. This preprocessing is not detailed in this overview, so readers are referred to the underlying studies listed in section 'Background and objectives'. It should be noted that preprocessing, in particular the handling of irregular data, is a crucial and tedious task that needs a good deal of attention. Special attention should be given to visual inspection of the data (see the ESM as well).
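As a rough illustration of the kind of preprocessing named above, the following Python sketch fills gaps, removes a linear trend and standardizes a single series. It is not the procedure used in the underlying studies: the function names (fill_gaps, detrend, standardize), the use of None to mark missing values, and the choice of linear interpolation and a least-squares linear trend are all assumptions made for illustration, and the series is assumed to be regularly spaced.

```python
from statistics import mean, pstdev


def fill_gaps(values):
    """Fill missing values (None) by linear interpolation between neighbours.

    Gaps at the start or end of the series take the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series contains no observations")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def detrend(values):
    """Subtract the least-squares linear trend (needs at least two values)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a trend")
    t_mean = (n - 1) / 2
    v_mean = mean(values)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    return [v - (v_mean + slope * (t - t_mean)) for t, v in enumerate(values)]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("constant series cannot be standardized")
    return [(v - m) / s for v in values]
```

A call such as standardize(detrend(fill_gaps(series))) would chain the three steps; whether de-trending is appropriate at all depends on whether the trend itself is part of the dynamics to be classified.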

Examples of groundwater time series
In the following, a selection of time series is presented to demonstrate that:
It is essential to demonstrate these points as they are a fundamental requirement for the proposed concept to work. Figure 2 shows a selection of nine time series, each representing one of nine "groups" that were defined by visual classification (=grouping, see section 'Visual classification' and the ESM). The examples suggest that differences in dynamics are mainly manifested through frequency and magnitude of fluctuations, but also through shape factors such as symmetry of peaks, upper or lower bounds, etc. Figure 3 shows three examples from two different groups (groups 1 and 5). For group 1, examples are also shown from two different subgroups (1.3 and 1.5). To illustrate that similarity is not just the result of spatial proximity, examples were taken from the Swedish and German datasets. Please note that Swedish time series are usually measured bi-weekly, the German ones daily, leading to a slightly noisier appearance of the German ones. These examples raise two issues: (1) what is the appropriate temporal resolution to look at, and (2) how should potential noise be defined and treated?

Methods
To address the questions listed in section 'Background and objectives', approaches are needed to:
1. Detect similarity between groundwater time series and group/classify them
2. Characterize and classify groundwater systems
3. Determine the dependency between groundwater dynamics and groundwater system properties
This section provides a brief overview of methods used in relation to these tasks; for details, please refer to the individual studies Nygren et al. (2020), Giese et al. (2020), Haaf et al. (2020), Haaf and Barthel (2018), as well as the PhD dissertations of Ezra Haaf (Haaf 2020) and Benedikt Heudorfer (Heudorfer 2019).

Time series similarity
Numerical approaches for detecting similarities, grouping, characterizing and classifying time series are very common in many fields of science (Lhermitte et al. 2011). In hydrology, the analysis of hydrograph similarity has received a good deal of attention (see section 'Similarity-based approaches in groundwater: state of the art'). For groundwater hydrograph similarity detection, only a few studies exist (Allen et al. 2010; Moon et al. 2004; Rinderer et al. 2017; Triki et al. 2014; Winter et al. 2000).
Within the context of the presented concept, five main methods for similarity-based classification of a set of groundwater hydrographs have been explored. The first is pairwise comparison of original time series by calculation of pairwise distance measures (e.g. Euclidean, or dynamic time warping). The second is transformation of time series before distance calculation. The third involves calculating indices for each time series that express the variations in hydrographs, such as the recession coefficient, the amplitude and autocorrelation, with subsequent similarity estimation carried out by calculating the distance between the index values of objects. These three methods all compress the similarity between time series into pairwise numerical distance measures. The distance matrix is then fed into a clustering algorithm that groups objects based on different rules such as minimum distance or minimum variance. A fourth strategy is to cluster the hydrographs using a method such as k-means, without calculating pairwise distances. The fifth option is hydrograph classification based on visual perception of similarity. Describing the methodological background of these methods is not possible within the scope and length limits of this article. Key publications providing the methodological background and references are Haaf and Barthel (2018) and Heudorfer et al. (2019) as well as the dissertation presented by Ezra Haaf (Haaf 2020).
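The first method (pairwise distances followed by clustering) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited studies: the function names are hypothetical, the dynamic time warping variant shown is the unconstrained textbook form with an absolute-difference local cost, and single linkage is used here only as one example of a "minimum distance" merging rule. Series are assumed to be preprocessed and, for the Euclidean case, of equal length.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained dynamic time warping cost (absolute-difference cost)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)          # cost row for the empty prefix of a
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]


def distance_matrix(series, metric):
    """Symmetric matrix of pairwise distances between all series."""
    n = len(series)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = metric(series[i], series[j])
    return dist


def single_linkage(dist, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    if not 1 <= n_clusters <= len(dist):
        raise ValueError("n_clusters must be between 1 and the number of series")
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        # Distance between clusters = smallest distance between their members.
        _, p, q = min(
            (min(dist[i][j] for i in clusters[p] for j in clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters
```

Calling single_linkage(distance_matrix(series, dtw), n_clusters=9) would return lists of series indices, one list per cluster. Swapping the metric or the merging rule changes the grouping, which is why the choice of combination matters.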

Groundwater systems similarity
As explained in section 'Similarity-based approaches in groundwater: state of the art', there are methods to group groundwater systems of similar type but these tend to be subjective and usually only valid in a specific spatial context. Groups can be formed based on similarity of lithology, hydraulic properties, etc. Creating formal and transferable grouping approaches based on these straightforward principles is challenging for two main reasons: (1) partial information (for instance, a hydraulic conductivity value may be known from field tests for one formation/location but not for another), and (2) most descriptors are nonnumeric ("sandstone", "confined", etc.) and thus not suitable for a large number of standard approaches (involving distance metrics, clustering, etc.). A third challenge is the unclear definition of the system to be compared and grouped. While a catchment can easily (at least in theory, see Condon et al. 2020) be delineated using topography, the boundaries of aquifers are often unknown, both in terms of location and nature. It is also unclear how the boundary conditions should be included in the classification of groundwater systems as the responses of those systems to changes are not only the result of the aquifer properties but also of the unsaturated zone properties, land surface features and processes in adjacent and connected surface-water bodies (e.g. Giese et al. 2020). Therefore, the classification of groundwater systems must include consideration of where the system boundaries are drawn and how the boundary processes are viewed.
Within the framework of the research carried out so far, two approaches for groundwater systems classification and similarity analysis were investigated. The first approach was based on using a set of descriptors to characterize groundwater systems at observation well locations and for entire groundwater systems (note: the term groundwater system is used here to avoid the term aquifer, as the system to be analysed includes more than just the aquifer itself). The descriptors chosen stem from a wide range of fields spanning aquifer properties (e.g. aquifer thickness, relative share of grain fractions), observation well properties (e.g. screen depth and length), properties of the unsaturated zone (e.g. thickness of unsaturated zone), land surface (topography features, e.g. mean slope of surrounding terrain, Topographic Wetness Index), and finally climate (e.g. Seasonality Index, average annual precipitation). Haaf et al. (2020) evaluated which of those descriptors significantly influence the dynamic behaviour of groundwater levels. Giese et al. (2020) applied the found descriptors to a selection of smaller case study areas. All descriptors defined in the framework of this research are described in Haaf et al. (2020).
The second approach was a combination of the first (descriptors) with expert-based groundwater systems classification. Expert-based, in this context, means that a groundwater expert, familiar with the hydrogeological situation in the study area, judges characteristics of the groundwater systems based on existing information; missing data and information are replaced by (local) groundwater expert knowledge. In the context of the proposed approach, such an expert-based approach is a harsh compromise as it basically prevents the establishment of a generic, transferable approach to predicting groundwater system behaviour. Given the nature of groundwater systems, and the typical data availability associated with them, it is likely that such a compromise will always be required, and it is definitely essential in the early stages of methodological development. An interesting discussion in this context is presented by Gleeson et al. (2020).

Dependency between groundwater systems and groundwater dynamics
Dependencies between groundwater systems and types, or individual features, of groundwater dynamics can be determined in different ways. Intuitively, one would like to be able to establish a scheme that allows the relating of one specific type of groundwater system to one specific response type. This, however, does not seem to be possible for a variety of reasons, one being that a consistent and unequivocal characterization of groundwater systems is hindered by data availability and uncertainty of data (see previous section). The second-best option is therefore to use selected characteristics of system properties and selected characteristics of time series (groundwater dynamics) to carry out dependency analysis. This includes the limitation that only parts of the variability of dynamics and system characteristics can be used and explained; models will necessarily remain ambiguous. Details of the approaches chosen in the framework of the research presented here are provided in . Finally, there is also the potential to apply semiquantitative approaches to dependency analysis, where formal systems of characterizing groundwater dynamics and groundwater systems properties are combined with expert knowledge (see previous section).

Visual classification
A visual classification (grouping) of time series was carried out on 1,069 time series (814 from the German dataset and 255 from the Swedish dataset). A hierarchical classification scheme was created, with "groups" on the top level (Fig. 2), which were subdivided into "subgroups", which were further subdivided into "types" at the lowest hierarchical level. Types were numbered with three digits X.Y.Z, where X stands for the group, X.Y indicates the subgroup, and X.Y.Z a type. Figure 4 shows six examples from group 2 (Fig. 2b) representing two of the six subgroups and six types. Group 2 contains time series with generally smooth appearances and a dominance of inter-annual fluctuations over intra-annual fluctuations. The types in group 2 show variations of this general pattern, that is, they exhibit different degrees of smoothness, different numbers of peaks, greater or fewer inter-annual fluctuations, etc. The reader may not be immediately convinced as to why, for example, plot f is considered to be more similar to e (same subgroup) than to d (other subgroup); this lies in the nature of the visual approach and shows some of its drawbacks. Despite the disadvantages, which are significant, the authors consider visual classification an indispensable and valuable tool, in agreement with other studies (Ehret and Zehe 2011; Seibert et al. 2016). However, the main value of visual classification is in the early stages of method development and in exploring the particularities of new datasets. Visual classification is not an appropriate tool for making predictions or any sort of quantitative reproducible analysis.
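The X.Y.Z numbering lends itself to simple bookkeeping. The sketch below shows one possible way to split such a label into its three hierarchical levels and to collect wells per level; the names TypeLabel, parse_type_label and wells_by_level, as well as the well identifiers in the usage note, are hypothetical and not taken from the underlying studies.

```python
from collections import defaultdict
from typing import NamedTuple


class TypeLabel(NamedTuple):
    group: str      # X
    subgroup: str   # X.Y
    full_type: str  # X.Y.Z


def parse_type_label(label):
    """Split a label of the form 'X.Y.Z' into group, subgroup and type."""
    parts = label.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a label of the form X.Y.Z, got {label!r}")
    x, y, z = parts
    return TypeLabel(group=x, subgroup=f"{x}.{y}", full_type=f"{x}.{y}.{z}")


def wells_by_level(labels, level):
    """Collect well ids per class at the chosen hierarchical level.

    labels: mapping of well id -> 'X.Y.Z'
    level:  'group', 'subgroup' or 'full_type'
    """
    if level not in TypeLabel._fields:
        raise ValueError(f"unknown level {level!r}")
    grouped = defaultdict(list)
    for well_id, label in labels.items():
        grouped[getattr(parse_type_label(label), level)].append(well_id)
    return dict(grouped)
```

For instance, wells_by_level({"A": "2.1.1", "B": "2.1.3", "C": "2.4.1"}, "subgroup") places wells A and B together under "2.1" and well C under "2.4".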

Direct comparison of original time series
Hydrograph classification based on direct comparison of entire time series was carried out on 512 time series, principally with the aim of finding the optimum combination of distance metrics/clustering approach (Haaf and Barthel 2018). Visualizing and explaining the results of this effort in a condensed way is hardly possible due to the complexity of the underlying approach. Therefore, the reader is referred to Haaf and Barthel (2018) for details; here only a brief summary is presented. The similarity measure that ranked best, and is consistently coupled with different clustering algorithms, was based on the discrete wavelet transform (dwt). Relatively similar results were achieved using Euclidean (eucl) and Nash-Sutcliffe efficiency-based (
Disadvantages are essentially the same as those mentioned in section 'Direct comparison of original time series', but also include:
- Certain indices are rather sensitive to noise and irregularities.
- The algorithms to calculate certain indices contain parameters that need to be adjusted for a specific dataset and are thus not immediately transferable.

Groundwater systems similarity
Being able to group groundwater systems into groundwater system types (analogous to catchment types used in the surface hydrology concept of catchment classification and hydrological similarity (Wagener et al. 2007)) was identified as one of the crucial steps in achieving a "groundwater PUB", i.e. using information from extensively studied groundwater systems to make predictions about locations with less data. However, as already described in section 'Methods', developing systematic, formalized (quantitative) and meaningful approaches to groundwater systems classification that could be regarded as an equivalent to catchment classification in PUB (Castellarin et al. 2012) is challenging. The authors have not yet found an approach that fully meets the overall goals. The results of the two approaches described in the methods section (previous section 'Groundwater systems similarity') can best be assessed by looking at the results of the dependency analysis presented in the next section and the figures therein.
While clear results cannot be provided at this point, the authors still decided to describe this topic in a short section of its own, as they are convinced that further investigation of groundwater system classification and similarity analysis could have great benefits for hydrogeology, with many different fields of application (see also, e.g. Gleeson et al. 2020; Voss 2005).

Description
Regularity of the seasonality with regard to timing, i.e. persistence of timing of annual minima and maxima.
Dominance of inter-annual periodicity.
Frequency of change occurring in groundwater head.
Tendency of groundwater head toward an upper or lower bound.
Dispersion over range of groundwater heads.
Existence of modes, where groundwater head more frequently lingers, e.g. due to seasonal plateaus.
Homogeneity of shape of peaks on different scales.
Rise or fall of slope, i.e. shape of peak. In contrast to flashiness, slope refers to the rate of response while flashiness is the frequency of peaks occurring.
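To make the idea of index-based characterization more concrete, the following Python sketch computes three very simple indices for one hydrograph: the amplitude, the lag-k autocorrelation, and the share of time steps at which the head changes direction. These are illustrative stand-ins only. In particular, reversal_rate is a crude, hypothetical proxy for how frequently the head changes and is not one of the indices defined in the cited studies.

```python
from statistics import mean


def amplitude(heads):
    """Range between the highest and lowest observed head."""
    return max(heads) - min(heads)


def autocorrelation(heads, lag=1):
    """Sample autocorrelation of the series at the given lag."""
    n = len(heads)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(heads) - 1")
    m = mean(heads)
    denom = sum((h - m) ** 2 for h in heads)
    if denom == 0:
        raise ValueError("constant series has no defined autocorrelation")
    num = sum((heads[t] - m) * (heads[t + lag] - m) for t in range(n - lag))
    return num / denom


def reversal_rate(heads):
    """Share of interior time steps at which the head changes direction.

    Steps without any change in head are ignored when looking for reversals.
    """
    diffs = [b - a for a, b in zip(heads, heads[1:]) if b != a]
    reversals = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return reversals / max(len(heads) - 2, 1)


def index_vector(heads):
    """Collect the indices into one feature vector per hydrograph."""
    return [amplitude(heads), autocorrelation(heads, 1), reversal_rate(heads)]
```

Feature vectors of this kind would normally be rescaled before distances between hydrographs are computed, since the indices have different units and ranges.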

Dependency between groundwater systems characteristics and groundwater hydrograph characteristics
Here the results of two selected approaches are presented briefly. The first approach is called semiquantitative, and is based on both visual inspection and expert-based groundwater assessment, two approaches which include both quantitative, observed data and qualitative assessments. The second approach is quantitative in the sense that it is completely based on numerical values of indices derived from statistical analysis of time series and numerical descriptors of groundwater systems.

Semiquantitative dependency analysis
In this semiquantitative dependency analysis, "expert-based" groundwater systems classifications were compared to the results of visual classification of time series. The authors chose a comparison with the visual classification here, foremost because this provides a more intuitive perspective on the meaning of (perceived) groundwater hydrograph similarity and geological features. As pointed out before, visual classification does otherwise not provide an avenue to predictions and quantitative analysis. The goal of this comparison was to discover whether the groupings found through visual inspection could be explained by the hydrogeological conditions at the observation well locations. Figure 6 shows example results of this comparison between groundwater systems defined semiquantitatively based on a combination of descriptors and expert knowledge, and time series groupings based on visual classification carried out on a smaller subset (n = 101) of the German dataset. Figure 6 shows that there is a strong, but ambiguous, relationship between hydrogeological properties and groundwater dynamics as determined from visual classification. A few things to highlight are:
• A quite clear relationship appears to exist between the two "deep aquifer" groundwater system types, where the deep unconfined locations mostly have a dynamic type of group 2, while the deep confined locations reflect a behaviour associated with group 1 (compare with Fig. 2).
• In the groundwater system type "shallow unconfined (2)", a clear separation into two types of dynamics (two groups: 7 and 9) can be observed. This implies that there is a crucial hydrogeological difference that distinguishes the two which was not taken into account in the expert-based grouping. The aquifers in this groundwater system type are shallow peat aquifers, not all of which are artificially drained (ditches), creating very distinct types of dynamics.
• Groundwater dynamics for limestone aquifers seem not to show specific characteristics: limestone alone is not a well-defined enough type of groundwater system. Unfortunately, there are too few limestone records in the dataset to allow subdivisions into deep-shallow, confined-unconfined, and fractured-karstic.
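
The comparison described above amounts to a cross-tabulation: for each groundwater system type, the share of wells assigned to each visual group. A minimal sketch of that computation follows, using only the Python standard library. The function name `group_proportions` and the example wells are hypothetical and invented for illustration; they are not taken from the German dataset and do not reproduce the authors' workflow.

```python
from collections import Counter, defaultdict


def group_proportions(records):
    """Return {system_type: {visual_group: proportion of wells}}.

    `records` is an iterable of (system_type, visual_group) pairs,
    one pair per observation well.
    """
    counts = defaultdict(Counter)
    for system_type, visual_group in records:
        counts[system_type][visual_group] += 1

    proportions = {}
    for system_type, group_counts in counts.items():
        total = sum(group_counts.values())
        # Proportions within one system type sum to 1.
        proportions[system_type] = {
            group: n / total for group, n in sorted(group_counts.items())
        }
    return proportions


# Hypothetical wells, invented for illustration only.
wells = [
    ("deep unconfined", 2), ("deep unconfined", 2), ("deep unconfined", 1),
    ("deep confined", 1), ("deep confined", 1),
    ("shallow unconfined (2)", 7), ("shallow unconfined (2)", 9),
    ("shallow unconfined (2)", 7), ("shallow unconfined (2)", 9),
]

table = group_proportions(wells)
for system_type, shares in table.items():
    row = ", ".join(f"group {g}: {share:.2f}" for g, share in shares.items())
    print(f"{system_type}: {row}")
```

A system type whose wells fall almost entirely into one group suggests that the expert-based type explains the observed dynamics; an even split, as in the invented "shallow unconfined (2)" rows, hints at a control that the grouping does not capture.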

Formal quantitative dependency analysis
Fig. 6 Relationship between eight qualitatively determined "groundwater system types" and the results of the visual classification of 101 observation wells located within each of the classes. The length of the blue bars indicates the proportion of wells within a certain groundwater system type that were grouped into one of nine "groups" (determined through visual classification). Axis label: groundwater system type

This approach makes use of indices, derived from time series using statistical analysis, and numerical descriptors of groundwater systems, derived primarily from metadata and GIS analysis of spatial features. For characterizing single relationships between groundwater dynamics indices and system characteristics, correlation analysis was carried out. The full procedure is detailed in Haaf et al. (2020) and Haaf (2020). Figure 7 shows the relationship between the depth to groundwater level and distance to stream associated with two dynamics indices: base flow stability (BFS) and coefficient of variation of fall rate (fall.cv). Base flow stability expresses interannual variability and does not have the same conceptual interpretation as in hydrology. Rather, base flow here can be understood as the slow component of the hydrograph, with BFS being the difference between maximum and minimum annual "baseflow" or base groundwater level. Results are additionally grouped according to hydrogeological settings G1, G2 and G3, where:

• G1 contains wells screened in deep unconfined sand and gravel aquifers with a depth to groundwater level between 25 and 40 m.
• G2 contains wells screened in deep confined gravel aquifers. The depth to groundwater level is 30-50 m.
• G3 comprises wells located in shallow unconfined aquifers.

Figure 7 shows some example pairs of indices and descriptors which show both strong, significant correlations with a clear relationship to hydrogeological settings as well as rather weak, diffuse dependencies. These results indicate both the potential value of the approach and the necessity for improvement.
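
To illustrate the kind of correlation analysis described above, the following sketch computes the Pearson correlation coefficient between a system descriptor and a dynamics index separately for each hydrogeological setting. It is a simplified illustration under stated assumptions, not the procedure of Haaf et al. (2020): the function names `pearson_r` and `correlation_by_group` are hypothetical, the numbers are invented, and the significance test (p value) is omitted.

```python
import math
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_by_group(rows):
    """rows: iterable of (group, descriptor_value, index_value) per well."""
    grouped = defaultdict(lambda: ([], []))
    for group, descriptor, index in rows:
        grouped[group][0].append(descriptor)
        grouped[group][1].append(index)
    return {g: pearson_r(xs, ys) for g, (xs, ys) in grouped.items()}


# Invented values: (setting, depth to groundwater in m, index value).
rows = [
    ("G1", 25.0, 0.30), ("G1", 30.0, 0.42), ("G1", 35.0, 0.50), ("G1", 40.0, 0.66),
    ("G3", 1.0, 0.90), ("G3", 2.0, 0.70), ("G3", 3.0, 0.80), ("G3", 4.0, 0.40),
]

result = correlation_by_group(rows)
for group, r in sorted(result.items()):
    print(f"{group}: R = {r:.2f}")
```

Computing R per setting rather than over all wells at once makes it visible when a dependency holds in one hydrogeological setting but is weak or reversed in another.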

Applications
The results presented so far were mainly created in the context of developing and exploring the fundamental methodological steps involved in the concept (time series and groundwater systems characterization and dependency analysis). In addition to this, several more application-focussed studies were carried out to explore the validity of the concept in a concrete hydrogeological context. Some examples of these studies are:
• Using index-based time series characterization and clustering for drought analysis (Heudorfer 2019).
• Case study-based evaluation of index-based time series characterization and clustering to describe groundwater flows in different shallow aquifer settings.
• Identification of anthropogenic influence on groundwater hydrographs (unpublished).
Fig. 7 Correlation between two selected index values (BFS and fall.cv) and selected physical groundwater system descriptors (distance of observation well to stream, dist_stream, and depth to groundwater, depth_to_GW). R is the Pearson correlation coefficient. Panel annotations: R = −0.43, p = 0.018; R = 0.78, p = 3e−07; R = −0.19, p = 0.32; R = 0.47, p = 0.0096. Axis label: depth to GW-level

Here some of the results from the last (unpublished) example are briefly shown, where index-based groundwater time series dynamics characterization was used to distinguish between natural variability (dynamics) and changes in dynamics due to anthropogenic impacts. The underlying idea was to analyse whether and how the nature of groundwater dynamics, as expressed by various indices, changes over time. In the chosen example, a reservoir with a constant water level was built in a river valley. A large number of observation wells were installed in the adjacent alluvial aquifer. Observations close to the reservoir showed extreme changes in dynamics after dam construction, with the influence of that construction lessening with distance (Fig. 8). Figure 9 shows the distributions of six selected indices determined for 57 time series near the dam, split between before and after periods. The analysis used observations from both near and far from the dam, and thus indices have a considerable spread. Despite this, the changes before and after dam construction are significant.
In this specific example, the time and location of the human interference are known exactly. The potential of the method, of course, does not lie so much in the analysis of previously known anthropogenic impacts but in the detection of anthropogenic impacts and their separation from natural variability in cases where it is unclear whether there has actually been human interference. The same line of thinking can, of course, also be applied to separating other influences on groundwater dynamics. Heudorfer et al. (2019) calculated indices of groundwater dynamics in moving windows to identify and analyse the impact of drought. The authors see great potential in this type of application, both in terms of gaining better understanding of how groundwater systems work and in analysing and predicting the behaviour of groundwater systems where there is defined natural or anthropogenic change.
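
The idea of tracking an index through time can be sketched as follows. The example computes a simple stand-in index, the range of heads within each window, over successive windows of a synthetic monthly series whose seasonal amplitude collapses halfway through, loosely mimicking the damping effect of a constant reservoir level. The function names, the choice of index and the series are assumptions made for illustration; they are not the indices or data used by the authors or by Heudorfer et al. (2019).

```python
import math


def moving_window_index(heads, window, step, index_fn):
    """Apply index_fn to successive windows of a groundwater head series.

    Returns a list of (start_position, index_value) tuples.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(heads) - window + 1, step):
        results.append((start, index_fn(heads[start:start + window])))
    return results


def head_range(values):
    """Illustrative stand-in index: range of heads within the window."""
    return max(values) - min(values)


# Synthetic monthly heads (m): two years with a 1 m seasonal amplitude,
# followed by two years with a strongly damped 0.1 m amplitude.
before = [10.0 + math.sin(2 * math.pi * m / 12) for m in range(24)]
after = [10.5 + 0.1 * math.sin(2 * math.pi * m / 12) for m in range(24, 48)]
heads = before + after

yearly_range = moving_window_index(heads, window=12, step=12, index_fn=head_range)
for start, value in yearly_range:
    print(f"window starting at month {start}: range = {value:.2f} m")
```

Any other index can be passed as `index_fn`. A persistent shift in the windowed values, rather than a single outlier, is what would point to a change in the system rather than to natural variability.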

Discussion and conclusions
The purpose of this paper was to propose and introduce a new concept to describe groundwater systems, primarily by making use of differences and similarities in groundwater dynamics. It is proposed that different types of dynamics, exhibited through variability of groundwater head time series, can be related to different features (controls, descriptors, properties) of groundwater systems. Those relationships can be used to learn more about groundwater systems' responses to change. With this knowledge it could be possible to make predictions of groundwater systems responses in locations where no or only few observations are available, using the idea that similar systems will respond similarly to similar inputs.
The paper summarizes the work done so far with respect to establishing some of the major steps towards developing a concept of "groundwater systems classification", namely: approaches to determine similarity of groundwater hydrographs, ways to characterize and explain the differences in groundwater dynamics, and approaches to relate dynamics to system characteristics. The paper also presented examples to show how the concepts may be used in practice.

Promising results and potential benefits
One of the considerations that led to the proposal of this new concept was that physics-based numerical models of groundwater flow, as well as conceptual hydrological models, both fail to provide accurate and reliable results at a regional scale where data are scarce and unevenly distributed. Based on the results so far, it cannot be claimed that the similarity and classification concept provides an alternative, stand-alone approach capable of delivering equally good or better results than other modelling approaches. However, the authors are confident that the similarity and classification concept in groundwater is useful, even if its prospects as an alternative, stand-alone modelling concept are limited. The benefits of using the similarity and classification concept in groundwater are:
1. Groundwater research can benefit greatly from standardized approaches to groundwater systems classification. Such classifications (or typologies) have been used in many different ways, mostly with a very low degree of formalization and limited to certain regional conditions or applications. To summarize those attempts into a unifying concept would be a great step forward, as it would help to transfer local knowledge on the behaviour of groundwater systems to other locations. It is, therefore, of outstanding importance that the classification of groundwater system properties is coupled to dynamic behaviour. This is a challenging endeavour that can only be tackled through a community effort, making use of a wide variety of hydrogeological settings and climatic conditions.
2. Methods to classify groundwater hydrographs can be of great help in understanding, comparing and analysing groundwater system behaviour. Being able to quantitatively assess the dynamic characteristics of a groundwater hydrograph can, for example, be used to detect systematic changes in aquifer behaviour and attribute these to anthropogenic or climatic changes. It can also be used to analyse the differences between different events and periods.
3. The greatest benefits of groundwater similarity and classification approaches may emerge in combination with physics-based numerical models and/or conceptual hydrological models. Numerical models of regional scale systems can often be successfully calibrated against observed groundwater time series. However, the predictive power and validity remain low for regions which are at some distance from observation wells, or for aquifer systems that are less frequently used and thus less often observed (Barthel and Banzhaf 2016; Barthel et al. 2008). Similarity-based predictions can be used for independent cross-validation in such situations. For hydrological models, in particular lumped ones with no specific conceptualization of the groundwater system, similarity-based approaches can be used to regionalize the lumped results.

Main challenges, open questions and suggestions for further research
A lot can be learned from the catchment classification and similarity concepts established in surface hydrology, but it has become apparent that a direct transfer of those concepts to groundwater systems may not be possible. The two main reasons for this are: (1) aquifers are systems that are essentially different from catchments, and groundwater hydrographs are essentially different from discharge hydrographs (Barthel 2014b; Staudinger et al. 2019), and (2) data for groundwater system properties are far more difficult to obtain, with groundwater observations being far less standardized than surface water observations. Therefore, even if data are available, the effort to make them suitable for a structured analysis is usually excessive, and consistent descriptions of groundwater systems that make use of a wider variety of system properties are rarely available. That said, one of the biggest remaining challenges is defining and delineating the appropriate "target system" (groundwater system, hydrogeological system, aquifer system or aquifer). The fairly straightforward techniques of catchment delineation are not applicable to groundwater systems. It remains unclear how processes at system boundaries and properties of adjacent systems can, and have to, be treated. As an example, groundwater system behaviour is strongly influenced by the processes and properties in the unsaturated zone. It is largely unclear whether the unsaturated zone needs to be incorporated in the classification approach or whether one can use groundwater recharge as a boundary condition.
Another challenge, of lesser significance, is finding an optimal way to classify groundwater time series and to quantify their degree of similarity. The tested approaches all have their disadvantages, none leads to unambiguous results and there is no way to determine which approaches perform better or best. Finding the optimal technique is (as always) not possible in a purpose-independent way. Different schemes will have different advantages in different applications. To date, too few studies have been carried out in too few places to be able to say which techniques perform best for which purpose.
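
As one illustration of how a degree of similarity might be quantified, the sketch below represents each hydrograph by a vector of index values, standardizes each index across wells so that no single index dominates, and uses the Euclidean distance between the standardized vectors as a dissimilarity measure. This is only one of many possible schemes and is not claimed to be the authors' approach; the function names, well labels and index values are hypothetical.

```python
import math
import statistics


def standardize_columns(matrix):
    """Z-score each column (index) across rows (wells)."""
    columns = list(zip(*matrix))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    if any(s == 0 for s in stdevs):
        raise ValueError("an index with zero variance cannot be standardized")
    return [
        [(v - m) / s for v, m, s in zip(row, means, stdevs)]
        for row in matrix
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(index_table, target):
    """Return the well whose standardized index vector is closest to target."""
    names = list(index_table)
    z_rows = standardize_columns([index_table[n] for n in names])
    z = dict(zip(names, z_rows))
    others = [n for n in names if n != target]
    return min(others, key=lambda n: euclidean(z[target], z[n]))


# Invented index values per well: [interannual index, fall-rate index].
index_table = {
    "well_A": [0.20, 1.00],
    "well_B": [0.25, 1.10],
    "well_C": [0.90, 0.30],
    "well_D": [0.80, 0.40],
}

print(most_similar(index_table, "well_A"))
print(most_similar(index_table, "well_C"))
```

The outcome depends on which indices are included and how they are scaled, which is one concrete way in which the choice of technique is purpose-dependent.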
With the challenges of groundwater systems classification remaining and some difficulties in determining the optimal way to classify groundwater time series still to be resolved, it is not yet possible to establish an optimal methodology to determine the dependencies between groundwater system properties and dynamic behaviour, nor to establish rules that would allow development of a model based on these dependencies.
In terms of future research, these are some suggestions:
• It would be highly beneficial to apply approaches to characterization of groundwater dynamics, for example the index approach, to a wider range of different hydrogeological, climatic and geographic contexts. The tools required for this are available on request from the authors.
• A promising path for future research is cross-validation of the approach with other models, namely numerical models. This could be achieved by comparing the results of existing models with classifications of the time series data used for calibration of those models, or by setting up numerical models to try to reproduce the dynamic behaviour of typical hydrogeological settings. Studies of this nature have been carried out (e.g. Hellwig et al. 2020; Stoll et al. 2011), albeit with a different scope.
• Establishing consistent, systematic approaches to groundwater system characterization has proven to be one of the challenges that needs to be overcome to realise the proposed concept successfully. However, consistent, systematic approaches to groundwater system characterization can be of great value even beyond the context of classification and similarity approaches, in terms of resource and vulnerability assessment, and management of groundwater resources. Many hydrogeological classification systems have been developed, often at a local level, and often dedicated to a specific purpose (e.g. hydrochemical characterization). They almost never involve groundwater dynamics. The classification and similarity approach suggested here provides a great toolbox for unifying the currently separate characterization and classification schemes into a widely applicable groundwater systems typology.
The concepts presented here are derived from similar ideas that have been explored widely in surface hydrology within the PUB initiative. In surface hydrology, hundreds of scientists all over the world collaborated in the IAHS decade 2003-2012 to establish, apply and validate this concept, while the groundwater community has not engaged with it. The authors cannot yet present ready-to-use models or full-scale applications to real-world problems. The authors' focus, so far, has been on developing approaches to detect and quantify similarity in groundwater hydrographs. While this is a main element of a groundwater PUB, other important elements are still missing, namely consistent formal approaches to groundwater systems classification. The main objective of this article was to raise interest in the concept within the groundwater community and maybe to start a similar initiative.