Editor’s message: Groundwater modeling fantasies—part 2, down to earth
- Cite this article as:
- Voss, C.I. Hydrogeol J (2011) 19: 1455. doi:10.1007/s10040-011-0790-6
Message de l’Editeur: Modelé souterrain imprévisible de l’eau—2ème partie, considérations réalistes
Mensaje del editor: Fantasías de la modelación del agua subterránea—parte 2, con los pies sobre la tierra
Mensagem do Editor: Fantasias da modelação de águas subterrâneas—parte 2, com os pés na terra
Simplicity is the final achievement. After one has played a vast quantity of notes and more notes, it is simplicity that emerges as the crowning reward of art. (Frédéric Chopin, a musician and composer, quoted in If Not God, Then What? by Fost 2007)
Despite the dubious developments discussed in part 1 of this Editor’s Message (Voss 2011), groundwater modeling really does represent the state of the art in hydrogeology, and groundwater modeling is in fact one of our most powerful tools for enhancing hydrogeologic understanding and for informing management of subsurface resources, at least when in the hands of competent hydrologists.
Automatic estimation foibles and simplification
Creating a model structure and determining values of model parameters is not a science, but rather an intuitive exercise, hopefully carried out with some wisdom concerning how natural systems function. Whether warranted or not, whether useful or not, parameter estimation has become a major part of model creation and this evolution has been fueled by the recent wide availability of automatic estimation software. In some sense, this wide availability has promulgated greater fallacious use of groundwater models. Automatic estimation software is truly a wonderful convenience when used properly, but it is no more than a convenience—and it should not be the primary objective of a modeling analysis to use it.
The most-common estimation technique relies on the rather arbitrary assumption that a minimized least-squares objective function fit is the best one. There are other equally valid objective functions and estimation approaches, rarely used today, and these would give different parameter estimates in the same model. Error structure is another assumption rarely tested when fitting models to data. Should we be minimizing the error that is the square of absolute differences between model predictions and observations or perhaps the square of the differences in logarithm of model predictions and observations, or perhaps differences of another function of model predictions and observations? These all assume different error distributions. The choice between the first two possibilities is most often expressed in automatic estimation software via weighting of observation values. The selection is usually made by a modeler who is oblivious to the issues involved. This selection makes a huge difference in the estimated values of model parameters, but is not often admitted to be a major ambiguity in results. The selection yields additional uncertainty in model predictions.
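The effect of the choice of error structure can be seen in a minimal sketch. It assumes a hypothetical one-parameter model, y = a·x, and made-up observations; the function names and data are illustrative only and do not come from any estimation package. The same data are fitted three ways: least squares on absolute differences, least squares with observation weights of 1/y² (a common way to approximate relative error), and least squares on differences of logarithms.

```python
import math

def fit_weighted(x, y, w):
    """Slope a minimizing sum of w_i * (y_i - a * x_i)**2 (closed form)."""
    num = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    den = sum(wi * xi * xi for xi, wi in zip(x, w))
    return num / den

def fit_log(x, y):
    """Slope a minimizing sum of (ln y_i - ln(a * x_i))**2 (closed form)."""
    mean_log_ratio = sum(math.log(yi / xi) for xi, yi in zip(x, y)) / len(x)
    return math.exp(mean_log_ratio)

# Hypothetical observations, invented for illustration only.
x = [1.0, 2.0, 4.0, 8.0]
y = [1.0, 4.0, 4.0, 16.0]

a_abs = fit_weighted(x, y, [1.0] * len(x))              # unit weights
a_rel = fit_weighted(x, y, [1.0 / yi ** 2 for yi in y])  # weights 1/y^2
a_log = fit_log(x, y)                                    # log residuals

print(f"absolute: {a_abs:.3f}  weighted: {a_rel:.3f}  log: {a_log:.3f}")
```

With these invented numbers the three estimates of the same parameter are about 1.8, 1.2 and 1.41. Nothing in the data says which one is right; the difference comes entirely from the assumed error structure.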
In automatic calibration, the simplicity or complexity of a model, meaning here the number of parameters to be estimated, can be set by turning a knob on a mathematical objective function. This is only one of many totally arbitrary possible approaches to simplification, and it is not necessarily better or worse than other mathematical approaches or approaches based on scientific judgment and human intuition. An objective function is merely one artificial construct to express what is desired from the model fit. There is no correct objective and no correct approach to regularization, and various such objectives will give different model complexities and different parameter estimates. Whether some are better than others is a matter of discussion, never to be fully resolved.
In the experience of this writer, if there are more than perhaps ten parameters in a groundwater model of an actual area, some of the parameters become highly correlated and their individual values cannot be distinctly determined by automatic estimation. We might have a good discussion about how many parameters are appropriate to estimate for your current model.
How to improve use of models
Complex modeling typically takes a prescribed type of detail work, a long time, and extreme amounts of computational effort. Simple modeling takes expertise in hydrogeology and numerical modeling, a lot of interesting thinking, and often, less time and less computational effort. Over-fitting available data occurs when the modeler considers every deviation of model simulation results from data values to be indicative of the need to add more parameters to estimate. These modelers are only satisfied when the model fits all measurements perfectly well.
Rather, the modeler needs to accept that (1) the subsurface cannot be represented in detail, (2) a model is going to be a deficient representation of hydrogeologic reality, and (3) the model will be a mathematical representation merely of the modeler’s own concept of how the subsurface domain of interest functions. Given this view, the modeler will use the model more effectively as a tool to help develop better understanding of the physical behavior occurring. The modeler will elucidate alternative hydrogeologic representations that equally well reproduce the data, and will define places and times and types of measurements that could be made to better select models from among the set of elucidated candidates. Finally, the modeler will give a range of predicted behaviors of pertinence to questions being asked, for example by water managers, based on the full set of alternative model representations and ranges of possible parameter values for each. Perhaps the true representation of the subsurface system will not be included within the set of alternative models considered, but we cannot do any better than to use our hydrogeologic insight to develop what we believe is a full set that includes the true situation, and then admit that we are merely making our best hydrogeologically informed guess.
The initial objective of modeling should be to represent the system in question in the simplest way possible that captures the most important overall behavior. This challenges the modeler to create the most-effective simple representation (with perhaps, at most, two to four parameters), rather than to match details of as many data points as possible. The parameterization, zonation, model features (e.g. boundary conditions) and structure should be as uncomplicated as possible. Automatic inverse modeling might here be used as a state-of-the-art convenience to estimate parameter values and to evaluate correlation structure of the few parameters that are estimated. The data upon which such calibration is based might be detailed measurements, but could also be the modeler’s interpretation of regional or temporal averages of such data. The objective here is not to match anything precisely, but rather to learn how well the data informs estimation of the selected parameters’ values, and whether it is even possible to obtain independent estimates of the values.
Evaluation might include these questions: Which parameters control the important behaviors? Which parameters have correlated estimates and why? Is it impossible to estimate independent values for parameters that were initially deemed to be important controls—and can values of only combinations of these (e.g. products, ratios, sums) be independently estimated?
Should two parameter estimates turn out to be highly correlated in a candidate model representation, then one parameter must be eliminated by setting its value—or by attempting definition of alternative parameter sets in which parameter estimates are more independent. Yes, setting the value of a parameter that the modeler would prefer to estimate will add an arbitrary, at best subjective, component to the modeling process, but this step clearly impresses upon the modeler how much (rather how little) can be gleaned from available data and from existing knowledge of the subsurface system.
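A minimal sketch of such a correlation check follows. It assumes a deliberately simple textbook case that is not taken from this message: steady one-dimensional flow with uniform recharge R and transmissivity T, with head fixed at zero at both ends, for which h(x) = R/(2T)·x(L − x). All names and values are hypothetical. The sketch computes scaled sensitivities of the simulated heads to each parameter by finite differences and then the cosine of the angle between the two sensitivity vectors. For an equally weighted two-parameter least-squares fit, the parameter correlation coefficient is the negative of this cosine, so a cosine near −1 or +1 means the two values cannot be estimated independently.

```python
import math

LENGTH = 1000.0                       # hypothetical aquifer length (m)
OBS_X = [100.0, 300.0, 500.0, 800.0]  # hypothetical observation-well positions (m)

def heads(recharge, transmissivity):
    """Steady heads for uniform recharge, head fixed at zero at both ends."""
    return [recharge / (2.0 * transmissivity) * x * (LENGTH - x) for x in OBS_X]

def scaled_sensitivities(model, params, rel_step=1e-6):
    """Forward-difference d(simulated)/d(ln p), one column per parameter."""
    base = model(*params)
    columns = []
    for j, value in enumerate(params):
        bumped = list(params)
        bumped[j] = value * (1.0 + rel_step)
        sim = model(*bumped)
        columns.append([(s - b) / rel_step for s, b in zip(sim, base)])
    return columns

def cosine(u, v):
    """Cosine of the angle between two sensitivity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

sens_r, sens_t = scaled_sensitivities(heads, [1e-3, 100.0])
collinearity = cosine(sens_r, sens_t)
print(f"cosine between sensitivity vectors: {collinearity:.6f}")

# Doubling both parameters leaves their ratio, and hence every head, unchanged.
same_ratio = heads(2e-3, 200.0)
print(heads(1e-3, 100.0), same_ratio)
```

In this assumed case the cosine is essentially −1: head data alone inform only the ratio R/T. Either one value must be set by the modeler, or the ratio itself should be treated as the single estimated parameter.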
If the simplest model, so developed, cannot reproduce the data or aquifer response of interest sufficiently well, then either the model definition/structure should be changed fundamentally without adding more parameters, or the existing model definition should be made a bit more complex by adding as few as possible (one or two?) more parameters or features. This change or ‘complexification’ may be based on pure judgement or can be aided by quantitative approaches that try to find simpler patterns within large numbers of parameters, as provided by some inverse modeling tools. Complexification should be done with the full realization that the choice of how to increase complexity is a subjective process that depends on the intuition and experience of the analyst.
Eventually, one or more of these increasingly less-simple models will reproduce the data of interest sufficiently well. ‘How well?’ is a matter of ‘taste’, a matter of judgment. Various analysts might disagree on where to stop complexifying. This is a most-valid discussion; indeed, it is essential and should not be avoided in hydrogeology because it will help to distinguish the most important aspects of behavior to be represented while highlighting uncertainties, perhaps resulting in a somehow better model representation. At the very least, the discussion will shed light on the depth of understanding of the system in question (or lack of it). The argument is not necessarily resolvable in general for all modeling; rather, each modeling case may require its own discussion and decisions. This discussion is a key part of an effective modeling process.
Given patience, interest and time, the analyst should develop several alternative relatively simple models (having different external factors, structure or parameter values) that all represent measured processes of interest fairly well. This set of alternative models may provide a range of predictions in answer to the questions that motivated the modeling study. The most optimistic expectation (assuming the modeler has been clever or lucky) is that this range of predictions will include the true answer and so management schemes should be designed to robustly function for the full range. Where the range includes predictions that make management solutions infeasible, either the stated objective for the analysis is not possible and must be abandoned, or the modeling analysis can be used to point out types of field data that would allow the set of models to be narrowed, perhaps resulting in no infeasible management situations.
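The idea of reporting a range from a set of alternative simple models can be sketched as follows. Everything here is hypothetical: the observations are invented, and the three one-parameter curves are stand-ins for alternative conceptual models rather than physically derived solutions. The acceptance tolerance is the analyst's judgment call, in keeping with the discussion above. Each candidate is fitted to the same data, those that fit acceptably are retained, and the prediction of interest is reported as a range across the retained set.

```python
import math

# Hypothetical observations (e.g. drawdown versus time), invented for illustration.
T_OBS = [1.0, 2.0, 3.0, 4.0]
S_OBS = [1.0, 1.9, 2.7, 3.4]

# Stand-ins for alternative simple conceptual models: s(t) = a * shape(t).
CANDIDATES = {
    "linear": lambda t: t,
    "square-root": math.sqrt,
    "logarithmic": lambda t: math.log(1.0 + t),
}

def fit_scale(shape):
    """Least-squares estimate of the single parameter a."""
    f = [shape(t) for t in T_OBS]
    return sum(fi * si for fi, si in zip(f, S_OBS)) / sum(fi * fi for fi in f)

def rmse(shape, a):
    """Root-mean-square misfit of the fitted candidate."""
    sq = [(si - a * shape(t)) ** 2 for t, si in zip(T_OBS, S_OBS)]
    return math.sqrt(sum(sq) / len(sq))

def accepted_predictions(t_new, tolerance):
    """Predictions at t_new from every candidate that fits within tolerance."""
    accepted = {}
    for name, shape in CANDIDATES.items():
        a = fit_scale(shape)
        if rmse(shape, a) <= tolerance:
            accepted[name] = a * shape(t_new)
    return accepted

accepted = accepted_predictions(t_new=10.0, tolerance=0.4)
low, high = min(accepted.values()), max(accepted.values())
for name, value in accepted.items():
    print(f"{name:12s} prediction at t=10: {value:.2f}")
print(f"range to design for: {low:.2f} to {high:.2f}")
```

The candidates fit the four observations comparably well yet diverge strongly when extrapolated. The reported product is the spread between the lowest and highest retained prediction, not any single curve.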
Groundwater modeling has become a self-supporting industry of fantastical promises that cannot be kept and, in most cases, cannot ever be tested. The industry is selling individual models at costs far beyond their true value. Indeed, model results have become so realistic and complex in appearance that it is difficult to tell a simulation result from a detailed remote-sensed image of the earth’s surface. The wonderful apparition of a color map, as the model result, containing uncountable details of the groundwater system being represented, is misleading. Who would dare cross a bridge designed with the same level of uncertainty as inherent in such a groundwater model analysis? Managers, who are often not modeling experts, have no means to judge such results, except to appreciate their apparent beauty and complexity. Modelers who create such results are trying to impress but, in truth, are misleading their clients.
Due to all of the uncertainty inherent in groundwater modeling results, a single groundwater model of a subsurface system is a product of questionable usefulness. This means that the model should not be the product of a modeling study; rather, what is learned from the model development and modeling analysis is the appropriate result. Models of subsurface hydrology are never correct, and these are most often useful only in the hands of the analysts who developed them.
Managers need to be educated regarding what model analysis can and cannot provide. The model should generally not be what is contracted as a product, as is most often the case today; rather, an improvement of understanding of the system in question should be contracted, and particular advice sought from the analyst, who may or may not choose to employ groundwater modeling toward achieving this goal. Managers should buy advice from a competent hydrogeologist; they should not buy a groundwater model.
All beating of the drums for the value and need for complex groundwater models and for blindly fitting as many parameter values as possible must end—before our field of hydrogeological modeling is finally discredited. Modeling is not a science—it is a subjective exercise by a scientist, investigating phenomena that can never be completely measured, with highly limited or no discriminating data, in an attempt to explain as much as possible, on the basis of established physical-mathematical descriptions of groundwater flow and subsurface transport processes. Upon completion of a well-done modeling exercise, this scientist is the best person on earth to give others insight and advice into questions relating to the studied subsurface system. The model is merely one of the hydrologist’s tools, helping him and her to understand the subsurface. We must recognize that the modeler is far more important than the model.
Indiscriminate complexification in modeling is surely not sophistication. Effective simplification is. There is nothing wrong with seeking complex patterns or details—but ending with only one selected representation (no matter how much the modeler likes what was created) is false. It is impossible to describe a geologic fabric in detail. There are reproducible and definable patterns, but one or more forever-to-be-unknown structures that are important to system functioning should always be expected. Therefore, there is no point in relying on a single detailed model representation. Also reliance on a single probabilistic prediction resulting from an ensemble of models somehow generated statistically is not dependable, because the statistics of the geologic fabric and other hydrogeologic factors are never well known. A more trustworthy basis is a suite of simple models (perhaps including some statistical models), each of which has few parameters and approximately fits the important data, because this approach relies on the groundwater processes simulated by the model and on the wisdom of the modeler. This is preferable to the use of a many-parameter model that better fits the data, but has little need of the modeled groundwater physics to achieve the fit.
There are recent efforts underway to create combined macro-models that, for example, link existing separate process models for atmospheric energy balance and moisture with surface-water models and groundwater models and ecosystem models and so on. Perhaps this adds a third category of possibly overly complex model types to the list: spatially complex models that attempt to represent too much geology, parametrically complex (highly parameterized) models that attempt too close fits with data, and, with this addition, models that attempt to represent too many physical processes at once. Here it is easy to fall into the trap of creating a macro-model that, from the start, cannot be understood, in which each sub-model has its own major ambiguities and non-unique representation of its own process. Such efforts require even greater care than groundwater modeling alone to provide meaningful results. Here too, the newly combined models should not be the product; rather, what is learned from the combination should be.
All of the aforementioned comments underscore the need for scientific research in hydrogeology. Necessary research is being conducted at many locations regarding how water, energy and substances enter, exit, and migrate through heterogeneous subsurface geologic fabrics. Many potentially valuable approaches are being developed and tested, for example, a variety of geostatistical, geologic-process-mimicking and pattern-mimicking approaches. However, none of the approaches developed to date yet has great practical applicability; all are still, indeed, research. These efforts should be continued because no approach has yet been developed that adequately (meaning, shown to be effective and practicable) deals with heterogeneity without the requirement of immensely detailed and intensive field measurements. Intensive and costly field campaigns cannot be the normal approach to water management, so our field is still floundering regarding how to best describe the subsurface for practical management purposes. Perhaps learning the statistics or patterns of various types of geology from focused intensive measurements in several well-defined type-areas or from geologic process models will help to make statistical approaches more useful in the future, avoiding the need for intensive field programs at each site.
In the view of this writer, the best way to go forward with practical management is to rise above groundwater models as final products, and instead, empower hydrologists to provide advice by using groundwater models in simple ways that are intended to elucidate understanding. Pursuit of complexity in groundwater models intended for practical management is a diversion from the real work at hand.
Il semble que la perfection soit atteinte non quand il n’y a plus rien à ajouter, mais quand il n’y a plus rien à retrancher. (It seems that perfection is reached not when there is nothing more to add, but when nothing more can be removed.) (Terre des Hommes [Land of People] by Antoine de Saint Exupéry, a writer, poet and aviator; Saint Exupéry 1939)
Thanks for helpful and critical reviews of part 1 and part 2 of this opinion piece are due to W. Alley and K. Belitz (US Geological Survey), J. Bredehoeft (The Hydrodynamics Group), S. Ge (University of Colorado), and P. Renard (Université de Neuchâtel).