Real-case benchmark for flow and tracer transport in the fractured rock

The paper is intended to define a benchmark problem related to groundwater flow and natural tracer transport using observations of discharge and isotopic tracers in fractured, crystalline rock. Three numerical simulators: Flow123d, OpenGeoSys, and PFLOTRAN are compared. The data utilized in the project were collected in a water-supply tunnel in granite of the Jizera Mountains, Bedřichov, Czech Republic. The problem configuration combines subdomains of different dimensions, 3D continuum for hard-rock blocks or matrix and 2D features for fractures or fault zones, together with realistic boundary conditions for tunnel-controlled drainage. Steady-state and transient flow and a pulse injection tracer transport problem are solved. The results confirm mostly consistent behavior of the codes. Both the codes Flow123d and OpenGeoSys with 3D–2D coupling implemented differ by several percent in most cases, which is appropriate to, e.g., effects of discrete unknown placing in the mesh. Some of the PFLOTRAN results differ more, which can be explained by effects of the dispersion tensor evaluation scheme and of the numerical diffusion. The phenomenon can get stronger with fracture/matrix coupling and with parameter magnitude contrasts. Although the study was not aimed on inverse solution, the models were fit to the measured data approximately, demonstrating the intended real-case relevance of the benchmark.


Introduction
The presented study is a part of the DECOVALEX project (www.decovalex.org, this thematic issue), a platform for inter-model and model/measurement comparisons, with a history back to 1992, focused on processes in the host rock related to the safety assessment of the geological spent nuclear fuel disposal. Water flow and solute transport in fractured rock are principal phenomena controlling repository safety. Estimating water inflow into the engineered barrier and migration of radionuclides after escape from the engineered barrier is critical to repository safety assessment and long-term performance. In this DECOVALEX task, groundwater discharge and environmental isotope transport were modeled in fractures intersecting the Bedřichov Tunnel of the Czech Republic.
Quality management is nowadays a standard tool for software production and development to ensure a high quality of a produced result. A numerical code dealing with the coupled THMC processes is highly complicated software product, since the different processes have different characteristic features, e.g., time and spatial scales, nonlinearities, and coupling strength. To keep the quality of the developed code high, benchmark testing is necessary; especially if scientists from different disciplinary and different organizations are working on the same code (Kolditz et al. 2012, concentrated on OpenGeoSys). The codes used in this paper participate in such procedures-Flow123d in Hudson and Jing (2012) and Zhao et al. (2013) and PFLOTRAN in Steefel et al. (2015).
Although numerical simulation codes undertake extensive testing and verification on standard problems with analytical solution, it is widely accepted that comparisons with more complex problems and practical application are necessary, especially with increasing complexity of code features and analyzed data. There are many examples of comparison studies in hydrogeological modeling, from over two decades old (Larsson 1992) to very recent (Maxwell et al. 2014;Steefel et al. 2015). One of the former DECOVALEX tasks modeled water inflow into excavations in the FEBEX experiment in Grimsel underground laboratory (Alonso et al. 2005). Eight teams participated in the first step of hydromechanical modeling of the rock, but the comparison was limited to the different conceptual models such as continuum versus discrete fracture modeling and did not focus on the numerical implementation differences.
Tunnel or borehole inflow, and tracer transport observations, is an efficient method for studying hydraulic and transport properties of rock on scales ranging from tens to hundreds of meters. Many of these studies have been conducted in underground laboratories and conventionalpurpose tunnels in mountainous regions. On the other hand, the models used for evaluation are typically simple, for example, analytical models of radial flow perpendicular to the tunnel axis or lumped-parameter models of ground water age distribution from natural tracers. An example of 3D hydraulic and thermal model of an Alpine tunnel is by Marechal et al. (1999), with the rock inhomogeneity composed of blocks of different hydraulic conductivity. Although the concept of coupling subdomains of different dimensions has been used in several simulation codes (Flow123d, OpenGeoSys, HydroGeoSphere, FEFLOW) in the last decade, the consequences of different numerical implementation have not been investigated.
The benchmark problem defined and solved in this paper captures one of the principal features of crystalline rock hydrogeology, namely the multiscale heterogeneity derived from the large differences in flow and transport properties between crystalline matrix blocks and the fracture network and fault zones. The problem considered here was defined so that all the participating codes could implement the model regardless of numerical scheme, allowing the effect of numerical scheme and discretization, instead of conceptualization to be investigated. The problem is based on measurement data at the Bedřichov water-supply tunnel in Bohemian granite massif (Czech Republic) (Klomínský and Woller 2010). Collected data include inflow rates and natural tracer concentrations. The benchmark formulation is a compromise between simplicity and real-world problem features, introducing the main features of the conceptual model of the site, while keeping the geometry simple to allow exact input to the simulation codes. The preceding work with the Bedřichov site data introduced a hydraulic problem solution of the tunnel inflow, based on the concept combining the hard-rock blocks and the planar fault zones in the model, using the mixed-dimensional capability of the used simulation code .
In this paper, the results from three groups with ties to their respective national nuclear fuel cycle authorities or scientific research institute involved in the national nuclear waste disposal program are compared. Each group simulated the defined problem with a code of their choice. The Technical University of Liberec (TUL, Czech Republic, contractor of the Radioactive Waste Repository Authority, SÚ RAO) served as the task coordinator and used the code Flow123d. The Federal Institute for Geosciences and Natural Resources (BGR, Germany) used the code Open-GeoSys. Sandia National Laboratory (SNL, USA) as a representative of the US Department of Energy used the code PFLOTRAN.

Problem and model description
Based on the former hydrogeological interpretation ), we consider a set of conceptualized geometries as numerical benchmarks, with partial motivation to interpret observed data but primarily to compare the models and to investigate the ability and confidence of such data interpretation in a generic sense. Benchmark problems were referenced to real-site conditions using qualitative and quantitative comparison to the actual data. Before the problem definition itself, we summarize main features and the data used to establish the reference model.

Site features
The tunnel is located in the Jizera Mountains, in the north of the Czech Republic ( Fig. 1 left). It is excavated in a portion of the Bohemian massif-the Krkonoše-Jizera Composite Massif (Ž ák et al. 2009;Klomínský and Woller 2010). The tunnel length is 2600 m, with an azimuth 67°. Positions are expressed by distance from the WSW end. The first 890 m of the tunnel was excavated with the tunnel boring machine (TBM) method and remaining part utilizing drill-and-blast methodology (Fig. 1, center and right). The diameter of the TBM part is 3.6 m, while the size of the D&B part is similar but irregular. There are irregularly distributed intervals of bare rock and shotcrete (both in the TBM and in the D&B sections). The altitude of the lower (WSW) end is 657 m, and the upper (ENE) end is 697 m giving a slope of approximately 1.5 %. The highest elevation above the tunnel is at 820 m above mean sea level.
Most inflow to the tunnel is observed where overburden is shallow-in the interval from 50 to 100 m (0.5-1 L/s) and in the interval 2200-2450 m (1.5 L/s). Where the overburden is thicker in the deeper part of the tunnel, there are fully dry intervals, short intervals with some leakage (but not freely flowing water), and several places (faults/ fractures) with medium to strong inflow (ones to tens mL/s); the total inflow into the deep part is below 0.5 L/s. All the inflow is collected in a canal built in the tunnel floor. The resulting hydrogeological conceptual model, discussed in several preceding works (e.g., Hokr et al. 2014), is illustrated in Fig. 2; it is composed of a shallow permeable zone (weathered rock), deeper hard rock crossed by several subvertical fractures or faults.
Here simplified geometries, derived from this conceptual model, are used for the benchmark model formulation and solution in this paper. These benchmark models are based on the coupling of three relevant subdomains: the shallow weathered granite zone, a single vertical ''fracture'' (or fault), and the compact granite block (notation is fully specified in ''Fracture/matrix meaning'' section)- Fig. 3. Permeability decrease with depth was simplified to two zones, the upper more permeable and the lower less permeable, representative of permeability distribution near the tunnel [as proved in the 3D model of Hokr et al. (2014)]. The anisotropy of flow (large-scale permeability) is a result of the geometric configuration of the model, with the 2D fracture domain in the model expressing the preferential direction explicitly. The derived set of benchmark problems covers different combinations of the three main features: deep versus shallow overburden, strong versus weak inflow, and a single fracture versus a broader fault zone, as listed in Table 1.

Measured data
To constrain model boundary conditions and parameters, we use several kinds of measured data: flow rates of individual fractures, inflow into the collecting canal, apparent water ages derived from natural tracers, and climatic and hydrologic data on the surface.  Single discharge measurements were collected in several places along the tunnel. We note that while a particular sampling location cannot be guaranteed to collect all water inflow from a given feature, the temporal changes of the flow rate should be representative of the true changes in discharge. The average value of discharge was estimated by its contribution to the total inflow in the collecting canal, using flow rate measurements in the collecting canal. The single inflow rates can be directly assigned to the discharges from the ''fracture'' domain in the model. Matrix domain discharge was evaluated indirectly by measurement of the increase in canal discharge along segments with no significant individual inflow features (Hokr et al. 2012). The inflow values are presented as a part of the model variants table-Table 1.
Most of the individual inflow locations were sampled and analyzed for natural tracer concentrations. Although the model problem is motivated by analysis of natural tracers to determine the water ''age'' and interpret the rock transport parameters, for these benchmark models we disregard the complex relationship of water age to tracer concentration observed and take the groundwater ages as given. The model is analyzed by means of synthetic ''fictitious tracer'' pulse, which is easier for comparison of solution between different solvers.
There are two kinds of tracer-based water age evaluation methods available: fitting the water molecule stable isotopes evolutions by a lumped-parameter model (dispersion in particular) and transforming the 3 H/ 3 He concentrations into the ''apparent'' age by the decay formula, which assumes the use of the piston-flow model. In both cases, we use data from unpublished analyses, within International Atomic Energy Agency (IAEA) (Š anda 2013) and SÚ RAO (Hokr 2014) projects. The stable isotopes, determined mean 3 9 10 -6 to 1.5 9 10 -5 (time dependent) 5 9 10 -8 5 9 10 -10 5 9 10 -8 Inflow fracture [m 3 /s] (flow rate-direct measurement) None 7 9 10 -6 to 1.4 9 10 -5 1 9 10 -5 average 2 9 10 -8 1.4 9 10 -5  Fig. 3 Model configurations (M1-M4) representing selected real positions in the tunnel and various types of water-permeable features (see Table 1).  Table 1) singlefracture inflow is estimated to 10 years using only observed 3 H concentration, with larger uncertainty. These ages are not considered to be fully accurate, only representative for the benchmark purposes of this paper. The recharge rate cannot be exactly determined. Here climatic and hydrologic measurements were used for indirect recharge estimation. The weather station operated by TUL at the tunnel entrance measures standard set of data, but the precipitation gauge does not melt snow, so the year totals are not accurate. We consider a rounded value from nearby official stations, which is 1000 mm per year, of which 20 % is estimated as the recharge. For the recharge variability, the precipitation evolution itself would not be well representative. Instead, we consider stream flow rate measured near the location as an indicator. We assume that when outflow exceeds the base value, then infiltration/recharge is occurring. These data are assumed to appropriately represent snowfall and melting effects. The infiltration calculated using the discharge excess method is normalized to the overall 200 mm/year average, which is expressed by the following formula for monthly totalized/averaged values [mm]: where Q outflow is the monthly total outflow from the watershed and Q min outflow is its minimum over observation period, expressing the base flow. This method is not expected to accurately measure the recharge rate at the site, but rather provide a representative value for total recharge and its seasonal variability for benchmark purposes.

Fracture/matrix meaning
Depending on model variant, the meaning of hydrogeological objects represented by 2D and 3D subdomains is different. The 2D structure can be either a fracture in the strict sense as an opening between two rock surfaces with the flow controlled by the cubic law or a planar representation of certain higher permeability zone, understood as a porous medium with Darcy's law controlled flow in a block of small finite thickness. Both interpretations are mathematically equivalent with the given transmissivity value.
The 3D domain can either mean a matrix block (compact rock without any fractures more significant than mineral grain interfaces) or an equivalent continuum representation of hard-rock blocks between larger-scale faults including a network of fractures less significant than the one represented by the 2D model domain. For consistent and simpler presentation in the paper, we use the terms ''fracture'' and ''matrix,'' respectively, for 2D and 3D subdomains, without regard to their actual physical role.

Benchmark models configuration
The conceptual model is based on the general situation in Figs. 2 and 5, transformed into individual local blocks representing a particular tunnel position (Figs. 3,4). We include topographically driven flow in the shallow zone and vertical flow in the matrix and fracture below, which is disturbed by the tunnel drainage. We considered two general models to represent deep or shallow tunnel segments (corresponding also to different scale with respect to the topography, as illustrated in Fig. 5): • For the deep tunnel, the topographic effect is simplified-the model surface is horizontal, and the flow directions are controlled by the choice of the boundary conditions ( Fig. 4 left). Most of the recharge water is conducted in the shallow permeable zone horizontally out from the model, while a small part goes vertically into hard-rock (both to the fracture and to the matrix) part of which drains into the tunnel and part of which drains to the deeper local groundwater cycle. • The shallow tunnel case is representative of the tunnel crosscutting the shallow permeable zone. Here the topography effect is simplified to a flat surface of the appropriate angle to the tunnel ( Fig. 4 right). No fracture is considered in the shallow tunnel model. We define one model (notation M1) based on the shallow tunnel configuration (with the measurement representing the shallow tunnel segment as a whole) and three models (notation M2-M4) based on the deep tunnel configuration, for three different depths representing particular measured discharge locations, also distinguishing different meanings of the 2D and 3D subdomains (fracture/faults, matrix/equivalent continuum)- Fig. 3b-d, Table 1.W e apply symmetries for the numerical problem solution, one vertical plane for M1 (along the tunnel axis) and two for M2-M4 (along the tunnel axis and the fracture plane), with the resulting configuration on The M2-M4 domain is a block of 100 m length along the tunnel, 300 m width perpendicular to the tunnel and 400 m high (Fig. 4 left). The dimensions are chosen so that the boundary does not interact much with the tunnel effect. For technical reasons of domain connection in the discretized model, the fracture is extended along the shallow zone up to the surface. The shallow zone depth is 20 m. The tunnel depths and the fracture domain thicknesses are specified in Table 1. The meaning of the thickness value of 1 m for a single fracture is that we regard the hydraulic conductivity value as the transmissivity value in [m 2 /s] (without regard on a real geometry) and the porosity is conceptualized as the ratio of mobile and tracer-accessible water volume to the total volume in the fracture. The permeability in M4 is representative of the permeability of the fracture network in the fault zone, and porosity can be conceptualized at the fracture network porosity for the fault zone.
The domain of M1 is of similar size in vertical (400 m valley side and 580 m hill side) and transversal (400 m) directions, but much longer in the tunnel direction, 1000 m (Figs. 3a, 4 right).

Boundary conditions
The boundary conditions for either steady-state or transient flow are specified in Fig. 4. The infiltration rate (2nd type b.c.), either constant or variable, is prescribed on the top surface. The zero piezometric head (1st type b.c.) is prescribed on the vertical side of the shallow zone in the position opposite to the tunnel-representing the undisturbed equilibrium hydraulic state and controlling the flow direction in the shallow zone, i.e., out from the model, as would be the effect of topography in a real case. The remaining lateral boundaries are with no flow, based on symmetry. The tunnel wall is defined as the atmospheric pressure (1st type b.c.). The piezometric head on the bottom side is derived from the local topography and representative dimensions between the infiltration and drainage zones, i.e., the elevation difference of 200 along 1000 m distance corresponds to 80 m head difference along the 400 m model height. The head h =-80 m corresponds to the pressure head p = 320 m. The conditions are defined consistently on adjacent boundaries of the 3D matrix block and the 2D fracture plane (cases of top flow rate, tunnel pressure, bottom head, and lateral head). The M1 boundary conditions are similar: The top side is a recharge boundary with prescribed infiltration. The tunnel is defined by atmospheric pressure. On the vertical side of the shallow zone at its lower end, we prescribe the constant head h = 0 of the water table at the model domain edge (surface elevation), representing discharge into a stream along the valley, a little below the tunnel elevation. The remaining sides including the bottom are impermeable. We also note some of the simplifications-the cylindrical open space of the tunnel ends below the surface (otherwise would result to a singularity at the boundary intersections), statically defined recharge for any pressure value (no ''seepage face'' b.c.) to keep same model abilities and avoid nonlinearity.
The tracer concentration is prescribed on the recharge boundary and the zero concentration gradient (representing an advection-dominated outflow, i.e., no external disturbance) on the discharge boundaries of the tunnel and the ''valley'' side of the shallow zone. Zero mass flux is prescribed on the remaining boundary. For the purpose of benchmark, we consider a fictitious case of a pulse tracer injection. It is a precursor of the problem with real evolution of natural tracer concentration in the related paper (Gardner et al. DECOVALEX 2015 at http://www.decovalex.org/ resources.html#special-issues). We approximate a Dirac pulse injection by a short period (0, t 1 ) of prescribed concentration (used c = 100), where t 1 is typically a numerical time step. The effect of this discrete pulse time period is negligible for most simulations; however, in the case of the M1 problem, where the tunnel approaches the boundary surface, resulting in very short travel times right at the contact, the effect is apparent to the early time breakthrough.

Model variants and their parameters
In general, hydraulic conductivity, specific storativity, porosity, molecular diffusion coefficient, and dispersivity must be defined for each subdomain, respectively. The parameters are partly prescribed and partly subject of inverse problem solution, depending on problem variant. The full list of model inputs is in Table 2. Some of the parameters are common throughout the paper, while the others are specific for particular solution steps or model variants and are presented within a solution procedure below. We note that the parameter distribution has been greatly simplified, to keep the problems simple enough for comparison purposes.
The problem variants and their parameter sets are organized in the following structure: • Steady-state hydraulic problem of M2-M4: The shallow zone hydraulic conductivity is given 10 -6 m/s, and that for fracture and matrix is evaluated as an inverse problem. We assume the fracture permeability isotropic for simplicity. The effect of anisotropy in the fracture would be negligible with the transverse hydraulic gradient zero or very small. • Steady-state hydraulic problem of M1: The shallow zone hydraulic conductivity is given 2 9 10 -6 m/s, and the matrix hydraulic conductivity is given 1 9 10 -8 m/ s. Besides the reference infiltration rate 200 mm/year, two other values 0 and 500 mm/year are considered, representing ''asymptotic'' states of low/high-infiltration events or periods, as bounds for eventual transient flow solution (not evaluated here). K_shallow [m/s] 2 9 10 -6 hydr.
1 9 10 -6 tracer 1 9 10 -6 1 9 10 -6 1 9 10 -6 K_matrix [m/s] 1 9 10 -8 hydr. Diffusion coeff. [m 2 /s] 1 9 10 -9 1 9 10 -9 1 9 10 -9 1 9 10 -9 • Transient hydraulic problem of M2: Additionally, specific storativities for all subdomains are given as 10 -5 m -1 , and the variable infiltration rate prescribed as specified in ''Measured data'' section. Although the storativity value for the shallow zone (actually unsaturated) appears unrealistic small, this choice is made to fit the measured range of variance. For consistency, the steady-state model solution above is used as the initial condition. • Pulse tracer transport of M1-M4: It is based on the steady-state flow field, so the calibrated hydraulic conductivities of M2-M4 and a little different choice for M1 are used. For the transport problem, the diffusion-dispersion data are defined common for the whole domain, based on general literature ranges. We regard the problems as the same scale, as they are part of one block of rock; therefore, the dispersivities are set the same for all the variants. Porosities of the respective subdomains are used as given in the comparison; the values were determined by a separate raw inverse estimate, to ensure quantitatively relevant problem, without actual goal to fit the ''measured'' water age. More details on the inversion procedure are given in .

Solution methods and procedures Governing equations
We solve standard equations of porous media/fracture flow and solute transport. The fracture is represented by the same equation but with the appropriate meaning of the coefficient. The formulation of the equations for a set of subdomains of mixed dimensions is stated below. This is common for both Flow123d and OpenGeoSys solution although the actual numerical scheme is different. For PFLOTRAN solution, the 3D domain only is considered and the fracture is represented by an equivalent 3D domain. Vice versa, all the softwares are based on models of more generality than presented here and the presented equations are special cases of them. We define the multidimensional problem domain X as X 1 [ X 2 [ X 3 and denote the geometric dimension d = 1, 2, 3. The Darcy's law and mass balance equation are formulated as where the fluxes between the different dimensions are defined proportional to head difference, consistently with the Darcy's law. For more details on the subdomain connection, we refer to the software documentation (TUL 2015 The tracer transport is governed by the advection-diffusion equation where c is the concentration [kg s -1 ], D is the diffusiondispersion tensor [m 2 s -1 ], and n is the porosity [1]. Again, the source/sink term Q d (c) comprises the interaction between the subdomains. The tensor D is defined by where a L and a T are the longitudinal and transversal dispersivities [m], D m is the molecular diffusion coefficient [m 2 s -1 ], and s is the tortuosity [1].

Numerical schemes and software
Three different codes based on different numerical schemes are used for the comparison study in this paper. Although we focus on solution of the specific problem of this paper (the governing equations and the problem formulation specified above), the functionalities of the codes are wider, in quite different features between each other. Concerning the solution in this paper, the main features are compared in Table 3, including the reference to the participating authors' team and introducing a shorter reference for the codes used in the text below (OGS, F123, and PFT). The code details and references are described in the subsections below.
While F123 and OGS share the common feature of mixed-dimensional domain (3D and 2D), PFT simulates only 3D subdomains. The numerical schemes differ in discrete representation of the 3D and 2D interaction in the mixed-dimensional models. F123 uses a mixed-hybrid FEM with independent unknowns in each domain, and OGS uses a standard FEM with shared unknowns at the domain boundary, which has, e.g., a consequence in the fracture boundary flux evaluation (a comment below, in '' OpenGeoSys'' section). PFLOTRAN solves a 3D domain only, using an integral finite volume method, in contrast to the finite element methods used by F123 and OGS. All the three solutions differ in the spatial discretization geometry-unstructured tetrahedral, structured or unstructured hexahedral (Table 3). Although a typical size of an element in the critical area around the tunnel is similar in all the used meshes (examples in Fig. 6), the total number of elements differs significantly (see Table 4); also a different ratio of nodes/elements is related to either tetrahedral or hexahedra choice. Most of these numerical features have a potential impact on the solution.
Next, the solutions differ with the temporal discretization. For the transient hydraulics, all the teams used 1-month time step in accordance with the input data resolution. For the tracer transport, OGS and PFT applied adaptive time-stepping schemes (starting from seconds up to several months) while F123 calculated with a prescribed time step constant through the simulation interval (1 month).  The type of elements is specified in

Flow123d
The code Flow123d is an open-source code developed at the Technical University of Liberec (TUL 2015). It simulates groundwater flow, multicomponent reactive solute and heat transport, in fractured porous media, and supports computations on complex meshes consisting of elements of different dimensions. The mixed-hybrid finite element method was used for the flow problem, with the lowest-order Raviart-Thomas base functions on tetrahedra (3D subdomain), triangles (2D subdomains), and line segments (1D), i.e., piecewise linear functions for the velocity unknown, while the pressures are approximated by piecewise constant functions (Maryška et al. 2008;B řezina and Hokr 2011). The discrete unknowns are fluxes between the elements, pressures in the element centers, and pressures in the element side centers.
For the transport problem, the particular scheme of the discontinuous Galerkin method is based on general principles of the methods in relation to the advection-diffusion problems (e.g., Ern et al. 2009). It uses first-order base functions for the concentration and the non-symmetric variant. The time discretization is by the implicit Euler methods. The discontinuous Galerkin is one of the options to provide a solution of the advection-dominated problems free of oscillations and with a minimal numerical diffusion.
Basic algebraic operations are based on the PETSc library, including the option of parallelization. The code works in the command line regime and outputs to the GSMH (Geuzaine and Remacle 2009) and VTK/ParaView file formats for postprocessing. The simulations have been done with the Flow123d version 1.8.2.

OpenGeoSys
The BGR team of authors used the finite element code OGS (OpenGeoSys), which is based on the transient saturated groundwater flow and mass transport equation. The software code was originally developed by UFZ (Centre for Environmental Research, Leipzig) in an open-source platform (Kolditz et al. 2016). The code, initially for simulating flow and solute transport in fracture network, was extended to a multiphysical code, which can simulate fully coupled thermal-hydraulic-mechanical-chemical processes in the subsurface applications, such as geothermal reservoir engineering, CO 2 storage, construction of underground opening for the repository of radioactive waste, and its long-term performance as well as groundwater quality management. OGS is written in object-oriented C?? language and is parallelized using the MPI schema for all thermal, hydraulic, and mechanical processes. The chemical process can be simulated by using an external coupling mechanism with PhreeqC, ChemApp, and GMS according to the chemical environment. The simulations done for the Bedřichov model used eight domain decompositions.
Different element types in different dimensions can be combined in a finite element mesh, which may enable to describe a single fracture using, e.g., triangle or quadrangle elements while using tetrahedral or hexahedral elements for the rock mass (matrix).
To compute the tunnel inflow rate in the fracture (fractured zone), a postprocessing had to be introduced. The evaluation method was based on the pressure output from a structured quadrangle element mesh, because the accuracy is not sufficient if a flow velocity output was used, especially in case of an unstructured triangle element mesh.

PFLOTRAN
The SNL team of authors used PFLOTRAN, a scalable, parallel, multiphase, multicomponent, non-isothermal reactive flow and transport code to simulate multiple environmental tracer concentrations in heterogeneous 2D and 3D domains (Hammond et al. 2012;Gardner et al. 2015). For all simulations in this paper, PFLOTRAN was run in the Richard's equation mode, which simulates variably saturated single-phase flow and transport, although the boundary conditions applied for the Bedřichov problems lead to saturated-state simulations.
PFLOTRAN is written in object-oriented FORTRAN 9X and uses message passing interface (MPI) for distributed memory, domain decomposition parallelism. The Portable, Extensible Toolkit for Scientific Computation (PETSc) library is used for parallel Newton-Krylov solvers. Parallel IO is achieved using the HDF5 file format. PFLOTRAN can be employed on a variety of architectures and scales from single-processor laptops to 2 17 core petascale simulations.
PFLOTRAN solves the mass and energy balance equations that give rise to the partial differential Eqs. (2-5) using fully implicit, integral finite volume method. The flux of water and solute is computed for all faces in an element using a two-point flux discretization. PFLOTRAN works on only 3D elements, but can be used with structured and unstructured meshes of generic polyhedral elements. For improved accuracy of the finite integral method, hexahedral meshes were used in these simulations.
For the simulations here, unstructured hexahedral meshes were created using the CUBIT meshing software. For the M2 and M3 simulations, discrete fractures were meshed as 3D subdomains with 0.5 m thickness (symmetric half of a unit thickness). For M4, where a larger fault zone was modeled, a 3D subdomain of 2.5 m (one half of the prescribed value) was modeled as a larger, highpermeability zone. For all simulations, an initial simulation with zero concentration was run to steady-state conditions. These steady-state conditions were then used as the initial conditions for the tracer pulse simulation.

Breakthrough curve
In all simulations, the concentration evolution in the tunnel was calculated as the flux averaged concentration from all cells in the fracture domain or the shallow domain intersecting the tunnel for the given time step by: where _ M t is the mass flux of tracer and _ M w is the mass flux of water at time (t), and the integral is over the area A of the respective subdomain intersecting the tunnel. This concentration essentially represents the well mixing concentration of all discharge from the fracture into the tunnel at any time.

Mean transit time
The second part of tracer transport evaluation is based on mean transit time (MTT), a standard temporal transport characteristic (e.g., Maloszewski and Zuber 1996). It is defined as which is the first moment of g(t), the transit time (age) distribution of the particle pathways in the domain. For the pulse input c in (t) = d(t) (the Dirac function), the output concentration evolution is and therefore, the calculated concentration evolution at the discharge point gives the age-distribution approximation, which is then used to calculate MTT by the formula (7).

Inverse solution
We do not in particular focus on the inverse algorithm behavior for the problem and mention them only for completeness, as they have secondary role in the evaluation. The inverse problem of fitting hydraulic conductivities in the steady state is especially simple: First, it should have a unique solution as the number of parameters and the number of fitted observations are the same. Secondly, each of the observations (flow rates) is dominantly sensitive to one of the two parameters: the fracture flow rate on the fracture transmissivity and the matrix flow rate on the matrix conductivity. Therefore, it is easy to iterate manually to an optimal combination of the parameters-a particular algorithm is demonstrated on the data of this paper in its follower . The inverse codes actually used-UCODE coupled to Flow123d and DAKOTA coupled to PFLOTRAN-therefore are supposed to result ''exact'' values of inversion, not affected by an optimization method implemented.

Result: comparison of codes
Steady-state hydraulics of M2-M4 with inversion As the first step, we solve a simpler case of steady-state flow as an inverse problem, i.e., we compare hydraulic conductivities calculated by the models which fit the measured boundary (tunnel) flow rates. The resulting values are in Table 5. Comparing the variants M2-M4 between each other, we see that the evaluated hydraulic conductivities are directly controlled by the order of magnitude of the tunnel inflow rates, for the respective subdomains, the fracture, and the matrix. The results correspond to conceptual assumptions: The most of the infiltrating water is discharged in the shallow zone outer boundary, creating almost horizontal flow with the head difference of 15-20 m which resembles the terrain slope excluded from the model geometry. The lower part of the model is controlled by combination of the vertical flow and the tunnel drainage in a reasonable balance (Fig. 7). The differences between the codes in a range of percents, in case of Flow123d versus OpenGeoSys, are good for such kind of problem and well verify the solution. The difference of PFLOTRAN from the others is acceptable concerning general uncertainties in hydrogeology, but can seem high considering a benchmark-kind of a problem and the linear equation solution. There are several features of the problem compromising a precise numerical solution. One is the tunnel (small-scale geometric feature related to the whole problem and the appropriate mesh refinement) and second is the fracture of either lower dimension or smaller thickness, besides the possibly secondary effect of the contrast of parameter magnitudes. Moreover, the contact between the shallow zone and the fracture along the edge is a situation similar to a singularity . Then the difference between the code concepts, i.e., PFLOTRAN with the 3D subdomains only versus Flow123d and OGS with 3D-2D coupling, becomes significant for the solution results. In this case, we compare the direct solutions by the codes, instead of fitting the measured values, which would be too complex for comparison. Yet, the input values have been iterated by some trial-and-error steps to obtain common inputs being quantitatively relevant (Table 2). Three values are evaluated: water inflow rate from the shallow zone domain into the tunnel (discharge from the model) and two values of the piezometric head: at the top of the model ([-1000; 0; 180]-see Fig. 4)-and above the place of the tunnel and the domain interface intersection (approx. [-150; 0; 27]). While the tunnel inflow rate is meant as a counterpart of the measurement (Table 1), the head values are intended to check a qualitative relevance against a concept of water table following the topography, i.e., varying within the shallow zone.
The illustration of solution, as a distribution of the head, is in Fig. 8. The comparison of the selected quantities is presented in Table 6. The piezometric head at the top (the left column) is in very good accordance between the two model solutions by F123 and OGS, while there are significant differences for the position above the shallow tunnel section (the middle column). In this place, both the tunnel pressure/head boundary condition input and the surface head postprocessing can be sensitive to mesh geometry in connection with position of discrete unknowns (e.g., conversion between pressure and head with input of element/node vertical coordinate). The calculated tunnel inflow rate is consistently changed with the calculated head in the representative place. All the evaluated model values sufficiently agree for the average infiltration case of 200 mm/year. The calculated head is tens of meters below the surface at the top of the hill, while in a correct range near the shallow tunnel section. It was no worth to attempt more precise calibration, due to the major simplification of neglecting the unsaturated zone. The tunnel inflow for ''steady-state'' infiltration is little below the lower bound of the measurement (3 ml/s/m) and exceeds slightly the upper bound (15 ml/s/m) for the asymptotic limit of a heavy infiltration period. Thus, we can see the model concept and data well relevant to reality, within the limitation of the simplifications. In particular, the topography-parallel water table is not actually possible as the water flux in the shallow zone rises uniformly collecting the infiltration while the head gradient had to be constant.

Transient hydraulics of M2
This variant has been selected for test due to significant tunnel inflow rate variability in the shallow tunnel part (compared to M3 and M4), which is observable both in the real data and in the model. As the variable input (infiltration) fluctuates around its average of 200 mm/year (used for the steady-state initial condition), the tunnel inflow rate should also oscillate around its single-value counterpart in the steady-state model presented above.
The results are plotted in Fig. 9. We can see that the tunnel inflow rate oscillations are directly related to the infiltration inputs, which means that the reaction of the system is relatively quick compared to the one-month resolution of the inputs. On the other hand, the transition curve from one infiltration rate to another infiltration rate is relatively steep and therefore sensitive to the temporal discretization. So the graph for a longer period with month resolution is composed of individual peaks and pits. Consequently, the differences of the peak/pit values between the models can be explained by a different temporal discretization or different precision of the numerical scheme.
The benchmark confirms to be well related to the real conditions, as the range of the tunnel inflow rate is very similar to the measured counterpart, although some of the individual peaks or plateaus do not fit between (any of) the models and the measurement (Fig. 9). It is appropriate to the used model geometric simplification, whereas in the real case the hydraulic storage properties, including the unsaturated zone, can be much more complicated. We also note that no calibration has been used for the model parameters, besides the steady-state hydraulic model fitting to the average tunnel inflow (''Steady-state hydraulics of M2-M4 with inversion'' section).

Fictitious pulse-input tracer transport
Although the benchmark model is motivated by fit of tunnel water age and the pulse transport temporal analysis is a preparatory step for calibration with real tracer time evolution, the comparison in this paper has been done as the direct solutions of the three models for given sets of parameters. We recall from the ''Model variants and their parameters'' section that the inputs were previously estimated to produce quantitatively relevant transit times. We evaluate two main results: the breakthrough curve, i.e., the concentration evolution in the water discharging from the fracture domain (M2-M4) or from the shallow zone (M1) into the tunnel (a flow rate-weighted average of the whole  circumference/section length), and the mean transit time estimate, calculated from these breakthrough curve results (Eq. 7). The calculated mean transit time is strongly dependent on the used simulation time interval. The dependence and especially the detection of time interval necessary for convergence of the evaluated integrals is a subject of continuing paper . Here the comparison is made for particular time intervals agreed between the teams, referring to a concept of ''10 times MTT''. Typically, the concentration in the M2-M4 simulations in the final time is between two and three orders of magnitude lower than the peak value, but the MTT value is still relatively far from the limit value of the infinite interval.
The MTTs are presented in Table 7 together with the reference data of the ''measured water age.'' The breakthrough curves are plotted in four individual graphs in Fig. 10, together with the graphically illustrated MTT values, namely to show the pulse asymmetry by the relatively large distance (in time axis) between the peak position and the MTT. Both the curves and the means fit very well for M2, especially considering usually quite observable numerical approximation errors like the numerical diffusion and the significant differences of the three numerical schemes. For M3 and M4, the similar good fit is observed between Flow123d and OpenGeoSys while PFLOTRAN produces curve peaks little sooner and the decrease rates larger than the other two codes; this corresponds to visibly smaller MTT. The reasons are analyzed in more detail in the next section. The fit for M1 is good for the peak position, which results from immediate transit of mass from the surface just above the tunnel. But there are significant differences in the decrease rate of the tail. The results of OGS and F123 are closer to each other, even with reasonable agreement of MTT, while PFT results to MTT closer to the reference water age value. Although the curve relation is opposite to M3-M4, the reason could be similar in dispersion evaluation and numerical diffusion (below).
For all the M2-M4 variants, the MTT is larger than the reference value (but quite little for M2), which comes simply from the inverse model used for the porosities  estimate which used shorter simulation interval than the comparison here. On the other hand, the procedure how the reference values were obtained is also related to a shorter interval: For M2, the stable isotope data processed by a lumped-parameter model are from about 10-year periods (120 months, compared to 600-month simulation), and for M3 and M4, the 3 H/ 3 He ''apparent'' age (based in fact on the piston flow) strongly underestimates any contribution of flow paths longer than about 50 years of transit time (600 months, compared to 6000 months for the pulse simulation-based MTT evaluation). These arguments also partly answer why some of the porosities obtained by the inverse model can seem unrealistically small: The model tries to ''accelerate'' the transport, to decrease the MTT value appropriate for a smaller tracer-capturing period.

PFLOTRAN deviation analysis
PFLOTRAN shows a sharper peak, shorter MTT, and different tailing characteristics for the M3 and M4 models, which have longer transit times in general. In order to explore the reason for the difference between the PFLO-TRAN simulations and the other codes in the experiment, and its relation to the code architectures and numerical schemes (features listed at the beginning of ''Numerical schemes and software'' section), we investigated the effect of dispersion on the breakthrough curve and MTT.
The results from M3 simulations for a dispersion coefficient of zero and a run with a longitudinal dispersion coefficient of 5 m (as in Table 2) are shown in Fig. 11. Additionally, the Flow123d simulation has been done on a refined mesh of about 4-5 times larger numbers of nodes and elements. For higher dispersivity values, we see a broader peak and a longer tail and longer transit times, as expected. The difference between the models gets significantly smaller with decreasing dispersion and with mesh resolution. The effect of mesh refinement was significant for smaller dispersion only, as the DG scheme of F123 is able to compensate the numerical diffusion for less advection-dominated problems.
Therefore, we believe a highly likely source of discrepancy of the models is a combined effect of the numerical implementation of the hydrodynamic dispersion tensor and of the numerical diffusion/dispersion which is a complex process, especially when considering different numerical schemes and meshing topologies. E.g., the dispersion coefficients can be sensitive to approximation of velocity in the mesh. While there is discrepancy between the models, the mean ages produced are generally within a factor of two. Given the complexity of the benchmark model, and the large differences between code architectures, this spread is probably representative of the type of structural model uncertainty that can be expected for the transit time distribution in fracture flow models. It should be noted that the hydraulic comparison was much closer than that of the transit time distribution.

Conclusion
The benchmark problem definition and solution fulfilled their goals to compare the simulation codes in a real-worldrelated problem with several exceptional features. The configuration with a planar vertical feature and a hard-rock block allows studying codes with a state-of-the-art concept of multidimensional coupling and their comparison to a traditional 3D domain code. Then, the problem makes a practical intermediate step between water age estimation from natural tracers by either the lumped-parameter models or computationally expensive and data-demanding fullgeometry 3D hydraulic and transport models.
The particular code comparison resulted into their successful verification of the given problems. In particular, the different implementation of the 3D and 2D coupling by means of discrete unknowns, in Flow123d and Open-GeoSys, does not influence the evaluated results more than other usual meshing and time-stepping-related numerical errors. The detected deviations between the codes (PFLOTRAN versus others) were explained by an additional test. It suggests that the discrepancy is not directly related to either of the fracture representation choice (2D or 3D) but likely by the dispersion coefficients processing in the scheme and by the numerical diffusion. On the other hand, the evaluation of dispersion from velocities (as the hydraulic model postprocessing) based on discrete values can depend on the form of the fracture and matrix degrees of freedom division. Other numerical effects are studied in the related work , where the timestepping error and the injection boundary condition precision are found of minor importance for Flow123d.
Although the study was not primarily intended to inverse modeling, i.e., rock parameter estimation from the observed data, we demonstrated that in most cases and problem features we were able to get both qualitatively and quantitatively relevant conditions with respect to the realsite conditions.
The parametric sets are used as initial estimations for the related data interpretation study (Gardner et al. DECOVALEX 2015 at http://www.decovalex.org/resour ces.html#special-issues). The presented problem solution offers an efficient procedure for processing of natural tracer data sampled in a tunnel, which potentially offer more information than a mean age of age-distribution parameters, in particular estimating rock transport parameters within some simple inhomogeneity pattern in connection with the hydraulic model.