Three-dimensional hydrostratigraphy and groundwater flow models in complex Quaternary deposits and weathered/fractured bedrock: evaluating increasing model complexity

Greater complexity in three-dimensional (3D) model structures yields more plausible groundwater recharge/discharge patterns, especially in groundwater/surface-water interactions. The construction of a 3D hydrostratigraphic model prior to flow modelling is beneficial when the hydraulic conductivity of geological units varies considerably. A workflow for 3D hydrostratigraphic modelling with Leapfrog Geo and flow modelling with MODFLOW-NWT was developed. It was used to evaluate how the modelling results for groundwater flow and recharge/discharge patterns differ when using simple or more complex hydrostratigraphic models. The workflow was applied to a study site consisting of complex Quaternary sediments underlain by fractured and weathered crystalline bedrock. Increasing the hydrostratigraphic detail appeared to improve the fit between the observed and simulated water table, and created more plausible groundwater flow patterns. Interlayered zones of low and high conductivity disperse the recharge/discharge patterns, increasing the vertical flow component. Groundwater flow was predominantly horizontal in models in which Quaternary sediments and bedrock were simplified as one layer per unit. It appears to be important to define the interlayered low-conductivity units, which can limit groundwater infiltration and also affect groundwater discharge patterns. Explicit modelling with Leapfrog Geo was found to be effective but time-consuming in the generation of scattered and thin-layered strata.


Introduction
The main aim of hydrogeological studies is often to model groundwater flow, in which the three-dimensional (3D) geological structure is simplified; however, variation in hydraulic conductivity and the geological structure has a considerable impact on groundwater flow and groundwater discharge patterns, as presented in the study of Freeze and Witherspoon (1967). The construction of a 3D hydrostratigraphic model prior to flow modelling is essential to accurately characterize groundwater flow patterns (e.g. Artimo et al. 2004;Wycisk et al. 2009;Cox et al. 2013;Huntington et al. 2013), especially when the hydraulic conductivity of geological units varies considerably. In addition, variations in hydraulic conductivity in 3D groundwater flow models are usually based on model calibration, probabilities or uncertainty analysis (Refsgaard et al. 2012;Feinstein et al. 2012); however, the interpolation of scattered data can cause significant misinterpretation, as areas having highly dispersed aquifers, confining units and deposits are formed in a complex and unpred i c t a b l e m a t t e r . F o r e x a m p l e , m o d e l l i n g o f t h e hydrostratigraphy of an area that has formed during several glaciations with weak glacial erosion is challenging due to the local and scattered nature of the aquifers and confining units (Ross et al. 2005;Howett et al. 2015;Åberg et al. 2019). The hydraulic conductivity (K) of glacial till is orders of magnitudes lower than that of sands and gravels (10 −5 -10 −2 m s −1 , e.g. Freeze and Cherry 1979), and understanding of the continuity of the units is thus important. In addition, weak glacial erosion has enabled the preservation of weathered bedrock, which commonly occurs in areas where intrusive or metamorphosed bedrock is exposed (Wright 1992;Hall et al. 2015).
The chemically weathered top of crystalline bedrock can be clayey, and its hydraulic conductivity thus tends to be low (10 −9 -10 −7 m s −1 , Welby 1981; Schoeneberger and Amoozegar 1990). By comparison, less weathered sandy or till-like weathered bedrock (gruss) can be more conductive (~10 −6 m s −1 , Maréchal et al. 2004;Schoeneberger and Amoozegar 1990). It is also common that the fractured upper part of bedrock underlying the weathered zone has even higher hydraulic conductivity (10 −6 -10 −5 m s −1 , Maréchal et al. 2004), and it can behave like an aquifer depending on the fracturing rate and connections between the fractures. Highly simplified single or two-layered groundwater flow models are quick to produce but usually not sufficient, especially in the case of transport modelling-for example, Artimo (2002) found that a two-layer model, one layer with unconsolidated Quaternary deposits and the other with weathered/fractured bedrock, was not adequate for transport modelling. Freeze and Cherry (1979) also illustrated the effect of a high conductivity zone on groundwater flow paths, demonstrating the importance of K variation in transport models.
The ore potential of northern Finland is high, promoting intensive exploration activities (Eilu 2012). Northern Finland has been an area of intensive exploration for gold, nickel, copper and platinum group elements (PGE) since the 1970s (Eilu 2012) and glacial stratigraphy and geochemistry have been methods applied (Hirvas et al. 1977;Gustavsson et al. 1979) in many investigations. Extensive data collection has also enabled the study of Quaternary glaciation history (e.g. Kujansuu 1976;Hirvas et al. 1977;Hirvas 1991), which has revealed the presence of complex sediment arrangements of alternating glacial and interstadial deposits. Due to the close proximity of former ice-divide zones, complex sediment packages are a recurring feature in ore-prospective areas in Finnish Lapland (Hirvas 1991;Helmens et al. 2000;Salonen et al. 2014;Howett et al. 2015;Lunkka et al. 2015;Åberg et al. 2017a). Previous studies have also revealed that weathered bedrock is a prominent feature in Central Lapland (Hirvas 1991;Hall et al. 2015), and has been preserved due to weak glacial erosion (Pulkkinen 1983;Hirvas 1991).
In this study, the Sakatti mining development project in northern Finland (Brownscombe et al. 2015) was used as a study area, since a large amount of detailed information on Quaternary sediments and their hydraulic properties already exists (Hirvas et al. 1977;Gustavsson et al. 1979;Hirvas 1991;Åberg et al. 2017a;Åberg et al. 2019). The project is located in an area of complex hydrostratigraphy consisting of peat, alternating till and sorted sediment units and weathered/fractured bedrock overlying more competent bedrock (Åberg et al. 2017a, 2019). In this mining development project, hydrology and hydrogeology have an important role in environmental impact assessment (EIA), the mine design and infrastructure planning and it has been investigated with three-dimensional (3D) flow modelling. Understanding of the connections between the main hydrostratigraphic units, i.e. peat, glacial and fluvial sediments, weathered/fractured bedrock, competent bedrock and nearby river, is needed. The surficial aquifer-aquitard system consists of relatively thin (ca. 10 m on average) but complex unconsolidated Quaternary deposits. Aquifers that mainly consist of glacigenic and fluvial deposits can be disconnected due to interlayered tills. In addition, underlying fractured bedrock can host a bedrock aquifer, in which connections with the Quaternary aquifer-aquitard system are poorly known.
Recently, a 3D geological model of the Quaternary deposits (Åberg et al. 2017a) and a 3D groundwater flow model (Åberg et al. 2017b, 2019) were constructed for the Sakatti mining development site. In this study, a four times as large 3D geological model was constructed. Based on additional data, a model with more detailed hydraulic properties of sedimentary units and weathered and fractured crystalline bedrock was constructed. Concurrently, a workflow for generating a 3D flow model using Leapfrog Geo (Seequent Ltd. 2020) was described. The groundwater flow model was used to examine the groundwater flow, recharge and discharge patterns of the western part of the Viiankiaapa mire. Detailed understanding of the flow patterns is needed to assess possible groundwater dependency of the ecosystems in mire areas as well as to understand the possible interaction between surface water, shallow groundwater, and water in weathered bedrock or bedrock fractures.
The aim of this study was to analyse how the modelling results for groundwater flow and recharge/discharge patterns differ when simple or more elaborate hydrostratigraphic models of the study site are used. A workflow originally applied by Åberg et al. (2019), which combines 3D modelling in Leapfrog Geo using the hydrogeology tool of Leapfrog and the graphical user interface (GUI) ModelMuse (Winston 2009 USGS) with the flow modelling code MODFLOW-NWT (Niswonger et al. 2011), was described and further developed. The workflow was used to examine how hydrostratigraphic model complexity affects the results. It was hypothesized that a simple hydrostratigraphic structure is inadequate for modelling sitescale groundwater flow and recharge/discharge patterns in complex formerly glaciated sedimentary environment, while greater complexity in a hydrostratigraphic model yields more detailed and plausible groundwater flow patterns. The hypothesis was evaluated with four different versions of hydrostratigraphic models.
This paper discusses the effects of K heterogeneity on sitescale flow patterns, groundwater discharge in mire areas, groundwater flow model calibration, and 3D modelling methods, including simplification of the geological model in the hydrostratigraphic model.

Geological and hydrological background
The study area is located in the municipality of Sodankylä in northern Finland (Fig. 1) and lies in the valley of the River Kitinen, which is surrounded by vast mire areas. The topography within the valley is relatively flat, lying 180-200 m above sea level. The valley is surrounded by inselbergs rising 10-100 m above the flat-lying areas.
The study area is situated in the boreal and subarctic regions of the Northern Hemisphere, with cold winters and cool summers. The annual mean temperature is −0.4°C and the average annual precipitation varies between 500 and 600 mm (Ilmatieteen laitos, 2018).
The Proterozoic crystalline bedrock within the study area mainly comprises mafic volcanic rocks, gabbros, quartzite and mica schists (Tyrväinen 1980(Tyrväinen , 1983. The upper part of the crystalline bedrock is highly fractured (10-100 m) and the top of the upper part is partly chemically weathered (Åberg et al., 2019). According to Hall et al. (2015), the weathering indices indicate a variable degree of weathering within the study area. Gruss-type weathering appears to dominate in the study area, while clay-type weathering and weakly weathered areas have patchy occurrences.
The rugged bedrock surface is covered with Quaternary sediments (Åberg et al. 2017a). Multiple glaciation cycles have deposited alternating glacial tills and glaciofluvial and fluvial sediments Åberg et al. 2017a), which have been preserved due to weak glacial erosion (Hirvas 1991;Johansson and Kujansuu 2005) within the study area. In addition, aeolian Holocene/deglacial deposits have been observed in the study area. A diverse sediment package that consists of at least four till units and four sorted deposit units, mainly gravel and sand, creates a highly heterogeneous aquifer-aquitard system in Kärväsniemi (Fig. 1). Comparable sediment packages have also been encountered in the river banks and beneath the mire (Åberg et al. 2017a).
According to Åberg et al. (2019), the annual water budget is positive since the mean annual precipitation rate of 550 mm exceeds the mean annual evaporation rate of 300 mm, and groundwater recharge dominates within the valley of the River Kitinen. Major groundwater recharge occurs from April to September. The highest recharge rate in spring results from the spring thaw and the secondary maximum results from intense rainfall in summer and autumn. The River Kitinen is a gaining river, and the groundwater originates from the surrounding river banks and the vast mire areas.
The study of Åberg et al. (2019) indicated that the hydraulic conductivities of the Quaternary sediments are highly variable, ranging from 2.2 × 10 −7 to 4.3 × 10 −3 m s −1 (AA Sakatti Mining Oy, unpublished data, 2019;Åberg et al. 2019). Groundwater is mainly unconfined, and scattered groundwater reservoirs exist in the river banks and beneath the Viiankiaapa mire. However, local confined or perched groundwater was observed at least in Kersilönkangas area ( Fig. 1), where till may act as a confining unit. The relatively high hydraulic conductivity of the fractured bedrock, from 2.7 × 10 −8 to 6.5 × 10 −5 m s −1 , indicates the presence of a bedrock aquifer (SRK, unpublished data, 2019;AA Sakatti Mining Oy, unpublished data, 2019;Åberg et al. 2019).
Minerogenic Quaternary deposits are overlain by peat deposits. The mire located on the western side of the River Kitinen is called Vanttioaapa. The eastern side of the River Kitinen is covered by the Viiankiaapa mire, which hosts endangered groundwater-dependent ecosystems and is a Natura 2000 conservation area (Hjelt and Pääkkö 2006). The Viiankiaapa mire is a patterned mire-fen complex, which started to develop about 10,000 years ago (Suonperä 2016), soon after the Scandinavian Ice Sheet retreated from the study area (Johansson and Kujansuu 2005;Stroeven et al. 2016). Groundwater discharge in the mire is probably related to the underlying fluvial sediments (Åberg et al. 2019).

General overview
A workflow was generated starting with 3D geological modelling using Leapfrog Geo (Seequent Ltd. 2020) followed by 3D hydrostratigraphic modelling and groundwater flow modelling using MODFLOW-NWT (Niswonger et al. 2011). In this study, two 3D geological models, GM1 and GM2, were gene r a t e d u s i n g L e a p f r o g G e o . I n a d d i t i o n , f o u r hydrostratigraphic models, HSM1, HSM2, HSM3 and HSM4, were generated from the GM models to test how model complexity affects groundwater flow patterns and the distribution of recharge and discharge zones ( Table 1). The structures of HSM1, HSM2 and HSM3 were based on GM1, whereas the structure of HSM4 was based on GM2.
Leapfrog Geo is designed for rapid implicit modelling. The 3D mesh generation in Leapfrog Geo is based on contact surfaces that define the upper and lower limit of the unit. The interpreted stratigraphy, which defines the relative age of the units, affects the behaviour of the defined contacts. Leapfrog Geo offers three types of contact surfaces: sediment contact, erosive contact and intrusion contact (Seequent Ltd. 2020). The contact surfaces of the 3D geological model were generated with the first two types. The FastRBF interpolation method of Leapfrog Geo, which resembles the kriging method (Oliver 1990), was used to generate the contact surfaces.

Geological models GM1 and GM2
The geological models were constructed based on drilling data from the Kersilö database (see Åberg et al. 2017a for the drilling data references) and data obtained from base-of-till (BOT) drilling by AA Sakatti Mining Oy (unpublished data, 2017). In addition, the detailed till geochemistry and bedrock observation databases of the Geological Survey of Finland (GTK) were used. Moreover, ground penetrating radar (GPR) profiles from Åberg et al. (2017a) were utilized. The geological models GM1 and GM2 were extended and modified from the geological model presented by Åberg et al. (2017a). The extent of the models was set to 6 × 8 km (48 km 2 ) to cover the study area presented in Fig. 1. The GM models consisted of ten units (3D solids), including the River Kitinen, peat, four sorted sediment units, four till units, weathered bedrock and fractured bedrock ( Table 1). The geological models were constructed with both explicit and implicit modelling using the methods of Åberg et al. (2017a). The horizontal resolution of both GM models was 20 × 20 m, while the vertical resolution was not applicable in Leapfrog. The River Kitinen was modelled as an erosive contact. However, in GM2, the river bottom was slightly edited compared to GM1 based on the bedrock observations database of GTK.
The peat bottom contact was generated with ArcMap (ESRI) as depth contours. Next, a triangulated irregular network (TIN) surface was generated from the peat contours with ArcMap. After this, the TIN surface that depicted the peat bottom depth was converted into a raster, and the raster was converted to the altitude above sea level by subtracting it from the topography. The converted raster was used as an erosion-type contact defining the peat bottom in a geological model in Leapfrog. However, the peat bottom contact in GM2 was reedited based on new GPR profiles in spring 2019. During contour generation, 0.5-m-thick contours were created 2 m inwards from the border of the mire areas to prevent the peat close to the borders from being too thin. In addition, areas with no peat were defined with 20-m-thick contours 5 m outwards from the border of the mire areas. A random negative height rising above the topography was selected.
Minerogenic Quaternary sediment units (presented in Table 1) were modelled in an explicit manner as sediment contacts with the polyline tool. In GM2, revisions to the Quaternary sediment units were made compared to GM1, including changes to the layer structure of the Sahankangas area ( Fig. 1) and expansion of the lowest till unit (Fig. 2).
The bedrock surface contact for both GM models was generated as an erosive contact with implicit modelling. It was adjusted with the polyline tool in those areas where drilling did not reach the bedrock using the methods described in Åberg et al. (2017a).
In both GM models, the bedrock was modelled with two layers: the upper depicted the chemically weathered zone, whereas the lower zone depicted fractured bedrock, as in Durand et al. (2017) and Wright (1992), among others. In GM1, chemically weathered bedrock was modelled with a deposit-type contact above the fractured bedrock surface, as in Åberg et al. (2017a). In GM2, a more precise and extensive chemically weathered zone was generated based on Hall et al. (2015), Raatikainen (2019), AA Sakatti Mining Oy, unpublished data, 2019; and GPR surveys in 2015. The weathering zone was extended to cover the whole model area (Fig. 2), contrary to the restricted coverage of GM1. It was generated with Leapfrog as an offset surface from the bedrock surface with a minimum thickness of 3 m. A few dummy points and polylines were added to offset the surface to manually improve the interpolated bottom contact. In GM1, everything below the bedrock surface was assumed to be fractured bedrock, and the bottom was set to 140 m asl. In GM2, the bottom of the fractured zone was generated as an offset surface from the bedrock surface with a minimum thickness of 5 m (Fig. 2). The offset surface was also adjusted with a few dummy points and polylines. The fractured zone delineated the base of the model.

Hydrostratigraphic HSM models
Model complexity increased from HSM1 to HSM4 (Table 2). HSM1 had a uniform Quaternary sediment package and uniform fractured bedrock, whereas HSM2 had a uniform Quaternary sediment package but variable fractured bedrock. HSM3 had multi-layered Quaternary sediments and a single layer of variable fractured bedrock, whereas HSM4 had multi- Table 1 The hydrostratigraphic units used in 3D geological models ( Fig. 2) and 3D groundwater flow models Modelled geological unit GM 1-2 HSM1 HSM2 HSM3 HMS4 River Kitinen In HSM1 and HSM2, the Quaternary deposit layers, except peat, were combined into one layer layered Quaternary sediments and two-layered variable bedrock. HSM1 and HSM2 were simple models that consisted of four layers including the units the River Kitinen, peat, Quaternary sediments and fractured bedrock. Simplification from the geological model to the hydrostratigraphic model was performed in Leapfrog by removing contact surfaces between the sediment units (Table 1).
In the complex models, HSM3 and HSM4, most of the units from the GM models were retained (Tables 1, 3 and 4) and the Quaternary sediments consisted of several units, contrary to one combined unit in HSM1 and HSM2. HSM3 had eight layers, whereas HSM4 had ten layers including the units in Table 1. In HSM3, the middle till, lowest till and weathered bedrock units were removed due to their originally assumed small coverage, and the lower sorted and middle sorted units were combined into one unit. In HSM4, middle sorted deposits and lower sorted deposits were likewise combined into one unit, and middle till was removed from the model. Contrary to the construction of HSM3, the weathered bedrock and lowest till units were retained in HSM4 due to the expansion of their coverage.

Hydrogeology tool and conversion to a groundwater flow model grid
The hydrogeology tool of Leapfrog was used to convert the 3D hydrostratigraphic models into MODFLOW grids (Figs. 3 and 4). The units of the 3D hydrostratigraphic model were projected within the grid using the evaluation tool of Leapfrog. The resolution of the horizontal cell size was 20 × 20 m, and the minimum allowed cell size was 0.5 m for vertical resolution (Fig. 4c), except in HSM4, in which it was 0.0017 m (Fig.  4d). This minimum thickness appeared between the scattered units as "pinched" cells (e.g. Feinstein et al. 2012), while the basic MODFLOW codes only allow continuous layers. The bases of the MODFLOW grids were edited in HSM1-3 to reach 35 m (assumed average thickness of the fractured bedrock) (Fig. 4c) downwards from the bedrock surface, instead of 140 m asl in GM1 (as in Fig. 4a). In HSM4, the irregularly elevated base of the model (Fig. 4b,d), corresponding to the solid bedrock surface, was used as the base extent as such. The river top was reduced to 0.5 m above the bottom of the river in all HSM models, as described in Åberg et al. (2019). Layer 1 in the mire area was modified in complex models HSM3 and HSM4 to represent two-thirds of the mire thickness, as presented in Åberg et al. (2019), to correspond to the mire acrotelm. Layer 2 in the mire area corresponded to the mire catotelm. Acrotelm is the less decomposed upper part of peat with higher hydraulic conductivity and catotelm is more decomposed peat below the acrotelm (Holden and Burt 2003). In the simple models HSM1 and HSM2, layer 1 was unmodified from GM1 and peat was uniform.

Parametrization and boundary conditions of the groundwater flow models
The 3D units of the HSM models were used as zones for horizontal hydraulic conductivity (HK) parameters and vertical anisotropy (VANI) parameters in ModelMuse. The HK  and VANI parameters of the HSM models were based on earlier slug test and grain-size analyses, as described in Åberg et al. (2019), and more recent grain-size analyses (SRK, unpublished data, 2019; AA Sakatti Mining Oy, unpublished data, 2019). The HSM models were set as steadystate, which was assumed to be adequate for the generic understanding of groundwater flow patterns since annual watertable variation in most continuous monitoring wells is less than 1 m (Åberg et al. 2019). The precipitation rate was 560 mm year −1 for the hydrological year lasting from November 2013 to November 2014 to correspond to a single snow-covered period as in Åberg et al. (2019), and this configuration was used for recharge estimations. The recharge rate was interpolated from recharge estimates for two zones called "open water and mire area" and "forested area", as described in Åberg et al. (2019). The recharge rate of the mire area was assumed to be uniform for simplicity. The recharge rates of the remaining areas were interpolated with multiplier points, as described in Åberg et al. (2019), based on the results of the episodic master recession (EMR; Nimmo et al. 2015) and water-table fluctuation (WTF; Meinzer 1923) methods. All the models were surrounded by constant head boundaries in locations where the mire contacted the model border. Head-dependent flux boundaries were added in sediment boundary areas with the general head boundary (GHB) package. Their heads were calculated from the closest open water bodies, and conductance was defined as in Åberg et al. (2019). The drain package (DRN) was used to simulate discharge within the open mire area and the gravel intake area (Fig. 1). The River Kitinen was modelled with the river package (RIV) to simulate the river discharge.

Definition of fractured and weathered bedrock zones in the HSM models
In the simplest model, HSM1, the fractured bedrock was defined with a uniform initial hydraulic conductivity of 5.3 × 10 −6 m s −1 . In models HSM2, HSM3 and HSM4, the fractured bedrock was defined similarly as in Åberg et al. (2019). The hydraulic conductivity of the fractured bedrock was interpolated from hydrogeological testing in boreholes, including packer, spinner and constant head tests (SRK, unpublished data, 2019), along with two slug tests (Golder Associates, unpublished data, 2012). The thrust faults and brittle faults  Cross-sections of the Leapfrog models vs. MODFLOW grids from complex models HSM3 and HSM4 with initial hydraulic conductivities. The location of cross-section W-E is depicted in Fig. 1. a Cross-section of geological model GM1 used as a basis for HSM3. Everything below the bedrock surface was assumed to be fractured bedrock, and the bottom was set to 140 m asl. b Cross-section of GM2 used as a basis for HSM4. c Cross-section of model HSM3 presented as a modified MODFLOW grid in ModelMuse. d Cross-section of model HSM4 presented as a modified MODFLOW grid in ModelMuse. The thin "pinched" parts connect the scattered layers as a whole. The parameter zones of the HSM models were defined with the Leapfrog evaluation tool. The river was modelled as a separate unit, as it penetrated multiple layers including fractures were defined as their own parameter zones (Table 2). In HSM4, the type of weathered bedrock zone was defined as gruss type, clay type or nonweathered based on the Weathering Index of Parker (WIP, Parker 1970;Hall et al. 2015). The clay and the gruss zones were delineated based on WIP calculations of till geochemistry data (weathered bedrock samples and fine fraction of till) in Hall et al. (2015) and Raatikainen (2019). Gruss and clay-zone polygons were drawn and used as parameter zones (Fig. 5b). The remaining least weathered areas were included in the same parameter zone as fractured bedrock.

Calibration of the HSM groundwater flow models
A total of 165 observations, including head observations from groundwater monitoring wells (n = 34), water-table measurements from flarks and other open water bodies (n = 126) on the LiDAR DEM (light detection and ranging digital elevation model) and spring observations (n = 5), were used as the calibration points in all the HSM models. The head observations of the wells were added to the layer that corresponded the depth of the screen of the well ( Table 6). The observations from LiDAR DEM were added to layers 1 or 2 depending on the expected water table. The hydraulic conductivities were first manually defined with interpolated HK parameter multipliers in ModelMuse, as presented in Åberg et al. (2019). Then, the hydraulic conductivity of the peat acrotelm, thrust faults, brittle faults and fractures, the gruss zone and clay zone, and RIV and DRN conductance were calibrated with PEST (Doherty 2015), depending on the model version (see Table 5). Parameters that had low sensitivity or indicated illposedness were set as fixed parameters. The variation of Marquardt lambda and singular value decomposition were enabled during parameter estimation, which helped to find the best improvement of the objective function of the head observation (see Doherty 2015). The limits for parameter values were set as follows: the maximum allowed value was 0.01 for all calibrated parameters except brittle faults and fractures, which had a maximum of 1, and the minimum allowed value was 1 × 10 −10 for all parameters to maintain the convergence of the model. The unit was m s −1 or m 2 s −1 depending on the parameter. Starting values for RIV and DRN were 1 × 10 −5 and 1 × 10 −6 m 2 s −1 , respectively, based on test calibrations. The starting K of peat, Quaternary sediments and fractured bedrock were 5.4 × 10 −5 , 1 × 10 −5 and 5.3 × 10 −6 m s −1 (Table 5), respectively, based on the definition in Åberg et al. (2019). Starting K for gruss and clay were 1 × 10 −5 and 1 × 10 −8 m s −1 (only used in HSM4), respectively. The initial VANI parameters were defined as in Åberg et al. (2019) as 10 for Quaternary deposits, 100 for peat and 1 for all bedrock parameters. The VANI parameters remained fixed due to their low sensitivities (see e.g. Hill 1998;Poeter et al. 2014). Root mean square error (RMSE), mean absolute error (MAE) and bias were calculated from residuals of the head observations. Linear uncertainty was calculated for calibrated parameters with PEST. Since all calibrated parameters used were logtransformed, the uncertainty variances were calculated to logarithms of parameters (Doherty 2018).

Verification and validation of groundwater flow models
Model evaluation was performed with particle tracking with measured mean δ 18 O, δD and d-excess isotope values of the water samples in the Viiankiaapa mire based on data presented in Bigler (2019) and measured in September or October 2016. Particle tracking was performed in forward and backward directions with MODPATH (Pollock 1994) to describe the discharge and recharge locations as well as flow routes of the water having certain isotope composition. Isotope compositions were assumed to reflect the evaporation state of water in the recharge area. Neither altitude effect (altitude varies from 180 to 200 m) nor groundwater transport were supposed to affect the isotope values (Gat 2010). The δ 18 O, δD and d-excess values were inserted to modelled flow paths with ArcMap to study the flow paths of different isotope compositions in 3D with Leapfrog. Moreover, the isotope values of the particle flow paths were compared to measured isotope values of springs on the shore of the River Kitinen presented in Åberg et al. (2019) measured in August 2015. Firstly, the isotope values of the springs were grouped into five groups based on their locations. Secondly, the flow paths close to the defined five groups were selected and studied in 3D view. Then, the fit between isotope values of the selected flow paths and spring observations were calculated and defined based on the following criteria: the deviation from spring observation is ±1, ±5, and ±1.5‰ in δ 18 O, δD and d-excess values, respectively. The difference is assumed to be acceptable if the value is within one-sixth part of the ranges −14.8 to -9.9, −108.8 to -77.0, and −4.4 to 8.6‰ in δ 18 O, δD and d-excess values, respectively. If all isotope values of flow paths coincided with the spring observations within acceptable limits, the fit of the group was good. If only some of the spring values coincided with flow path values, the fit of the group was moderate. If all flow path values deviated from spring observations, the fit was poor. If no particle paths crossed the spring observation, the group was categorised as "no-match". To evaluate the models, overall points were calculated to HSM models with values 2, 1, 0 and empty for good, moderate, poor and no-match fits of the groups, respectively. The final decision of the group match was based on the fit of d-excess, which has a less seasonal variation (Gat 2010).

General distribution of Quaternary sediments and fractured/weathered bedrock
Based on the geological and hydrostratigraphic models, the study area consists of 51.7-59.0% till, 26.7-34.7% sorted sediments (silt, sand or gravel) and 13.6-14.4% peat, and their volumes are ca. 0.2, 0.1 and 0.05 km 3 , respectively. The Quaternary sediment package is 8 m thick on average, and it consists of till, sorted sediment and peat units that are 2 m thick on average (Tables 3 and 4). The altitude of the bedrock topography varies from 135 to 206 m asl, with an average altitude of 180 m asl (Fig. 5). According to the most complex models, GM2 and HSM4, the thickness of the fractured bedrock is highly variable (0-77.9 m), with the average thickness being 26.0 m. The average thickness of weathered bedrock is 6 m.

Legend
Elevation (m a.s.l.) 200.1-206 198.1-200 196.1-198 194.1-196 192.1-194 190.1-192 188 Groundwater flow budget and recharge/discharge patterns Flow modelling budgets (see Appendix for the second through fifth tables therein) after calibration indicated that the percentage of groundwater discharge to the River Kitinen was a lower proportion of the overall outflow (21.8-29.9%) in the simple models HSM1 and HSM2 than in the complex models HSM3 and HMS4 (34.7-35.5%). However, the percentage of mire discharge in the overall outflow appeared to be higher in the simple models (65.7-74.6%) than the complex models (57.2-59.1%).
In the simple models HSM1 and HSM2, the recharge rate was estimated to be 25% of the precipitation in forested areas. According to the complex models HSM3 and HSM4, the recharge rate in forested areas varied from 28 mm year −1 (5% of precipitation) to 380 mm year −1 (68% of precipitation; Fig. 6). In the mire area, a constant recharge rate of 100 mm year −1 (19% of precipitation) was used in all HSM models based on Åberg et al. (2019). The calibrated hydraulic conductivity of fractured bedrock was 1.7 × 10 −6 m s −1 in HSM1, whereas K of bedrock units varied from 8.4 × 10 −9 to 1.3 × 10 −3 m s −1 in models HSM2, HSM3 and HSM4 (Table 5). In HSM4, the calibrated hydraulic conductivity for gruss and clay zones was 2.2 × 10 −5 and 8.4 × 10 −9 m s −1 , respectively.
The hydraulic conductivity of tills varied from 5.0 × 10 −8 to 1.3 × 10 −4 m s −1 and that of sorted sediments from 1.2 × 10 −6 to 6.1 × 10 −4 m s −1 in models HSM3 and HSM4 (Table 11). The root mean square error (RMSE) of the calibrated HSM models varied from 0.38 to 0.33 m, and the residual errors between the simulated and observed water table were within ±1.6 m ( Fig. 7) with Pearson correlation coefficient (SPSS ver. 24) of 0.995-0.997 (p value <0.01). The RMSE appeared to decrease when model complexity increased from model HSM1 → HSM2 → HSM3. Overall MAE and bias were similar in all models; however, groundwater wells in Table 12 indicated a notable decrease of RMSE, MAE and bias from HSM1 to HSM4.
The calculated uncertainties with 95% confidence limits indicated that parameters brittle faults and Quaternary sediments (simple models) have similar confidence limits and uncertainty variance (0.05-0.1) regardless of the model. However, some parameters like thrust fault and river parameter had different variance. In HSM3, the variance of the thrust fault was smaller (~0.5) than in HSM2 and HSM4 (>1.5). In HSM 2, the variance of the river parameter was two compared to the smaller variance of~0.5 in other models. Brittle fault seemed to have the lowest variance in HSM2-4 (<0.1) indicating higher certainty. The variance in drain parameter was high (1.3-1.4) in both simple models. Gruss parameter had Brittle faults and fractures K -

Groundwater flow patterns
The flow model results indicated that groundwater mainly flows towards the River Kitinen (Fig. 7). In addition, horizontal flow was dominant in the simple HSM models (Fig. 8a,b), while the downward and upward flow component increased in the complex HSM models (Fig. 8c,d).
The variation in hydraulic conductivity appeared to be an important factor determining groundwater flow patterns. When a high-conductivity zone, e.g. a sand unit, lay below the peat catotelm, vertical flow areas in peat layers expanded. Subtill sorted sediments appeared to cause a similar effect. 1 8 6 1 8 5 1 8 4 9 7 1 9 6 1 9 5 1 9 0 1 8 7 1 8 3 1 9 7 1 9 6 1 9 5 1 9 4 1 9 3 1 9 2 1 9 1 1 9 0  Moreover, the low hydraulic conductivity of the clay type in weathered bedrock caused a partial increase in the hydraulic gradient in HSM4 (Fig. 7 location C). In addition, the existence of low hydraulic conductivity fractures or faults increased the altitude of the water table (see e.g. Bense et al. 2003). Flow model results indicated that groundwater discharge appeared to occur in river shoreline and mire areas (Fig. 9). The discharge areas were interpreted from drain cells or flooding cells, as presented in Åberg et al. (2019). The complex models provided more evenly distributed groundwater discharge patterns, reflecting the string and flark pattern of the mire (Fig. 9, location E). The Matarakoski hydroelectric power plant created a recharge area in the river due to the higher river stage compared to the water table on the surrounding shore areas (Fig. 9, location F).
Particle tracking indicated that the groundwater flow in surficial deposits occurred mainly in high hydraulic conductivity zones. The K of brittle faults and fractures seem to affect the particle paths indicating their importance to model results in HSM2-4. The deepest flow paths also seemed to mimic the model bottom indicating the importance of the topography of more competent bedrock, which was used as the model bottom. Moreover, particle tracking indicated groundwater recharge areas in river banks and in the Viiankiaapa mire, as well (Fig. 10).

Flow patterns vs. complexity
In the simplest model, HSM1, the small variation in hydraulic conductivity between units led to the dominance of horizontal flow (Fig. 8a). However, the more prominent variation in hydraulic conductivity created complexity in flow patterns due to the refraction of groundwater flow lines in locations of conductivity change (see Freeze and Cherry 1979), which can be seen in the more advanced models HSM2-4. The refraction of flow lines due to hydraulic conductivity variation also appears to be a factor that affects the occurrence of discharge areas (Freeze and Witherspoon 1967). High variation leads to higher refraction, increasing the vertical flow component.
In addition to refraction, the order of high and low conductivity layers appears to affect the site-scale flow patterns. Horizontal flow seemed to dominate when lower co nd uc tiv ity b ed ro ck a pp ea re d b en ea th hi gh er conductivity surficial deposits in HSM1 and HSM2 (Fig.  8a,b). The dominance of horizontal flow when a lower conductivity unit (e.g. till) was overlain by a high conductivity zone was already observed by Freeze and Witherspoon (1967). However, if the hydraulic conductivity of the bedrock was higher than that of Quaternary deposits, as in HSM2, the vertical flow component slightly increased (Fig. 8b, location D). The expansion in vertical flow in a mire due to high conductivity subpeat sediments was also observed by Reeve et al. (2000). Based on particle tracking the recharge areas seem to be quite similar in all models. However, modifications in high K units affected recharge areas of the flow paths, as seen in HSM4 in the southeastern corner i n Fig. 10d. Simultaneously, the discharge locations vary in the shore of the River Kitinen and the mire, as well.

Groundwater discharge in mires
The mire discharge areas in the simple models HSM1 and HSM2 were tilted compared to the complex models HSM3 and HSM4, in which discharge zones matched the flark and string patterns more closely (Fig. 9). As higher variation in hydraulic conductivity increased the upward component in flow (Fig. 8c,d), discharge was more widespread in the complex models (Fig. 11). However, the lower percentage of discharge (see Appendix for the second through fifth tables therein) in DRN in the complex models appears to be related to the lower hydraulic conductivity of the peat catotelm, which limits the groundwater discharge (Peat layer 2 K in Table 5). Even though model HSM1 had the most restricted distribution of discharge ( Fig. 9) in the mire area, it had the highest discharge rate due to highest overall K in peat (peat layer 1 K in Table 5). Åberg et al. (2019) used the "flow lower" dataset of MODFLOW from a cell-by-cell output file (see Harbaugh 2005;Niswonger et al. 2011), which describes groundwater in or outflow in every cell bottom, as a definition of groundwater discharge. The flow lower dataset indicated similar patterns to those in the DRN outflow cells in every model: upward flow (presented as a negative value) in flow lower datasets in layer 2 (peat bottom) resembled the areas of DRN outflow in the mire (Fig. 11). This means that water flowing upward from sediment discharges out of the model in flarks if the head rises above the model surface. However, in the simple models, the water table was observed to be tilted in flarks, and the discharge rate was two orders of magnitude higher in an upstream direction compared to a downstream direction (Fig. 9, location G). In contrast, in the complex models, the water table in flarks was smoother. The observed difference can be explained due to the lower refraction of flow lines and more horizontal flow direction in peat layers due to lower variation in hydraulic conductivity between the peat and the sediment beneath it. These findings indicate that the hydraulic conductivity of peat is the limiting factor for groundwater discharge. It is apparent that DRN may overestimate the discharge rate, as it assumes that all water that rises above the water table in DRN cells is discharge. Moreover, using DRN in open mires has its limitations, as discussed in Åberg et al. (2019) and in more detail in Batelaan and De Smedt (1998).

Groundwater flow model fit vs. model reality
The overall slight decrease of RMSE from HSM1 to HSM3 was not statistically significant. In contrast, the difference in   (Table 12) was statistically significant (p < 0.05) between simple and complex models according to the Mann-Whitney U test, indicating that model fit increases when more complex multi-layered models are used. However, the differences in residuals between models HSM3 and HSM4 can be related to the local structural revisions rather than the overall increase in complexity. Further changes to the structures of bedrock and surficial deposits between HSM3 and HSM4 indicate that only relying on obtaining a good fit is not adequate in model development. HSM4 was the only model with a weathered bedrock unit, and it had a higher overall RMSE than HSM3, indicating that model complexity also added more uncertainties to the structure. In the Kokkolampi area (Fig. 7  location A), adding a low conductivity clay zone in the weathered bedrock appeared to improve the model fit, while it caused an increase in the hydraulic gradient. The areas of the groundwater flow model with a poor fit could indicate defects in the layer structure, the definition of hydraulic conductivity, the recharge rate, or the existence of perched groundwater-for example, the residuals of heads indicated that a constant area of poor fit exists in the Kärväslampi area in all models (Fig. 7, location B). HSM4 had the poorest fit, which may be related to changes in the distribution and decreased volume of till deposits (Fig. 7,  location B). Moreover, stable δ 18 O isotope studies (Åberg et al. 2019) indicated that Kärväslampi probably represents a perched water table. These differences in residuals between models indicate that the calibration results could be used for model development by identifying the poorly fitting areas needing reconsideration of the hydrostratigraphic/geological structure or the recharge rate.
In groundwater flow modelling, flooding is usually considered as a model defect; however, in some cases, for example close to river cells (Feinstein et al. 2012), it can be considered as a simplification of groundwater discharge. In the complex models, low hydraulic conductivity units caused unrealistic flooding with moderate slopes, whereas in the simple models, unrealistic flooding was present when a smooth or high water table crossed the topography. Flooding also existed in areas where low conductivity features that were probably too thick appeared close to the model top, simultaneously with an overall high water table. In addition, if the estimated recharge is too high in relation to hydraulic conductivity, this can cause flooding. While the definitions of HK and the recharge multiplier were based on interpolation, some unexpected combinations might appear with overall low conductivity and too high recharge. Moreover, as noticed from the flow patterns, ignoring the clay zone (in HSM1-3) causes unrealistic connections between surficial deposits and the bedrock in clay weathering areas (Fig. 5b), while the refraction of flow lines and the limitation of infiltration due to a low conductivity zone are omitted.
In water balance studies, simple models can give as reasonable results as complex models (e.g. Hudon-Gagnon et al. 2015); however, HSM models indicated that details are important in studies related to recharge/discharge patterns since too simple models led to the underestimation of the vertical flow component and K variation between layers. Especially in mires, locations of groundwater discharge are notably affected by vertical flow component; thus, the importance of model complexity depends on the research questions.

Effects of low conductivity zones and transmissivity
The variations in hydraulic conductivity were found to be an important factor in site-scale flow patterns. The low hydraulic conductivity units such as lens-like till units, clay-type weathering zones or bedrock elevations, were observed to behave similarly to low hydraulic conductivity faults or fractures. The existence of a low conductivity zone created a barrier to groundwater flow, raising the water table in the upstream direction and lowering it in the downstream direction. However, the water table already started to lower in the middle of the blocking unit due to the steepened hydraulic gradient; thus, the dimension of the lens-like low conductivity zone appears to be the important factor in complex aquiferaquitard systems.
During model development, manual sensitivity analysis (see Reilly and Harbaugh 2004) indicated that changing the hydraulic conductivity by two orders of magnitude in sediment layers changed the water level by about 1 m, whereas a one order of magnitude change only had a minor effect. In addition to hydraulic conductivity, transmissivity depends on the layer thickness, which varied between 0 and 50 m for Quaternary sediments and between 0 and 80 m for bedrock, causing prominent variation in internal transmissivity. Differences in flow patterns between HMS3 and HSM4 could be due to variation in the layer thicknesses.

Effect of interlayered tills on recharge
While variation in hydraulic conductivities existed in the complex models HSM3 and HSM4, the recharge could be estimated in more detail. Moreover, the layer structure should be known when adjusting the recharge rates, as an impeding layer between sorted sediments may reduce the infiltration rate. The existence of interlayered tills (Åberg et al. 2017a) causes a reduction in the recharge rate. The upper till represented in the models was mainly Brittle faults and fractures appear to affect the river discharge in location I. The River Kitinen (RIV cells) and mire borders (DRN cells) are modified and reproduced after the National Land Survey of Finland sandy melt-out till, whereas the lower till assemblage represents more confining subglacial traction till; however, the structure of the lower till assemblage was highly variable (Åberg et al. 2017a; Åberg et al. 2019).

Simplification of the weathered bedrock zone in groundwater flow models
Within the study area, the bedrock is partially highly fractured to depths of 30-100 m and can be considered as an aquifer or aquitard (Åberg et al. 2019;Salminen 1975;Lestinen 1980;Hall et al. 2015). Two combinations of weathering profiles have been observed in Finnish Lapland: tripartite clay-grusssaprock and bipartite gruss-saprock (Hall et al. 2015). The gruss-saprock type dominates in the study area, but the clay type also exists (Fig. 5b). In HSM4, a simplification was made concerning the clay type: the existence of plausible gruss-type weathering beneath the clay type was omitted due to its unknown distribution. This simplification possibly led to an overestimation of the clay weathering zone thickness, which was found to be the reason for flooding in the NW parts, including the Kersilö and Vanttioaapa areas (Fig. 7, location C).
Modelling of the weathered bedrock zone could be improved by generating separate gruss and clay units with Leapfrog. However, according to the drilling data (AA Sakatti Mining Oy, unpublished data, 2019), thicknesses of weathered and fractured bedrock zones are highly variable within short distances (<50-100 m) and interbedded fresh bedrock exists within fractured media (Fig. 5b), causing challenges in interpolation.

Hydraulic connection between surficial deposits and weathered/fractured bedrock
Based on Åberg et al. (2019) and model results of this study, there is groundwater discharge from sediments and fractured bedrock into the river; however, the existence, distribution and magnitude of a hydraulic connection between groundwater in Quaternary deposits and the bedrock is still somewhat an open question. According to modelling, there is groundwater recharge in the mire. This can be seen in stable isotope composition (δ 18 O, δD) of groundwater and surface waters, as well (Åberg et al. 2019;J. Karhu et al., University of Helsinki, unpublished report, 2020). The evaporated component in groundwater was observed in sediments and upper bedrock in an area located in western part of Viiankiaapa mire, Pahanlaaksonmaa and Kersilönkangas areas. Based on flow modelling, the evaporated component in groundwater originates from the open water in the mire area. The thickness and composition of unconsolidated sediment affects the connec-tion of water in the surficial sediment system and the bedrock. Subpeat sediments consist of mainly sand and gravel overlying till on top of the bedrock. Till deposits can be scattered, and the hydraulic conductivity of till can be variable (5.0 × 10 −8 -1.3 × 10 −4 m s −1 SRK, unpublished data, 2019; AA Sakatti Mining Oy, unpublished data, 2019) enabling groundwater flow to fractured bedrock. Kvaerner and Snilsberg (2011) also observed that there is hydraulic connection from mires to bedrock fractures through thin subpeat till deposits (<0.5 m) in Norway.

Calibration of the HSM groundwater flow models
Calibration of complex multi-layered groundwater flow models As noted by Åberg et al. (2019), the calibration of complex groundwater flow models that have scattered Quaternary sediments and fractured bedrock is challenging due to the low sensitivities of the parameters in thin layers. Calibrating simple models such as HSM1 and HSM2 is more straightforward than complex models (HSM3 and HSM4), as their layers are thicker and their sensitivity to changes in the water table is higher. Åberg et al. (2019) also noted that the existence of perched groundwater causes uncertainties in calibration, and the observation points representing perched groundwater should be omitted. The known challenge in the previous model and HSM models is that the head observations from LiDAR DEM are at a higher level compared to the heads in groundwater monitoring wells (head variation can be as high as 0.3 m), leading to the overestimation of groundwater discharge in the mire area (see Åberg et al. 2019). Observations from monitoring wells were based on GPS altitudes fitted to the N2000 altitude system. In Finland, there is land uplift of 7 mm year -1 due to recovery from the last glaciation (Vestøl et al. 2019). Thus, the difference in altitudes between N2000 measures and LiDAR DEM produced in the 2010s is about 10 cm. In addition, LiDAR DEMs were produced in May 2010 and 2015 when the mire water table is the highest due to spring thaw.
The influence of hydraulic conductivities of the thrust faults, brittle faults and fractures was found to be spatially limited due to their slab-like nature, and the calibration results depended on the observations nearby. Therefore, the differences in calibration results between models (Table 5) and the high uncertainty variance indicated that the hydraulic conductivity of thrust faults is difficult to estimate, and it requires further measurements and nearby observations. However, the calibration results for brittle faults were quite similar with the low uncertainty variance in models HSM2, HSM3 and HSM4, indicating some certainty. The direction and distribu-tion of the trust faults, brittle faults and the fractures appeared to affect the calibration results: if a brittle fault or a thrust fault was more perpendicular to the groundwater flow direction, the calibration was more likely to be successful than if it was parallel to the groundwater flow direction. Multiple thrust faults were parallel to the flow direction, which probably caused high uncertainty in the thrust fault parameters. Moreover, the hydraulic gradient was also important, and many thrust faults occurred in low-gradient areas where calibration is generally challenging due to low sensitivities as also noticed by Reilly and Harbaugh (2004) in their earlier studies.
The benefit of using a transient model instead of a steadystate model could be more detailed modelling of the groundwater recharge during the spring thaw. Model fit of the transient model would most probably be poorer in simple models due to the conflict between oversimplified Quaternary sediments and high variation of the head in monitoring wells presented in Åberg et al. (2019). In contrast, in complex transient models, the model fit could be better but it would include probably more misinterpretations due to the higher number of parameters depending on the detailed subsurface structure.

Calibration of boundary conditions
Previous studies such as Doherty (2015), have indicated the importance of defining the boundary conditions; however, the variation in hydraulic conductivities and recharge appeared to have a higher impact on the results in the HSM models. During model development, it was observed that in relatively extensive models with a generally low hydraulic gradient, changing the conductance of the GHB boundary condition affects the water table near the boundary. GHB conductance changed the water table by more than 10 cm, while the distance from the model border was less than 700 m if the change in the GHB conductance value was more than two orders of magnitude.
The RIV conductance had a considerable effect on the calibration results due to the observed steep hydraulic gradient towards the River Kitinen (Fig. 7). The calibrated uniform RIV conductance values of the different HSM models varied from 4.4 × 10 −7 to 1.2 × 10 −4 m 2 s −1 (Table 5), reflecting the hydraulic conductivity of sediments and brittle faults below ( Fig. 9 location H, Fig. 11, location I). However, as Åberg et al. (2019) noted, the bottom sediments of the River Kitinen are poorly known and require further understanding for better estimation of the RIV conductance value and also estimation of the discharge area in the river. In the complex models HSM3 and HSM4, DRN conductance was found to be challenging to calibrate, as its sensitivity varied considerably, and the uncertainty variance was high. Therefore, DRN conductance was set to a constant value of 1 × 10 −5 m 2 s −1 . The calibration could be improved by adding flow observations, since only head observations were used in the HSM models. Flow observations could improve the uniqueness of the parameters (see Hill 1998;Poeter et al. 2014), and thus enhance the calibration results. Altogether, the calibration results were only easy to evaluate when the hydraulic gradient was steep enough. Calibration and the uncertainty of the calibrated parameters, both in the river and drain parameters, could be enhanced with Markov chain Monte Carlo (MCMC) uncertainty analysis to avoid possible local minima in parameters estimation (Poeter et al. 2014). However, confidence limits of linear uncertainty analysis provided reasonable values (10 −8 -10 −2 m −2 s −1 ) representing the most probable variation of hydraulic conductivity in the river bottom and peat layers.

Validation with particle tracking
The simulated flow paths based on the d-excess value dataset from Bigler (2019) were verified with the selected d-excess verification groups (1-5) based on a dataset from Åberg et al. (2019;Fig. 12). The verification groups and the simulated flow paths had the best match of d-excess values in the noncalibrated HSM4 model (Fig. 12, see the last table of the Appendix). Groups 2 and 3 had good fit, and groups 1 and 4 had mainly intermediate or poor fit in all models (Table 13). The intermediate fit of group 1 can be due to the lack of observations in the southwestern area (Fig. 12). In HSM1, group 1 had no match since the southernmost particles discharged already before the river (Fig. 12). Group 5 seemed to have a poor fit or no match in all models. The isotope values of group 5 are probably related to infiltration of evaporated water in the Kärväslampi Pond. Similarly, highly evaporated water in group 4 is probably infiltrated water from the mire (Fig. 12). Since MODPATH considers only advection, the mixing of different isotope compositions was not computed. In addition, evaluation of no match group is challenging since it can be related to a lack of measurements instead of a poor match. However, fairly good overall match with the highly variable d-excess composition in relatively short distances indicated that the flow paths from different sources can be characterized with simple methods like MODPATH.
Converting the geological structure to the flow model grid

Construction of a MODFLOW-NWT grid based on the Leapfrog model
With MODFLOW-NWT, it was possible to use relatively thin cells of 0.5 and 0.0017 m (Fig. 4c,d) as "pinched" cells in the HSM models, since the upstream weighting (UPW) package co-used with the Newton solver in the code prevented cells from drying, enabling model convergence (Niswonger et al. 2011). This means that the water table is also simultaneously calculated for "dry cells", making solver convergence easier for highly nonlinear models. It also allows a more realistic approach for recharge related to precipitation, which can pass through unsaturated zones to the saturated cells, since all cells in MODFLOW-NWT are kept active (Niswonger et al. 2011).
As for the geological point of view, the hydrogeology tool creating the parameter zones in Leapfrog makes the variation of the different units more realistic. Leapfrog assumes that thin cells belong to the geological unit that covers the cells the most; therefore, differences in discretization or cell resolution can generate different results. However, the variable cell thicknesses can cause errors in flow patterns due to the staggered nature of the cell heights (Gao 2011), while the flow only occurs on the faces of the six neighbouring cells when the rectangular prism grid is used (Harbaugh 2005).
Adding more vertical discretization within a hydrostratigraphic unit would enhance model accuracy (see Gao 2011), while head variation within a hydrostratigraphic unit could be taken into account in groundwater flow modelling. However, in the locations of "pinched" cells, the discontinuity of the layer can cause more instability and errors in the flow calculations if more layers are added. In addition, the block model in Leapfrog could be used to create parameter zonation for multilayered scattered units, and the block model could be imported into ModelMuse. However, using the block model in discretization with Leapfrog instead of the hydrogeology tool was more challenging and time consuming-in particular, adding the boundary conditions to the correct layer was more challenging.
In the complex HSM models, the peat unit had two layers, whereas other geological units had only one layer-for example, Reeve et al. (2000) used multiple layers in both peat and subpeat sediment units in crosssectional groundwater flow models to obtain more accurate flow patterns. McDonald and Harbaugh (1988) also suggested that in low conductivity units, it is preferable to use many layers with layer discretization to yield more realistic flow patterns. In the simple models, the upper layer of the mire was 0.5 m thick and the hydraulic conductivity was uniform for the whole peat thickness, while the complex models had an acrotelm of varying thickness on top of a lower conductivity catotelm.

Explicit vs. implicit modelling
Leapfrog is designed for fast implicit modelling, but it can also be used in explicit modelling. Implicit modelling requires a dense drill-hole network if a realistic representation of the geological structure is desired, whereas in explicit modelling, less data is sufficient, since geological knowledge can be used to construct the continuum of the strata. Explicit modelling with the polyline tool enabled the correction of unrealistic interpolated surfaces to more closely resemble the natural shape of geological structures.

Interpolation challenges with leapfrog
FastRBF interpolation generated bullseyes for the interpolated surface of the bedrock topography when the snap option was used for drill holes and polylines (Fig. 5). The snap option forces the interpolated surface to follow the exact altitude of an observation (e.g. drill holes, polylines), enhancing the undulation of the surface, which is useful in areas with a rugged bedrock surface (as in Fig. 4a,b). To improve the interpolation of the bedrock surface, the W-E orientation of schistosity could have been taken into account in the trend or anisotropy. Additionally, to obtain a smoother interpolated surface (as for the peat bottom), external software (ArcGIS) could be used.

Modelling of scattered units with contact surfaces and the polyline editing tool
The construction of Quaternary sediment units with Leapfrog and selected contact types depends on the geological structure. Moreover, the contact surfaces can be used in a creative way differing from its original purpose-for example, in this study, the bottom contact of the peat layer was generated using the erosion type. This type is suitable if the shape of the unit is exported from external software such as ArcGIS and the aim is to preserve the shape. In addition, using the erosional contact type is useful when the invasion of a lower unit is not intended, and the penetration of an older unit is avoided.
Leapfrog Geo appears to lack a predefined option or tool to model lens-like structures other than intrusions (Seequent Ltd. 2020) suitable, for example, to model a braided river environment, eskers or drumlins. The most difficult scenario in Leapfrog is when two different adjacent sediment units are observed in drill-hole data such as fill and bar-like structures (Fig. 13). This can be corrected with explicit modelling by adding additional dummy polylines or points bending the contacts, thereby avoiding the invasion of inappropriate units (Fig. 13a). Fill-like structures can be created by raising the borders of the bottom contact above overlying younger contacts (Fig. 13b). Bending the surface of a deposit-type contact beneath an older contact surface (either deposit or erosion) creates lens-like structures (Fig. 13c). The configuration in Fig. 13b could be used in fill structures such as fluvial or glaciofluvial fills, whereas the configuration in Fig. 13c could be used in braided river bars, drumlins or eskers. Crosssection-based modelling (e.g. GSIS, see Ming et al. 2010) could be more effective for constructing lens-like structures.
Alternative options to construct models with complex quaternary sediments and weathered and fractured bedrock In this study, the variation in hydraulic conductivities was generated within the predefined hydrostratigraphic units by interpolation with points. Alternatively, the units could be separated into smaller units, either with geostatistical tools such as T-PROGS (Carle 1999) or by defining smaller areas with polygons. There are also mathematical codes and stochastic methods that generate complex hydraulic conductivity fields based on pseudo-genetic models such as braided river bar structures (e.g. Felletti et al. 2006;Pirot 2015).
MODFLOW, which was used in this study, assumes that all units used in the model are equivalent to continuous porous media (CPM), including the fractured bedrock layer. However, Darcy's law is only valid when the fracture network is dense enough and behaves like a porous medium (Singhal and Gupta 1999). In addition, if the hydraulic conductivity of a fracture is too high, turbulent flow appears and Darcy's law is not valid. Thus, to improve the modelling accuracy for fractured bedrock, software such as ConnectFlow (Jackson et al. 2000) could be used to generate grids, applying both CPM and a discrete fractured network (DFN) (Hartley and Joyce 2013). Furthermore, ConnectFlow algorithms can be used to calculate K within the Quaternary sediment system underlain by fractured bedrock (Hartley et al. 2009;Hartley and Joyce 2013).
In most of the MODFLOW series codes, including MODFLOW-NWT, horizontally continuous layers are assumed, and the pinched cell approach is the only way to create lens-like structures. MODFLOW 6 (Hughes et al. 2017) could be an option for creating scattered units using unstructured grids, where the number of cells between layers can vary. Unfortunately, the hydrogeology tool of Leapfrog cannot currently create unstructured grids; however, Leapfrog is able to create uneven cell sizes in X and Y directions, enabling denser gridding for river banks, for example, and looser gridding for poorly known or homogeneous sediment areas.
To further develop the modelling workflow used in this study, multiple layers in bedrock units should be added to the MODFLOW grid to create fractures with varying dips. Additionally, fractures could have been generated with the deposit or erosion type of contact in Leapfrog and used as such for the hydrogeology tool, although this would be complicated. Moreover, Leapfrog Geo version 4.4 and older did not allow the use of the vein and intrusion type as contact surfaces in the hydrogeology tool. To model the hydraulic properties of fractures in more detail, the MODFLOW package Horizontal Flow Barrier (HFB) could be used to create low conductivity zones at 90°surrounding the faults and fractures. In addition, in groundwater flow modelling with the code FEFLOW, it is possible to generate discrete features such as conduits or fault structures inside the cell (Diersch 2014). Leapfrog could generate a FEFLOW grid with the hydrogeology tool, and it would therefore be possible to test a corresponding workflow.

Conclusions
The hydrostratigraphic model and its corresponding groundwater flow model for complex Quaternary sediments underlain by fractured and weathered crystalline bedrock was possible to construct with the tested workflow. A workflow, starting with 3D geological modelling using Leapfrog Geo and followed by 3D hydrostratigraphic modelling and groundwater flow modelling with MODFLOW-NWT and ModelMuse, was developed and tested for models differing in complexity. An increase in hydrostratigraphic details appears to improve the fit of the modelled and observed heads in groundwater wells at different depths. The appropriate model complexity is dependent on the research question. In water balance studies, simple models can give as reasonable results as complex models; however, if detailed understanding of groundwater flow patterns is needed, more detailed models are recommended. This is suggested especially in studies related to mire areas or groundwater dependent ecosystems.
In hydrostratigraphic modelling, Leapfrog is easy to use, and the polyline editing tool is advantageous for explicit modelling when the input data are not sufficient to generate realistic models by implicit modelling or when generating thin (<2 m) and undulating sediment units. However, using contact-based 3D modelling software in areas with unconsolidated sediments consisting of glacigenic and fluvial material seems to be challenging, since they often form lens-like structures and scattered layers. The lens-like structures can be explicitly created if dummy points and the contact surfaces are used innovatively. Cross-section-based software could enhance the automatic modelling of lens-like structures. The main advantage of using Leapfrog is that Leapfrog Geo with the hydrogeology tool interacts well with ModelMuse and MODFLOW-NWT. However, changing the geological 3D model in an explicit manner to test the effect of 3D flow model results was found to be time consuming.
MODFLOW-NWT code is suitable for groundwater flow modelling with complex but thin layered Quaternary strata h a v i n g m u l t i p l e u n s a t u r a t e d u n i t s . S c a t t e r e d hydrostratigraphic units can be modelled with this code as continuous layers when the "unnecessary thin cells" between scattered units are defined using the Leapfrog evaluation tool. Schematic model constructed with drill-hole data and dummy points. Dummy points are used to generate the bottom contact of a fill structure. This fill type was used, for example, in the top deposits of GM1 and GM2 underlying an erosion-type peat contact. c Schematic model constructed with drill-hole data and dummy points generating a bar-like structure (K) with an upper contact (3). This type was used, for example, in middle sorted deposits and upper till in GM1 and GM2 Details of the hydrostratigraphic model can easily be added in ModelMuse with polygons and interpolations between points, which adds detail to the groundwater flow patterns and groundwater discharge areas, and simultaneously reduces the time for working with Leapfrog.
Modelling results indicated the importance of variation in hydraulic conductivity: low conductivity units behave like low conductivity faults or barriers, regardless of their shape. Thus, for example, scattered till units with low hydraulic conductivity or clay-type weathered bedrock should be taken into account in model construction. Interlayered low and high conductivity zones disperse the recharge/discharge patterns in areas with a flat topography. In addition, high conductivity units beneath a mire add vertical flow. It is important to estimate the thickness of the upper fractured zone of bedrock, as the thickness of this layer affects its transmissivity. Model complexity generated in Leapfrog adds details to groundwater flow paths and discharge patterns and improves model fit compared to simple models when high variation in hydraulic conductivity is present.
The structure of a 3D geological model can be evaluated with groundwater flow modelling results, and the areas that may need reconstruction of the recharge rate, hydraulic conductivity or layer structure can be detected. Areas with a flat low hydraulic gradient, such as extensive mire areas, are difficult to calibrate, whereas high-gradient areas such as river banks, are easier to calibrate due to their higher sensitivity. In areas with a high hydraulic gradient, changing the grid was found to have a large local effect on the groundwater flow modelling results. Overall, it is worth constructing a detailed geological model if high variation in hydraulic conductivities between geological units is present and the target is to understand groundwater discharge and recharge patterns accurately. However, there should be enough data and time available for modelling.   Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.