Evaluation of Process Oriented Water Quality Predictions for Pit Lakes

Reliable prediction of water quality is essential to meet the official targets set for effluent composition and usage of pit lakes, and to identify the appropriate remediation technology. For this purpose, a complex conceptual model was established and tested on two pit lakes in central Germany with highly different acid loads. To assess the reliability of the predictions, we compared monitoring data from the past 8–10 years with previous water quality predictions. In the case of Lake Schladitz, with low acid load and sufficient buffering capacity, a simple setup with few model elements appeared adequate, and no readjustment of model settings was necessary. For six (pseudo-) conservative ions, the total average deviation between measured and predicted values was − 9%, as opposed to − 2.7% during calibration. For the repeatedly conditioned Lake Bockwitz, a model parameterized with more elements and specific process parameters determined from field and laboratory investigations proved adequate for calculating the technical alkalinity demand. During calibration, the total average deviation between measured and modelled conservative ions was +0.2%; it reached +4.5% during the prediction evaluation period. With continued reacidification of the lake, the predicted concentrations of dissolved Fe and Al increasingly deviated from lake measurements. Deviations diminished after the solubility constants of schwertmannite and hydrobasaluminite were included in the thermodynamic database. The developed conceptual workflow offers a tool to improve water quality predictions for other lake settings.


Introduction and Objectives
Open pits are remnants of surface mining, usually flooded by rising groundwater and precipitation. Sometimes flooding is enabled by pumping additional groundwater from the dewatering of active mines, and/or by connection to nearby rivers or existing lakes (Schultze et al. 2010). In Europe, lakes of a certain size (including man-made lakes) have to meet the standards of at least "good" ecological potential, according to the European Community Water Framework Directive (EC-WFD 2000). In addition, regional or federal state quality standards may apply for all discharges from pit lakes into connected surface waters. For instance, in the lignite mining region of central Germany, water quality standards are set for pH, dissolved iron (Fe diss ), total iron (Fe tot ), ammonia (NH 4 + ), sulfate (SO 4 2− ), and specific trace metals. A variety of remediation measures, including in-lake water conditioning, are available to meet these standards and develop and maintain lake water quality for the targeted use (LMBV 2017).
The company responsible for remediation and recovery of the former lignite mining area in central Germany and Lusatia is the "Lausitz and Central-German Mining Admin Company" (LMBV). For coordinated surveillance, action, and quality assurance, this company introduced an operation standard in 2007, revised in 2019, called Montanhydrological Monitoring (MHM; LMBV 2007). This standard includes guidelines for sampling, sample preparation, analytical procedures, and quality control according to German (DIN) and European (EN) harmonized standards. The MHM dictates a uniform and coordinated program of measurements and analyses for all water quality components. This should satisfy both the requirements of quality assurance (including an acceptable ion balance) and model-based decision making. It also means that all specific values or parameters required for the comparison between model-supported prediction and actual state shall be properly recorded.
The MHM standard also requires that the former prediction be evaluated after some time (five or more years). This is a justified control step, since field conditions may develop differently from the former parameterization of model elements. One reason is that some model parameters are fed by statistical means reflecting "average" conditions in the past, or by assumptions. The evaluation shall be carried out by comparing the previous prediction with the actual water quality data collected by continued monitoring. The present paper aims at showing how and to what detail this task can be fulfilled in practice.
The LMBV is also responsible for water treatment measures to meet the official regulations. Cost estimation, decisions on the most appropriate conditioning technology, and planning of the necessary amounts of conditioning substance(s) (e.g. caustic soda, soda ash, caustic lime, crushed limestone) for initial, and possibly repeated, neutralization actions require detailed lake water prediction (LMBV 2017). Thus, there are regulatory, compliance, and economic reasons for reliable long-term water quality predictions. As a consequence, cost-effective approaches are desired by which preferably simple, yet valuable and reliable predictions can be gained. Such approaches typically rely on water quality models that are widely used in the mining, reclamation, and limnology communities, among others (e.g. Castendyk 2009; Vandenberg et al. 2011).
Comparing the performance of established lake models is beyond the scope of this study. Rather, our goal was to develop and test a conceptual workflow with which simple yet powerful lake models can be assembled, calibrated, applied for prediction, and then re-evaluated after some time to assess their reproducibility and predictive aptitude. To the best of our knowledge, there is no common approach to date to quantify all systematic and stochastic variances that contribute to the overall uncertainty of prediction, especially since some of the variances may cancel each other out. Therefore, we wanted to explore the most sensitive indicators that might serve as a proxy of predictive aptitude. This paper presents the procedures and evaluates model performance by comparing the predicted and the monitored water quality of the pit Lakes Bockwitz and Schladitz after 8-10 years. Concerning Lake Bockwitz, this publication builds on the inventory described and illustrated in a previous study on this topic (Ulrich et al. 2012). Our main objective was to demonstrate the applicability of the developed conceptual workflow as exemplified by two lake models with more or less complexity. We also evaluated the adequacy of model setup and incorporated parameters to the site-specific requirements, identified the most important processes and parameters accounting for reliable prediction, and proposed factors and processes that have minor effects on lake water quality and so can be ignored, on a case-by-case basis. This study was enabled through several commercial projects assigned by the authorized remediation company LMBV, from 2000 until now.

Study Sites
Two dimictic lignite mining pit lakes in central Germany with different geologic bedrock but similar genesis and water residence time (≈ 6 years) were studied to evaluate the respective lake models: Lake Bockwitz and Lake Schladitz, located south and north of Leipzig, respectively. Both lakes formed after mine closure in the early 1990s, without any additional supply of water. While the final water level in Lake Bockwitz was already attained in 2004, Lake Schladitz has filled very slowly. According to the most recent hydrological prediction (IBGW 2019), it will not be full until about 2050. Based on the volume ratio (V hypolimnion / V epilimnion ) calculated according to LAWA (1999), as well as the mean and maximum depth, highly stable dimictic conditions are expected for both lakes. Due to a different geologic bedrock, Lake Bockwitz is much more affected by pyrite weathering products than Lake Schladitz. The two study sites are briefly characterized.
The former pit of Lake Bockwitz was confined by dumps to the south and west, and by undisturbed terrain to the north and east. The area consists of glacial and fluvial sedimentary deposits truncated by the bank slopes and laterally connected to the dump terrain. The pit bottom belonged to the lignite seam II, which was only partially excavated. Fluvial sands located between seams II and IV act as aquifers that are connected to upper aquifers due to erosional processes in the Tertiary and Quaternary age deposits. While the soils of these aquifers are characterized by sulfide contents around 0.2% of dry weight (d.w.), the mixed substrates along the bank slopes show highly variable sulfide contents, from < 0.005 to 0.74% d.w. (Ulrich et al. 2012). That publication gives further information on bedrock geology and permeability.
Lake Bockwitz formed as the last lake downstream and the largest lake in a series of smaller lakes that formed within the former Borna-East lignite mine. The difference between total water inflow (groundwater, surface water and precipitation) and outflow of Lake Bockwitz roughly matches the amount of calculated evaporation from the lake's surface. Prominent hydrologic and morphologic data are tabulated (Table 1). Based on the lake's morphology and the low phosphorus concentration of the groundwater inflow, oligotrophic conditions are expected (Hildebrandt et al. 2013a, b).
Lake Schladitz is a remnant of the Breitenfeld surface lignite mine located in the Delitzsch-Breitenfeld mining area, north of Leipzig. During mining, from 1986 to 1991, 7.4 × 10^6 t (metric tons) of raw lignite were mined, which was only 3% of the planned amount. When mining stopped, a dam with its crest at +95.1 m above sea level (asl) divided the pit into a northern and a southern basin. These two basins began to fill in 1992. As soon as the water level rose above the altitude of the dam's crest, the two pit lakes merged into one lake, called Lake Schladitz. Concerning the residence times and some other characteristics, both study lakes were comparatively similar as of 2018 (Table 1). However, in contrast to Lake Bockwitz, Lake Schladitz is fed only by Quaternary groundwater from the southwest, south, and east, and by precipitation (IBGW 2019). The influent groundwater appears almost unaffected by pyrite weathering products and has not shown signs of acidification. The groundwater effluent is directed northwest towards Lake Werbelin. This part of the shore was built from dumped substrate. Based on the lake's morphology and the low phosphorus concentration of the groundwater inflow, oligo- to mesotrophic conditions are expected in the future.

Conceptual Workflow
The backbone and structural guideline of this study is the conceptual workflow that we developed (Fig. 1). Our approach, testing whether previous water quality predictions are still valid, may also serve as a self-check of our model settings.
The workflow begins with a general field survey, assembly, and plausibility check of existing data (e.g. past time series) and an initial system structure analysis (Fig. 1). The model elements (Fig. 2a) were carefully selected according to the initially analyzed system structure, and parameters and values were implemented based on best current knowledge and availability. A subsequent sensitivity analysis revealed which elements had major or minor influence on the model output (calculated lake water quality). Our procedure was to set up the lake model with the fundamental elements in analogy to Fig. 2b (component SW is included, Eq. 1), and then step-by-step add other elements and check their influence on the model output. All these additional elements (see Fig. 2a) are explained and quantified in terms of acidity/alkalinity loads in Ulrich et al. (2012). Model calibration was achieved by comparing the measured time series from the past 3-5 years with the modelled values of the same period of time. Realistic values and statistical means (medians) were used to achieve high conformity for all parameters. When the deviations between measured and modelled values of six (pseudo-) conservative ions (Cl − , SO 4 2− , Na + , K + , Mg 2+ , Ca 2+ ) stayed below ± 5% on average, we considered the model calibration as satisfactory. Details of the calibration procedure are described in Ulrich et al. (2012). If the model consistency was unsatisfactory, the model setup was revisited and refined, and the database was improved, e.g. by using inflow loads from daily instead of monthly discharge measurements, or by shortening the monitoring intervals. Our workflow indicates this model consistency check by solid "short loop" blue arrows (Fig. 1).
If these database and model adjustments did not properly reduce deviations and inconsistencies, an "extended (1) calibration loop" became necessary. Its aim was to identify other lake specific processes and interactions, and/or determine critical parameters from field investigation and/or laboratory experiments instead of implementing arbitrary assumptions. Through the case example of Lake Bockwitz this approach has been demonstrated in detail elsewhere (Ulrich et al. 2012). After these process parameters were integrated in the model, the check of model consistency was repeated. Theoretically, both the short and the extended calibration loops can be repeated until acceptable model consistency is reached (Fig. 1). Thereafter, the lake model appeared calibrated and we applied it for lake water prediction. This was also the basis for planning and estimating the efforts of lake water conditioning. The execution of in-lake water treatment in turn enabled us to relate the real supply of neutralization agent to the estimated demand. From this comparison, one can derive a rough estimator of uncertainty for the lake model as a whole, while accounting for an independently determined efficiency factor for the neutralization agent.
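The comparison of supplied neutralization agent with predicted demand can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function names, the `purity` and `efficiency` parameters, and all example numbers are hypothetical. Only the stoichiometry (one mole of Na2CO3 neutralizes two moles of H+) is general chemistry.

```python
# Molar mass of sodium carbonate (soda ash) in g/mol; 1 mol neutralizes 2 mol H+.
M_NA2CO3 = 105.99
EQ_PER_MOL = 2


def soda_ash_to_mol_eq(mass_t, purity=1.0):
    """Convert a soda ash mass (metric tons) to moles of alkalinity equivalents.

    `purity` (mass fraction of Na2CO3) is a hypothetical parameter.
    """
    grams = mass_t * 1e6 * purity
    return grams / M_NA2CO3 * EQ_PER_MOL


def supply_to_demand_ratio(supplied_eq, predicted_demand_eq, efficiency=1.0):
    """Ratio of effective alkalinity supplied to the model-predicted demand.

    `efficiency` stands for an independently determined efficiency factor
    of the neutralization agent (assumed value in the example below).
    """
    return supplied_eq * efficiency / predicted_demand_eq


# Hypothetical example: 100 t of soda ash against a predicted demand of 1.8e6 mol eq
supplied = soda_ash_to_mol_eq(100.0)
ratio = supply_to_demand_ratio(supplied, 1.8e6, efficiency=0.95)
print(f"supplied: {supplied:.3e} mol eq, supply/demand ratio: {ratio:.2f}")
```

A ratio close to 1 would suggest that the predicted alkalinity demand was realistic. How far it departs from 1 gives a rough, whole-model indication of uncertainty.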
According to the proposed workflow, the lake model will be evaluated after five or more years. The aim is to check whether or not the model settings are still appropriate. Thus, we have calculated the differences between measured and previously predicted concentrations of the six (pseudo-) conservative ions and for some reactive constituents like pH, K S4.3 /K B4.3 , and Fe diss . We consider the lake models as adequate when the total average deviation ranges around or below ± 10%. If true, the prediction is kept and can even be extended. With larger deviations, the model settings will be checked and refined. This "correction loop" is shown by a dotted blue arrow in Fig. 1. It provides a repeated check of model consistency with the possible need for subsequently passing the "short loop" or "extended loop," according to the conceptual workflow.
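The evaluation criterion can be illustrated with a short sketch. The text does not give the exact formula for the "total average deviation". The code therefore assumes a relative deviation of (measured − predicted) / predicted, averaged per ion over the mixing periods and then across ions. The function names and the sample values are hypothetical.

```python
from statistics import mean


def percent_deviation(measured, predicted):
    """Relative deviation in percent (assumed definition, see lead-in)."""
    return 100.0 * (measured - predicted) / predicted


def total_average_deviation(measured_by_ion, predicted_by_ion):
    """Average the per-ion mean deviations across all ions."""
    per_ion = {}
    for ion, measured in measured_by_ion.items():
        predicted = predicted_by_ion[ion]
        per_ion[ion] = mean(
            percent_deviation(m, p) for m, p in zip(measured, predicted)
        )
    return mean(per_ion.values()), per_ion


def workflow_decision(total_dev, threshold=10.0):
    """Keep the prediction if the deviation stays within +/- threshold percent."""
    return "keep prediction" if abs(total_dev) <= threshold else "correction loop"


# Hypothetical concentrations (mg/L) for two ions and two mixing periods
measured = {"Cl": [110.0, 110.0], "Na": [95.0, 95.0]}
predicted = {"Cl": [100.0, 100.0], "Na": [100.0, 100.0]}
total, per_ion = total_average_deviation(measured, predicted)
print(per_ion, total, workflow_decision(total))
```

Because positive and negative per-ion deviations can cancel in the total, the per-ion values are returned as well and are worth inspecting alongside the aggregate.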

Data Acquisition
All monitoring data from the field used in the Lake Bockwitz and Lake Schladitz models were obtained according to the MHM operation standard (LMBV 2007). This standard foresees annual measurements of groundwater quality, quarterly measurements of lake water quality, and monthly measurements of surface runoff (Supplemental Table S-1). All hydrologic data (for example, inflow of groundwater or water level variations) were taken from the hydrogeological model by IBGW, as described in the next section. This model is based on rainfall and climatological time series and groundwater level monitoring.
The hydrochemical data were collected through routine monitoring and specific consultant contracts under the guidance of LMBV. Major water quality components included pH, electrical conductivity, concentrations of oxygen and major anions (SO 4 2− , Cl − , NO 3 − , HCO 3 − , or total inorganic carbon) and cations (Na + , K + , Mg 2+ , Ca 2+ , Fe 2+ , Fe diss , Al diss , Mn 2+ , NH 4 + ) and specific trace elements, if relevant. The buffering capacity and the neutralization potential were measured in the field and calculated in the model by titration with acid (K S4.3 ) or base (K B4.3 , K B8.2 ).
The supply of soda ash into Lake Bockwitz was incorporated in the lake model until 24.11.2010, based on the monthly total amount. Additional soda ash was incorporated until 27.07.2011 in a later revision of the model. For the "extended calibration loop," the intent was to identify the predominant sources of acidity from the given balance areas. As previously reported (Ulrich et al. 2012), additional data (Table S-1) were gained from:
• erosion monitoring station installed on the shore banks for that purpose,
• seepage collection near the shoreline,
• temporary groundwater monitoring wells deployed near the shoreline for that purpose,
• laboratory (column) experiments on post-acidification and elution of vadose zone substrates and lake sediments.
Based on this work, the methodology was evaluated to obtain critical parameters from field and lab investigations and integrate the parameters into hydrogeochemical and physical transport models.

Model Description
For each study lake, a hydraulic mass balance model was set up in 2003, primarily to estimate the demands for remediation; for instance, initial lake water neutralization and follow-up care (only relevant for Lake Bockwitz, where the supply of soda ash had to be predicted). Ionic and compound concentrations were coupled with hydraulics to calculate mass balances for all recorded influx, efflux, and in-lake physicochemical reactions. The fundamental terms of the hydraulic and mass balance equations are listed (Eq. 1). This equation includes all elements of the balance: precipitation, bank shore runoff and erosion, river inflow and effluent, groundwater inflow and effluent, hypodermic interflow, flooding water, geochemical precipitation processes, and interactions with the atmosphere (evaporation, oxygen/carbon dioxide gas exchange). In the current setup, the lake is considered a fully mixed aerobic reactor at a water temperature of 10 °C, ignoring biological processes. Hence, the model results describe the lake water quality for the seasonal periods of full circulation, as there was no separation of water layers (epi-/hypolimnion).
Each lake was sectioned into local balance areas. Fundamental model inputs were output data (discharge volumes or proportions) for balance areas taken from the HydroGeologic large-scale model designed for the greater area north (HGMN) and south (HGMS) of the city of Leipzig, based on the finite-volume groundwater flow and transport model PCGEOFIM (Mansel et al. 2011; Sames et al. 2010, 2019). Stochastic data of groundwater levels from monitoring wells were transferred to mean spatial hydraulic flow and groundwater recharge for each balance area (IBGW 2010). Then, groundwater inflow and effluent were converted from the unit of m 3 /min to m 3 /time step in the lake model. The sum of all inflow volumes and the lake volume were used to define a unit volume of 1 in order to determine the relative share of inflows and lake water.
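The unit conversion and the normalization to a unit volume can be sketched as follows. The function names, the 30-day time step, and all volumes are hypothetical; the sketch only illustrates the arithmetic described above.

```python
MINUTES_PER_DAY = 24 * 60


def volume_per_time_step(flow_m3_per_min, step_days):
    """Convert a flow in m3/min into a volume (m3) per model time step."""
    return flow_m3_per_min * MINUTES_PER_DAY * step_days


def relative_shares(lake_volume_m3, inflow_volumes_m3):
    """Normalize lake volume and inflow volumes so that they sum to 1."""
    total = lake_volume_m3 + sum(inflow_volumes_m3.values())
    shares = {name: vol / total for name, vol in inflow_volumes_m3.items()}
    shares["lake"] = lake_volume_m3 / total
    return shares


# Hypothetical example: 30-day time step and two inflow elements
inflows = {
    "groundwater": volume_per_time_step(0.5, 30),
    "precipitation": volume_per_time_step(0.1, 30),
}
shares = relative_shares(1.0e6, inflows)
print(shares)
```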
The water composition data were linked with the respective inflow and effluent elements to calculate mass loadings. For precipitation, rainwater composition data were taken from BGD (2002). The model elements precipitation, surface water inflow, and the distant-shore groundwater balance areas were assigned constant quality data over the entire modeling period. The water quality of near-shore groundwater and interflow was modelled as changing over time (see Ulrich et al. 2012 for details).
Only measured water quality data with an ion balance error of less than ± 5% were used for the initial concentration of a model run, after a preliminary check for plausibility and consistency.
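A simple ion balance check might look like the sketch below. The text does not state which formula was applied. The code assumes the common convention 100 × (Σcations − Σanions) / (Σcations + Σanions) in meq/L; other conventions use half the sum in the denominator. The reduced ion set and the function names are illustrative only. Acidic pit lake samples would also need H+, Fe, Al, Mn, and NH4+ in the balance.

```python
# Equivalent weights in mg per meq (molar mass divided by charge)
EQUIVALENT_WEIGHT = {
    "Na": 22.99, "K": 39.10, "Mg": 24.305 / 2, "Ca": 40.078 / 2,
    "Cl": 35.45, "SO4": 96.06 / 2, "HCO3": 61.02,
}
CATIONS = ("Na", "K", "Mg", "Ca")
ANIONS = ("Cl", "SO4", "HCO3")


def ion_balance_error(sample_mg_per_l):
    """Ion balance error in percent (assumed convention, see lead-in)."""
    meq = {ion: c / EQUIVALENT_WEIGHT[ion] for ion, c in sample_mg_per_l.items()}
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


def passes_ion_balance(sample_mg_per_l, limit=5.0):
    """True if the absolute ion balance error is below the limit (percent)."""
    return abs(ion_balance_error(sample_mg_per_l)) < limit


# Hypothetical samples: 1 meq/L Na against 1 meq/L and 0.5 meq/L Cl
balanced = {"Na": 22.99, "Cl": 35.45}
unbalanced = {"Na": 22.99, "Cl": 17.725}
print(passes_ion_balance(balanced), passes_ion_balance(unbalanced))
```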
Step 1 of the model run then solved the balance equation (Eq. 1), and the obtained result served as the lake water composition's value for the start of the next time step. The gas exchange with the atmosphere was simulated by establishing steady-state equilibrium of the lake water with the partial pressures of CO 2 and O 2 . Evaporation was considered as a loss of pure water.
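One time step for a conservative constituent in a fully mixed lake can be sketched as follows. This is an assumed, simplified explicit balance, not a reproduction of Eq. 1: it omits mineral precipitation, gas exchange, and all reactive processes. Evaporation removes water but no solute, as stated above. All names and numbers are hypothetical.

```python
def step_conservative(conc, volume, inflows, outflow, evaporation):
    """Advance a conservative concentration by one time step.

    conc        -- lake concentration at the start of the step (mg/L)
    volume      -- lake volume at the start of the step (m3)
    inflows     -- list of (volume_m3, concentration) tuples for the step
    outflow     -- effluent volume leaving at lake concentration (m3)
    evaporation -- volume lost as pure water (m3)
    Returns (new_concentration, new_volume).
    """
    mass = conc * volume
    mass += sum(q * c for q, c in inflows)
    mass -= outflow * conc
    new_volume = volume + sum(q for q, _ in inflows) - outflow - evaporation
    return mass / new_volume, new_volume


# Hypothetical run: the result of each step is the start value of the next
conc, volume = 100.0, 1000.0
for _ in range(3):
    conc, volume = step_conservative(
        conc, volume, inflows=[(100.0, 20.0)], outflow=80.0, evaporation=20.0
    )
print(round(conc, 2), volume)
```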
Mineral precipitation reactions were considered individually for each lake. The water composition (from measurements) was analyzed by the open-source PHREEQC code (Parkhurst and Appelo 1999) to determine the thermodynamic hydrogeochemical species distribution. The saturation indices of mineral phases such as goethite were kept constant throughout the prediction period. One should be aware that PHREEQC calculates mineral precipitation up to the stage of thermodynamic equilibrium. However, equilibrium conditions are rarely reached in nature.
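For readers unfamiliar with saturation indices, the general definition SI = log10(IAP / K) is illustrated below, with IAP the ion activity product and K the solubility constant. The sketch does not call PHREEQC. The tolerance band and the example values are hypothetical, not data from the study lakes.

```python
import math


def saturation_index(ion_activity_product, solubility_constant):
    """SI = log10(IAP / K); zero at thermodynamic equilibrium."""
    return math.log10(ion_activity_product / solubility_constant)


def saturation_state(si, tolerance=0.1):
    """Classify a saturation index; the tolerance band is an assumption."""
    if si > tolerance:
        return "supersaturated"
    if si < -tolerance:
        return "undersaturated"
    return "near equilibrium"


# Hypothetical values
si = saturation_index(1e-3, 1e-4)
print(round(si, 3), saturation_state(si))
```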
The sensitivity analysis showed that different model elements had to be selected for the model setup. For Lake Schladitz, four major system elements were identified as sensitive for the lake model: groundwater inflow and effluent, precipitation, bank slope runoff, and erosion (Fig. 2b). According to the conceptual workflow (Fig. 1), the "short loop" was sufficient to prove the consistency of the model calibration. Obviously, the system was relatively simple and homogeneous due to the inflow of Quaternary groundwater as the only water source besides precipitation, and the prevailing buffering capacity. The calibration period lasted from 2003 to 2008.
In contrast, the case example of Lake Bockwitz was more complex. Because the initial model calibration trial failed, an "extended calibration loop" was necessary to identify the most sensitive processes. So, we explored erosion and groundwater recharge along the shore banks, interflow, and exchange processes at the sediment-water interface, and determined respective parameters from field and laboratory experiments (including pyrite oxidation rates, elution and diffusion rates, recharge rates, seepage, and erosion loadings). These data were collected from 2006 to mid-2011 and then implemented in the lake model (Fig. 2a). In the second trial (calibration period 2008 to mid-2011), convincing conformity between field data and modelled data was achieved for almost all water quality parameters (BGD 2012). Based on this refined model calibration, the water quality has been predicted until the year 2050 (Ulrich et al. 2012), and the results were shared with the client.
For both lake models, the following general assumptions and side conditions applied:
• Estimation of average hydrological and hydrogeological conditions, i.e. average water balance in the catchment, including average annual precipitation and evaporation, average annual groundwater recharge and discharge conditions for surface influents;
• Homogeneous water composition of the whole lake water body, i.e. mixing conditions;
• Steady-state groundwater level and composition (unless processes were known that changed these conditions);
• Regional or general hydrogeology and hydrochemistry, concerning in particular predominant mineral phases, thermodynamics, and saturation indices;
• Constant or steady-state partial pressure conditions unless modelled as reaction-dependent variables (e.g. pCO 2 as a function of [TIC], pH and temperature).
It is important to note that lake models cannot predict stochastic weather conditions like flood events, droughts, consecutive above-average "wet years" or below-average "dry years," and subsequent variations of the groundwater table and flow regime. Moreover, statistical quantification of data-fitting quality was beyond the scope of this study and was not feasible within the commissioned orders.
The model evaluation according to the workflow (Fig. 1) took place after 8 years for Lake Bockwitz and after 10 years for Lake Schladitz. The evaluation started by comparing the predicted lake water quality with the field data collected from mid-2011 until 2018 in the case of Lake Bockwitz, and from 2009 to 2018 for Lake Schladitz.

Lake Bockwitz
The previous water quality prediction was evaluated in two steps. First, the conservative and less-reactive water constituents were studied. Acceptable conformity between modelled and monitored conservative constituents would prove the overall consistency of the hydraulic balance of the whole system. Second, the more reactive water constituents were used to indicate whether the hydrogeochemical settings were appropriate, or if any readjustments were required.

Conservative Water Quality Constituents
The lake water concentrations of conservative and less-reactive ions showed acceptable conformity between monitored field data and past prediction values for SO 4 2− , Na + (Fig. 3a), Cl − , K + (Fig. 3b), Ca 2+ , and Mg 2+ (Fig. 3c). While the ion specific deviations between measured and predicted values for the lake mixing periods ranged between about − 6% and +13% (n = 16), the total average deviation for these six (pseudo-) conservative ions amounted to +4.5% (Table 2; additional data are given in Supplemental Table S-2). Although this deviation was significantly higher than during the calibration period (+0.2%), this finding demonstrates an excellent predictive power for the less reactive water constituents. Despite some fluctuation, the SO 4 2− and Na + concentrations revealed declining trends. No significant trends were observed for the Ca 2+ , Mg 2+ , K + , and Cl − concentrations. However, the measured Cl − concentration deviated from the predicted, close to steady-state concentration by a slight concentration increase of up to 22% from 2013 onward (Fig. 3b). Therefore, the average deviation during the 8 years of prediction evaluation reached almost +13%, compared to the calibration period (− 0.3%). Almost similar figures with opposite signs were found for K + . Considering that Cl − is a strongly conservative ion, discrepancies between real and modeled hydraulic data may provide an explanation for these deviations. Therefore, we checked the sensitivity of the implemented hydrological data in the Lake Bockwitz case study (Table 3).

Precipitation
The most reliable hydrological data set was precipitation. While the lake model used a 30 year average precipitation of 606 mm/a for each predicted year, the real precipitation data ranged between 342 and 684 mm/a, with a mean ± standard deviation of 553 ± 103 mm/a and a median of 555 mm/a during the period 2011-2018. Hence, the long-term average used in the prediction ranges within the standard deviation of annual precipitation calculated for this time period. Thus, the assumption made for the prediction appears valid for longer-term climatic conditions. However, stochastic weather conditions and seasonal fluctuations of precipitation are not predictable for a distinct year. Therefore, unless the local groundwater table had not yet reached stationary conditions (see below), other hydraulic conditions and processes in the model affected by regional precipitation could only be assumed to be close to steady-state or to be balanced on the longer term. This assumption includes, for instance, fluctuations of the lake water level, infiltration and groundwater recharge, surface runoff, and river discharge.
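The comparison made in this paragraph is simple arithmetic and can be checked directly. The numbers are taken from the text; the function name is illustrative.

```python
def within_one_sd(value, mean, sd):
    """True if value lies inside the interval mean +/- sd."""
    return mean - sd <= value <= mean + sd


# Values from the text (mm/a): 2011-2018 mean and standard deviation,
# and the 30-year average used in the lake model
MEAN_2011_2018, SD_2011_2018 = 553, 103
MODEL_AVERAGE = 606

lower, upper = MEAN_2011_2018 - SD_2011_2018, MEAN_2011_2018 + SD_2011_2018
print(lower, upper)  # 450 656
print(within_one_sd(MODEL_AVERAGE, MEAN_2011_2018, SD_2011_2018))  # True
```

The extreme annual sums of the period (342 and 684 mm/a) both fall outside this interval. This is consistent with the statement that single years cannot be predicted from the long-term average.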
While it was not possible to evaluate the natural variation of groundwater composition in detail, because groundwater recharge is affected by seasonal precipitation fluctuations, we wanted to explore how many years it would take until annual precipitation variations would be balanced and converge to the 30-year average condition. This information may also indicate after what period of time slight changes of climatic conditions would require readjustment of the model settings. To address this question, we compared the monthly sum of precipitation of individual years in the prediction period (starting in 2010) with the average distribution calculated from monthly precipitation of the years 1981 to 2010 for the Rötha monitoring station near Lake Bockwitz (courtesy of Deutscher Wetterdienst, DWD). Figure 4a shows that the monthly sum of precipitation of an individual year during the prediction period sometimes greatly deviated from the average value, and that rather wet years with elevated annual precipitation occurred in 2010 (802 mm/a) and 2017 (684 mm/a), and a notably dry year with diminished annual precipitation occurred in 2018 (342 mm/a). Next, we tested after what period of time the stochastic fluctuations would merge towards the average value. Figure 4b depicts the monthly average precipitation calculated for the years 2010-2011, 2010-2012, 2010-2013, 2010-2014, 2010-2015, 2010-2016, 2010-2017, and 2010-2018 (Table S-3). Hence, at least 7 years of precipitation records were needed to converge to the monthly 30-year averages as implemented in the lake water quality model. However, in February and April, all the calculated medians were less than the 30-year average precipitation of these months. This observation could be an indicator for (regional) climatic change with lower winter/springtime precipitation. This example demonstrates that, for evaluating the reliability of predicted average (boundary) conditions extrapolated from the past to the future, no single year of the prediction period should be selected. Rather, an average of at least 7 to 10 years should be used.

Table 3 (fragment; column headers and the label of the first row are missing in the source)
(row label missing): Overall insignificant changes; Steady-state, no changes
Water quality of pit lake Südkippe: Permanently pH < 3 with decreasing trend of mining specific indicator parameters; Slightly higher Fe diss concentration, slightly lower Al diss and SO 4 2− concentration
In-lake neutralisation with soda ash (∑ 2011): 757 t/a = 13.8 × 10^6 mol eq until Dec. 2011; 304 t/a = 5.6 × 10^6 mol eq between Jan. and July 2011
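The expanding-window averaging of monthly precipitation used in this section can be sketched as follows. The sketch uses arithmetic means, although the text mentions both averages and medians. The function names and the synthetic data are hypothetical, not DWD records.

```python
from statistics import mean


def expanding_monthly_means(monthly_by_year, start_year):
    """Return {end_year: [12 monthly means over start_year..end_year]}.

    monthly_by_year maps a year to a list of 12 monthly precipitation sums.
    Windows contain at least two years (e.g. 2010-2011, 2010-2012, ...).
    """
    years = sorted(y for y in monthly_by_year if y >= start_year)
    result = {}
    for i in range(1, len(years)):
        window = years[: i + 1]
        result[years[i]] = [
            mean(monthly_by_year[y][m] for y in window) for m in range(12)
        ]
    return result


def mean_abs_deviation(window_means, reference):
    """Mean absolute difference between window means and reference values."""
    return mean(abs(w - r) for w, r in zip(window_means, reference))


# Synthetic data: constant monthly sums per year, reference of 50 mm/month
data = {2010: [70] * 12, 2011: [50] * 12, 2012: [30] * 12}
reference = [50] * 12
windows = expanding_monthly_means(data, 2010)
for end_year, values in windows.items():
    print(end_year, mean_abs_deviation(values, reference))
```

With real records, the window length at which the deviation from the 30-year reference levels off would indicate how many years of data are needed for a fair comparison.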

Fluctuation of Lake Water Level
The mean lake water level for the years 2011 to 2018 was +146.46 ± 0.11 m asl and thus about 0.5 m higher than the targeted lake water level (+146.0 m asl). Fluctuations were mainly due to variations of precipitation and water inflow, regulated by a flashboard weir in vertical steps of 20 cm. The dam and weir have operated since 12/2006.

Surface Water Inflow and Discharge
The previous prediction was initially based on the water balance of IBGW (2010), in which the average inflow from pit lake Südkippe into Lake Bockwitz was estimated at 792 m 3 /day (Table 3). Selected reference day measurements (n = 50) within the monitoring period 2012 to 2018 indicated an average inflow of about 916 m 3 /day (0.64 m 3 /min). This number matches the aforementioned reference day measurements quite well. Concerning the influent water quality, the monitoring data agreed with the model prediction, with the exception of slightly higher Fe diss concentrations and slightly lower Al diss and SO 4 2− concentrations than predicted by the lake model (data not shown).

Fig. 4 (caption fragment) ... 2010-2011, 2010-2012, 2010-2013, 2010-2014, 2010-2015, 2010-2016, 2010-2017, 2010-2018. Data courtesy of Deutscher Wetterdienst

Groundwater Hydraulics and Composition
To evaluate the model assumptions used for lake water quality prediction, hydraulic data from the previous prediction (IBGW 2010) and data for 2016 (IBGW 2017) were compared. The groundwater table intermittently responded with a slight rise after the wet years 2010 and 2013, and lowered again after the relatively drier 2012 and 2014. Hence, on a longer term, approximately balanced groundwater inflow into Lake Bockwitz from the upper aquifers can still be expected.
The groundwater quality of the dominant inflow showed either no significant trend or a minor decline of mining sensitive constituents like SO 4 2− and Fe diss . In contrast, seepage water from the dump aquifers revealed significantly increased Fe diss concentrations, for instance from 14 mg/L in 2004 to 28 mg/L in 2016 at one monitoring well (#36951). However, the hydraulic proportion of this type of groundwater inflow was estimated to only be 6% in 2016 and 7% at stationary conditions. Overall, the observed quality changes appeared almost balanced; thus, the assumption of steady composition of the distant-shore groundwater was kept in the model settings.

Conclusion for Conservative Water Quality Parameters
The evaluation of the previous model prediction for the conservative and low-reactive water quality constituents and the verifiable hydraulic conditions demonstrates that the general model setup and assumptions made for hydraulic conditions are overall consistent and valid for Lake Bockwitz. Thus, we can proceed in the model evaluation process by comparing the monitored field data and past prediction results for more reactive water quality parameters that are usually sensitive to mining induced changes of redox state chemistry and pH.

Fig. 5a (axes and legend): Soda [t/month], pH [-]; soda ash supply, pH model result, pH analyzed in lake, pH analyzed in lake effluent, calibration period

Reactive Water Quality Constituents
High influx into the lake, coupled with substantial acid generation and parallel neutralization by a discontinuous supply of soda ash, enhances complex geochemical reactions and interactions of the reactants, of which H + (recorded as pH), HCO 3 − (measured as K S4.3 ), Al diss , and Fe diss are the most prominent and are shown next. For the pH time series, excellent conformity between measured and modelled values was found over the whole monitoring time, i.e. for the periods of calibration (mean deviation −1.5%, n = 6) and evaluation (2011-2018; mean deviation −0.5%, n = 16) (Fig. 5a). It is particularly remarkable that the model correctly predicted the fast pH decline after the soda ash supply was terminated by the end of 2011. One and a half years later, the pH fell below 4.0 and levelled off at around pH 3.5 during the following years. It was assumed that the continuously high influx of acidity from interflow and ion exchange reactions with the lake sediment were the major drivers for the rapid re-acidification of the lake water.
The overall conformity between measured and modelled K S4.3 values was excellent during the calibration period, when soda ash was supplied to Lake Bockwitz (mean deviation −0.1%, n = 6). During the period of soda ash supply, the measured values fluctuated more than the modelled values, which can be explained by seasonal changes of lake mixing and stagnation that are not reflected in the stirred reactor model (Fig. 5b). After the pH dropped below 4.3, K S4.3 values turned negative and were determined as K B4.3 by titration with base (0.1 N NaOH). From 2014 on, the model predicted slightly higher values of K B4.3 (acidity) than were measured in the lake (Fig. 5b). Hence, the conformity between measured and previously predicted values deteriorated during model evaluation (mean deviation +36%, n = 14). The causes of the stronger proton buffering in Lake Bockwitz are so far unknown.
Concerning the Al diss concentration, an obvious mismatch between the recorded and predicted time series can be seen in two respects (Fig. 6). First, there is a time shift in the onset of the concentration increase from close to zero, corresponding to the shift of the pH decrease due to the additional in-lake neutralization, as described above. Second, the Al diss concentrations in the lake water from 2013 on (1.5-2 mg/L) were substantially lower than the predicted concentrations and appeared close to a steady-state condition, while the predicted Al diss concentration rose steadily and reached 6 mg/L by the beginning of 2019 (Fig. 6).
An even greater mismatch was found for the Fe diss concentration. Here, the lake model predicted a pH-dependent onset of the concentration increase from close to zero at the same time (second half of 2011) as the Al diss concentrations increased (Fig. 6). For both metals, the predicted concentration functions had a logarithmic shape. According to the prediction, the Fe diss concentration would reach and probably exceed 24 mg/L in 2019. However, until 2018, the analyzed values did not show such a severe concentration increase at all. The Fe diss concentration reached 1 mg/L in 2016 and an interim maximum of 1.5 mg/L in 2018 (Fig. 6).
Possible causes for these surprisingly high deviations between predicted and measured field data could be an overestimation of the influx of Al diss and Fe diss (first hypothesis), or an underestimation of Al and Fe depositional processes in the lake (alternative hypothesis). Regarding the first hypothesis, an overestimation of the cationic influx into the lake (or of the dissolution of Al and Fe minerals in the lake) would require an overestimation of the charge-balancing anionic influx (or of the dissolution of Al and Fe minerals containing the responsible counter-ions). However, the time series of the measured SO 4 2− concentration matched the model prediction (Fig. 3a). To verify this, we also checked the ionic balances of near-shore bank slope seepage sampled from the most loaded temporary monitoring wells located close to the lake's shoreline. As an example, the ion balance of seepage water collected near the east shore of Lake Bockwitz shows that the predominant charge-balancing counter-ions of SO 4 2− were Fe 2+ , Al diss , and Ca 2+ (Fig. 7). Hence, because the real influx of SO 4 2− did not substantially deviate from the model prediction (Fig. 3a), the real influx of charge-balancing cations likewise cannot substantially deviate from the model prediction (Fig. 8). Thus, the second hypothesis comes into consideration, i.e. inconsistent model settings of in-lake depositional processes for Al diss and Fe diss under changing pH conditions.
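The charge-balance argument above can be illustrated with a minimal sketch. The concentrations are invented for illustration and are not the Fig. 7 data; treating dissolved Fe as Fe 2+ and dissolved Al as Al 3+ is a simplifying assumption, and the function and variable names are hypothetical.

```python
# Illustrative charge-balance check. All concentrations are hypothetical.
# Each entry: (molar mass in g/mol, ionic charge). Treating dissolved Fe as
# Fe2+ and dissolved Al as Al3+ is a simplifying assumption.
IONS = {
    "Fe2+": (55.845, 2),
    "Al3+": (26.982, 3),
    "Ca2+": (40.078, 2),
    "SO4_2-": (96.06, -2),
}


def to_meq_per_litre(ion, mg_per_litre):
    """Convert a mass concentration (mg/L) to charge equivalents (meq/L)."""
    molar_mass, charge = IONS[ion]
    return mg_per_litre / molar_mass * abs(charge)


def charge_balance(sample_mg_per_litre):
    """Return (cation sum, anion sum, balance error in percent)."""
    cations = sum(
        to_meq_per_litre(ion, conc)
        for ion, conc in sample_mg_per_litre.items()
        if IONS[ion][1] > 0
    )
    anions = sum(
        to_meq_per_litre(ion, conc)
        for ion, conc in sample_mg_per_litre.items()
        if IONS[ion][1] < 0
    )
    error_percent = 100.0 * (cations - anions) / (cations + anions)
    return cations, anions, error_percent


# Hypothetical seepage sample (mg/L), not measured data.
seepage = {"Fe2+": 280.0, "Al3+": 45.0, "Ca2+": 200.0, "SO4_2-": 1200.0}
cations, anions, error = charge_balance(seepage)
print(f"cations {cations:.2f} meq/L, anions {anions:.2f} meq/L, error {error:.2f}%")
```

If sulfate dominates the anions and the sample is close to electroneutral, the cation sum is constrained by the sulfate equivalents, which is the reasoning used in the text.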

Refinement of Thermodynamic Database
The distinct mismatch of the past model prediction from 2011 (Ulrich et al. 2012) with the monitored Fe diss and Al diss concentrations showed that, with decreasing lake water pH, the precipitation of Fe and Al mineral phases in the lake was underestimated. Since the thermodynamic data set of PHREEQC.dat (Parkhurst and Appelo 1999) used for the previous prediction did not include the intermediate mineral phases schwertmannite and hydrobasaluminite, we imported the log K sp data published by Sanchez-España et al. (2011) into the PHREEQC database. Both of these minerals are known to form in the environment of acid mine drainage (AMD) exposed to the atmosphere. All these mineral phases are difficult to identify; often the thermodynamic data depend on the particular stoichiometry, and solubility constants are rather variable or even unknown. During lake neutralization, there was no need to consider these poorly defined phases in the lake water model because the pH was either < 4 or > 6.5. This refinement of the thermodynamic database considerably reduced the level of the predicted Fe diss concentration (Fig. 6) and substantially improved the conformity between measured and modelled Fe diss concentrations, now showing a mean deviation of −3.5% (Fig. 8). While the Fe diss concentration measured in 2017 and 2018 varied around 0.5 mg/L in the lake, the modelled Fe diss concentration slightly increased up to 1.8 mg/L in 2018. In contrast, the initially modelled Al diss concentration (Fig. 6) diminished only slightly, and a considerable mismatch of up to 3 mg/L still remained at the end of 2018. Together, these results show that under the current environmental conditions, schwertmannite can be considered the major solubility-controlling phase for iron in Lake Bockwitz.
However, the published solubility constant of hydrobasaluminite does not unequivocally indicate this mineral as the solubility-controlling phase for aluminum. Moreover, other processes like sorption and cation exchange may be involved but are currently not adequately parameterized in the thermodynamic database.
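How a solubility-controlling phase caps a dissolved metal concentration as a function of pH can be sketched as follows. This is a deliberately simplified illustration, not the PHREEQC calculation used in the study: it assumes a generic hydroxide-type phase M(OH)3 dissolving as M(OH)3 + 3 H+ = M3+ + 3 H2O, ignores sulfate in the mineral formula, aqueous complexes, and activity corrections, and uses a placeholder log K that is not a published constant.

```python
# Simplified solubility control by a generic hydroxide-type phase.
# Assumed reaction: M(OH)3 + 3 H+ = M3+ + 3 H2O
#   log IAP = log a(M3+) + 3 * pH
#   SI      = log IAP - log K
# HYPOTHETICAL_LOG_K is a placeholder, not a published solubility constant.
HYPOTHETICAL_LOG_K = 3.0


def equilibrium_log_activity(log_k, ph, protons=3):
    """log10 activity of the metal ion in equilibrium with the phase."""
    return log_k - protons * ph


def saturation_index(log_activity_metal, ph, log_k, protons=3):
    """SI > 0: oversaturated (precipitation possible); SI < 0: undersaturated."""
    log_iap = log_activity_metal + protons * ph
    return log_iap - log_k


for ph in (3.0, 3.5, 4.0):
    log_a = equilibrium_log_activity(HYPOTHETICAL_LOG_K, ph)
    print(f"pH {ph}: equilibrium log a(M3+) = {log_a:.1f}")

# A solution with log a(M3+) = -5 at pH 3.5 is oversaturated for this phase.
print(saturation_index(-5.0, 3.5, HYPOTHETICAL_LOG_K))
```

In this simplified picture, each pH unit shifts the equilibrium activity by three log units, so a phase missing from the database can leave the modelled dissolved concentration far too high within a narrow pH window.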

Lake Schladitz
Based on the simple lake model and successful model calibration for the time period 2003 to 2008 (Fig. 2b), the prediction period for Lake Schladitz lasted from 2009 until stationary conditions were expected (around the year 2050) (BGD 2009). Excellent conformity was observed for almost all of the (pseudo-) conservative and more reactive water quality constituents analyzed until 2018 (IFUA 2018) (Supplemental Table S-4). While the ion-specific deviations between measured and predicted values for the lake mixing periods ranged between about −1.3% and −32% (n = 21), the total average deviation for the six less reactive ions amounted to −9.1% (Table 2; Supplemental Table S-5). Although this deviation was considerably higher than during the calibration period (−2.7%), the result demonstrates an acceptable predictive power for the (pseudo-) conservative water constituents and thus the system water balance (Supplemental Table S-6). Figure 9 illustrates the comparison between analyzed and modelled lake water concentrations for SO 4 2− and Na + (Fig. 9a), Mg 2+ and K + (Fig. 9b). Checking the more reactive water constituents, the mean deviations between measured and previously predicted values were excellent for pH (−0.2%, n = 21) and fair for the K S4.3 data (−17%, n = 21) (Fig. 9c). However, the percent deviations were much greater for the Fe diss concentrations (−399%, n = 10), because the predicted values were as low as 0.1 mg/L and the analyzed Fe diss concentrations ranged between 0.08 and 0.01 mg/L (limit of detection). For such a low concentration level, evaluating the difference on a percentage basis appears meaningless. Thus, the simple model setup and the parameter settings were considered to be appropriate, and no adjustments or change of model elements were required.
In conclusion, for Lake Schladitz, the lake water quality model consistently matched the groundwater influx. The loadings of acidic ions like iron, aluminum, and sulfate were of minor importance, and consequently coupled geochemical reactions in the lake water and their parameter settings were not relevant in this particular case.
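Why a percentage deviation loses meaning near the limit of detection can be shown with a short sketch. The formula below, (measured − predicted) / measured, is an assumption: this section does not state how the mean deviation was computed, although this form is consistent with the sign and magnitude reported for Fe diss . The input values are illustrative, not the monitoring data.

```python
def mean_percent_deviation(measured, predicted):
    """Mean of 100 * (measured - predicted) / measured over paired values.

    Assumed definition; the section does not state the formula used.
    """
    deviations = [100.0 * (m - p) / m for m, p in zip(measured, predicted)]
    return sum(deviations) / len(deviations)


# Illustrative values (mg/L), not monitoring data.
# A major ion at several hundred mg/L: a 45 mg/L offset is a modest deviation.
major_ion = mean_percent_deviation([500.0], [545.0])

# A trace constituent near the detection limit: an offset of only 0.08 mg/L
# produces a deviation of several hundred percent.
trace_ion = mean_percent_deviation([0.02], [0.10])

print(f"major ion: {major_ion:.1f}%")
print(f"trace ion: {trace_ion:.0f}%")
```

For constituents at such low levels, an absolute difference in mg/L is the more informative measure, which matches the interpretation given in the text.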

Discussion
Fig. 9 Comparison of Lake Schladitz water quality data from model prediction (solid lines) with field data (symbols, dotted lines) for a SO 4 2− and Na + , b Mg 2+ and K + , c pH and K S4.3
The results of this study show that assessing previous predictions of pit lake quality after several years of continued lake water monitoring may help increase their reliability and trustworthiness. This in turn affirms the applicability of our conceptual workflow (Fig. 1). While for one study site (Lake Schladitz) we confirmed that the lake model was well suited for future water quality prediction (Table S-3), we had to improve the initial model of Lake Bockwitz. The model assessment, according to the workflow (Fig. 1), revealed that specific adjustments to model settings were necessary. Below, we explain the reasoning and discuss the uncertainty of our predictions. In addition, we want to elaborate on the prominent processes and parameters that contribute to reliable predictions, as opposed to factors and processes with minor relevance. This approach could be considered a retrospective sensitivity analysis. The general approach was to start with a simple yet adequate model setup that includes the most relevant system elements and processes and their proper parameterization. Of course, one would like to quantify the contribution of the uncertainty of each model element or process to the overall uncertainty of prediction. We are not aware of a common methodology by which process-specific uncertainty could be quantified. In general, stochastic errors (e.g. climatic variability) and systematic errors (including monitoring planning, field sampling, and analytics) will contribute to varying degrees and in varying directions to the overall uncertainty of prediction. Some elements of uncertainty may even cancel each other out.
Hence, our approach was to feed the lake models, case by case, with all of the relevant processes and with the best possible parameterization gained through field and laboratory studies. This effort intrinsically reduced the uncertainty of predictions. Two sets of water constituents appear useful for model calibration and later evaluation: (i) less reactive (conservative) ions, by which the overall consistency of the water balance can be checked, and (ii) more reactive constituents affected by redox, acid-base, and dissolution/precipitation reactions, among others. While for the first set of constituents deviations of ±5 to ±10% of the average total were used for calibration and ideally also for the period of prediction, larger deviations may arise for more reactive water constituents. Among these, acidity or alkalinity (analyzed as K B4.3 or K S4.3 ) represent the most characteristic values, because they are affected by many interacting processes including loading of acidity and alkalinity, oxidation, hydrolysis, and mineralization reactions in the whole lake system.
In the case of Lake Schladitz, the absence of surface water inflow and the origin of matter loadings primarily from Quaternary aquifers meant comparatively low acid loads. As a consequence, the hydrogeochemical processes were simple and could be described by a low number of model elements and process parameters (Fig. 2b). The major contributor to the mass balance and the most sensitive model element was groundwater inflow. Of minor relevance, but still significant, was matter imported by surface runoff and erosion from slopes due to precipitation and waves. In this particular case, the lake model could be calibrated without additional process parameter identification. No "extended calibration loop" (Fig. 1) was necessary. Moreover, evaluation of the model prediction after 10 years of continued monitoring showed that there was no need for any readjustments to the model settings or parameterization. The model elements and their architecture proved adequate and reliable and are recommended for future predictions. We would expect a similar approach with simple parameterization to work for other lakes where the bedrock and groundwater chemistry are known to be relatively homogeneous over space and time.
The case of Lake Bockwitz was different in many ways. Both the model calibration and validation procedures required an iterative approach, following the conceptual workflow (Fig. 1). Several loops and model refinements were necessary to improve the consistency between measured and predicted (modelled) water quality values. This was largely due to the site's mining history and the condition of the mine at the stage of closure; in addition, the bedrock and groundwater chemistry were extremely heterogeneous, both spatially and over time. Hence, the lake model had to be expanded step by step to integrate all of the relevant process parameters. At the onset, around 2004, the processes and sources generating and delivering acidity into Lake Bockwitz were unknown. For some process parameters, initial guesses were obtained from the literature, but these guesses appeared to be unreliable and misleading. This insight became the motivation for sophisticated process parameter identification, a procedure that lasted several years. The major outcomes have been published (Neumann et al. 2007, 2008; Ulrich et al. 2012).
In brief, these studies demonstrated that for Lake Bockwitz, the major contributor to the overall loading of acidity was the interflow, i.e. the seepage along the bank shore and through adjacent dumps and inner-mine dumps. This process was also the most sensitive model element (Supplemental Fig. S-1). Other major contributors were surface water and groundwater inflow. For the latter, the sensitivity depended on the load contribution of the individual balance area, i.e. the product of inflow (Q GW,in ) and matter concentration (c GW,in ).
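The load contribution of a balance area, as the product of inflow and concentration, can be sketched as follows. The balance-area names and all numbers are hypothetical and serve only to show that a small inflow with a high concentration can dominate the total load.

```python
def loads_kg_per_day(balance_areas):
    """Load per balance area from inflow Q (m3/day) and concentration c (mg/L).

    1 mg/L equals 1 g/m3, so Q * c is in g/day; dividing by 1000 gives kg/day.
    """
    return {name: q * c / 1000.0 for name, (q, c) in balance_areas.items()}


def load_shares(loads):
    """Fraction of the total load contributed by each balance area."""
    total = sum(loads.values())
    return {name: load / total for name, load in loads.items()}


# Hypothetical balance areas: (Q_GW,in in m3/day, c_GW,in in mg/L).
areas = {
    "upper aquifer": (1000.0, 20.0),
    "dump aquifer": (200.0, 150.0),
}
loads = loads_kg_per_day(areas)
shares = load_shares(loads)
for name in areas:
    print(f"{name}: {loads[name]:.1f} kg/day ({100 * shares[name]:.0f}% of load)")
```

In this invented example the dump aquifer supplies about one sixth of the water but 60% of the load, which is why sensitivity has to be judged on loads rather than on flows alone.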
Further sensitive and meaningful contributors to the lake water quality were settling solid materials and precipitates, typically formed by the pH-dependent hydrolysis reactions of iron and aluminum (less significant for manganese). The importance of adequate parameterization of these processes by implementing additional geochemical reactions and their thermodynamic constants into the PHREEQC database was the major outcome of the present study. During the phase of initial technical neutralization, no modifications to the then-standard thermodynamic database were required. However, during the observational monitoring, the lake water pH slowly declined due to reacidification, and within the range between pH 3 and 5, incorporating the solubility of schwertmannite helped improve the reliability of the model prediction for the Fe diss concentration. The occurrence of schwertmannite is widely known from acid-mine settings worldwide, including the Iberian Pyrite Belt (Sanchez-España et al. 2011), the Berkeley pit lake in Montana, USA (Gammons and Duaime 2006), and the Lusatian mining district, Germany (Peiffer et al. 2013; Regenspurg et al. 2004; Totsche et al. 2003, among other citations).
Diffusional and ionic exchange processes across the sediment-water interface were also quantified by simulation and soil experiments in the laboratory (for details, see Ulrich et al. 2012). These processes had a definite effect on the overall mass balance, but were less important for Lake Bockwitz than the aforementioned processes. In contrast, processes like bank shore runoff and erosion, as well as wind-driven erosion by waves, were found to be insignificant and thus negligible in the cases of Lake Bockwitz and Lake Hainer See. The latter is a mining lake not far from Lake Bockwitz where another field erosion monitoring station had been installed and used for comparison. Although these model elements were not sensitive for the model of Lake Bockwitz, the contributions of precipitation, evaporation, runoff from shore banks, and erosion should still be checked for pit lakes lacking above-ground tributaries and flooding water inflow (as was the case at Lake Schladitz). While the quantity of such water sources may be comparatively low, they can still play a more than negligible role in matter loadings. Consequently, no generalization about the major contributing processes can be suggested so far, meaning that all process identifications have to be done site-specifically.
Please note that a self-consistent mass balance is of fundamental importance. For pit lakes, major contributors to this balance are the fluxes of acidity and alkalinity. Nevertheless, all major cations and anions relevant for calculating an ion balance must be monitored on a regular basis. The present work also demonstrates that with appropriate parameterization of the essential processes, stirred tank reactor models are normally sufficient to achieve a high level of prediction reliability and thus can be considered economically adequate. In order to cross-check the model accuracy, we used the Lake Bockwitz model to predict the amount of soda ash needed for neutralizing the acid load. The deviation between predicted soda ash demand and technical supply was less than ±5% in terms of molar equivalents. This is consistent with the mean deviations between measured and predicted water constituents reported above, verifying an excellent predictive aptitude of the lake model. Altogether, these results indicate that the sensitivity analysis at an early stage of the proposed workflow (box #3 in Fig. 1) has been expedient.
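The stirred tank reactor concept can be illustrated for a single conservative ion with a minimal sketch. It assumes a constant lake volume (outflow equals the sum of inflows), no reactions, and explicit Euler time stepping; the study's actual model couples such a balance with geochemical reactions, which are not reproduced here. All parameter values and names are hypothetical.

```python
def simulate_stirred_tank(c0, volume_m3, inflows, days, dt_days=1.0):
    """Concentration time series of a conservative ion in a fully mixed lake.

    Mass balance: dC/dt = (sum(Q_i * c_i) - Q_out * C) / V, with Q_out = sum(Q_i).
    inflows: list of (Q in m3/day, c in mg/L). Returns one value per time step.
    """
    q_total = sum(q for q, _ in inflows)
    load_in = sum(q * c for q, c in inflows)
    concentration = c0
    series = [concentration]
    for _ in range(int(days / dt_days)):
        concentration += dt_days * (load_in - q_total * concentration) / volume_m3
        series.append(concentration)
    return series


# Hypothetical lake: 1e6 m3 volume, two inflows, initial concentration 500 mg/L.
inflows = [(8000.0, 100.0), (2000.0, 600.0)]
series = simulate_stirred_tank(c0=500.0, volume_m3=1.0e6, inflows=inflows, days=5000)

# At steady state the lake approaches the flow-weighted mean inflow concentration.
steady_state = sum(q * c for q, c in inflows) / sum(q for q, _ in inflows)
print(f"after 5000 days: {series[-1]:.2f} mg/L, steady state: {steady_state:.2f} mg/L")
```

Comparing such a conservative-ion series with monitoring data is one way to check the water balance before reactive constituents are considered, in line with the two sets of constituents described above.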
In the future, the model could be a valuable tool to predict the effects of hydrological changes in the context of climate change on lake water quality and future remediation efforts. This would require both consistent hydrogeological models like PCGEOFIM (Mansel et al. 2011; Sames et al. 2010, 2019) and best-quality monitoring data of high spatial and temporal resolution. However, minor deficits still remain. The mineral precipitation reactions, which are subject to complex interactions under field conditions, can only be approximated to some degree with PHREEQC. Obviously, the correct mineral phases and solubility constant for the observed aluminum precipitation still remain to be found. One cannot exclude that, in addition to mineral dissolution/precipitation reactions, other processes like adsorption/desorption and cation exchange have to be better configured in the PHREEQC database. Hence, from a scientific perspective, more extensive and detailed investigations on the parameterization of hydrogeological interactions would help fill the gaps and improve the understanding of natural process interactions. However, from a practical perspective motivated by remediation management, it appears questionable whether such extensive research would be commensurate with the expected degree of prediction improvement, given the stochastic variability of natural processes in pit lake systems.

Conclusions
The ultimate goal of water quality prediction of pit lakes is to achieve the highest possible reliability with reasonable economic effort. From the present case studies and similar work in the past, we gained some general insights:
1. While a (statistical) quantification of prediction uncertainty is highly desirable, only a first approach to this objective was feasible in the present work, given the system's complexity and the number of interacting factors. To compensate, we tried to identify and quantify all major processes that affected the lake water quality. We found that the reliability of model predictions increased when the determining processes were better implemented and parameterized in the lake water quality model. The major prerequisite was site-specific system analysis to transfer process understanding into appropriate model structure and settings. We think it is important to agree on an acceptable range of predictive power in advance, with proactive input from authorities and stakeholders.
2. A well-founded conceptual model, early sensitivity analysis, and consistent model calibration were also prerequisites to reach the prediction objectives. The proposed conceptual workflow (Fig. 1) offers useful guidance for determining required expenditures for pit lake remediation with improved trustworthiness.
3. Still of fundamental importance is expert knowledge about the process-determining parameters, including, for instance, (pyrite) weathering and oxidation rates, elution rates, sediment-water exchange rates, erosion rates, and groundwater recharge rates. Such rates must be determined site-specifically. Applicable guesses can be obtained from laboratory database systems, as recently developed by the BGD ECOSAX Company (Dost et al. 2018, 2019). The proposed conceptual workflow (Fig. 1) and the basic water quality model structure (Fig. 2a) are well suited to be applied to a broad range of man-made pit lakes and may be extended to natural lakes.
For future practice, we recommend continuously improving the site-specific process understanding. This should include careful identification of the dominant processes and reliable parameters from field studies and laboratory experiments, collection of standardized high-quality monitoring data, and application of reasonable statistical tools to transfer multiple field information into structurally simplified model settings. Concerning future developments, verification of the predominant model-sensitive compounds and aqueous ions by statistical tools such as principal component analysis (PCA) may be a next step to gain further insights and reliability for water quality prediction.