Critical evaluation of stable isotope mixing end-members for estimating groundwater recharge sources: case study from the South Rim of the Grand Canyon, Arizona, USA

Springs and groundwater seeps along the South Rim of the Grand Canyon (Arizona, USA) are important for the region’s ecosystems, residents (human and animal), and economy. However, these springs and seeps are potentially vulnerable to contamination, increased groundwater extraction, or reduced recharge due to climate change. In this study, statistical methods are used to investigate δ2H and δ18O in precipitation, surface water, and groundwater to determine groundwater source. A mixing model for δ18O is developed using statistically distinct seasonal end-members represented by modeled winter (Nov–Apr) precipitation and summer (May–Oct) surface-water run-off. The calculated fraction of winter recharge (Fwin) indicates that South Rim groundwater is primarily sourced from snow-melt and winter rains with an average Fwin of 0.97 ± 0.09. Groundwater sourced from the highest elevations of the study area are more depleted than the winter end-member, suggesting values of Fwin are overestimated or a meaningful portion of winter recharge occurs at lower elevations. Lower-elevation recharge from the Coconino Plateau is supported by consistent spatial trends in δ2H and δ18O with respect to longitude, Fwin values <0.9 for 9 of the 50 samples, and age tracer data indicating young groundwater discharging from springs which is distinct from old groundwater observed in the regional flow system. These results suggest a new conceptual model is needed to account for recharge sources from low elevation and summer precipitation. Results imply resource managers may need to reconsider current land-use and water management practices on the South Rim to protect future water quantity and quality.


Introduction
Natural variations of the stable isotopic ratios of hydrogen and oxygen in water (δ 2 H and δ 18 O, respectively) provide valuable tracers for understanding elevation and seasonally variable recharge (e.g., Ingraham et al. 1991;Jasechko et al. 2014), inter-basin groundwater flow (e.g., Davisson et al. 1999;Genereux and Jordan 2006), and paleo-recharge (e.g., Fontes et al. 1991;Phillips et al. 1986;Zhu et al. 1996). Water isotopes are well suited for such investigations because they are conservative tracers, excepting evapoconcentration (Craig and Gordon 1965;Gat 1996), and measured values generally fall along a meteoric water line (MWL) characterized by global or local measurements of precipitation (Craig 1961;Putman et al. 2019). Because identifying recharge sources and quantifying their relative contribution to groundwater is critical for understanding groundwater flow systems, a wide variety of isotope mixing models have been presented in the literature. Two main themes in the development and evaluation of mixing models is the need for accurate conceptualization of the flow system and definition of representative end-members (Klaus and McDonnell 2013a and references therein). Conceptualization of primary recharge sources and direction of groundwater movement, including primary flow paths and discharge points, is necessary to appropriately constrain the mixing model. Refining the system conceptualization is, in large part, the purpose of mixing model investigations but an initial conceptualization is required to establish model parameters and goals. End-members are then selected within the context of the conceptual model and are dictated by data quality and variability. End-member selection is an important driver of research questions (i.e., what can be reasonably deduced from the data?) and the level of uncertainty in results.
The most common approach in groundwater studies for defining δ 2 H and δ 18 O mixing model end-members is to use isotopes in precipitation. Substantial effort has focused on understanding long-term precipitation δ 2 H and δ 18 O trends and appropriate methods for aggregating the data (e.g., time period, precipitation weighting) for a specific study's intended purpose (e.g., Eastoe and Dettman 2016). Robust estimation of seasonal and long-term variability of δ 2 H and δ 18 O precipitation requires extensive data due to substantial interannual variability (e.g., Ingraham 1998) where a fundamental issue in data collection has been the lack of colocated, coeval sampling. In practice, year-round sampling is difficult and resource intensive to conduct over extended periods. Predictive modeling of isoscapes (e.g., Bowen 2018) has provided one possible solution to the problem of data availability. Once appropriate data are collected and/or compiled, investigators must then verify that temporal or spatial end-members are meaningfully different from one another. Establishing significant differences of mixing model end-members has often been overlooked or not explicitly reported in many previous hydrologic investigations, presenting a challenge in evaluating the results of such research. Uncertainty in mixing models' results are largely controlled by difference between and certainty in end-members (Genereux 1998). Evaluation of endmember values, whether by formal statistical methods or by consideration of the standard errors, provides critical information about the certainty/robustness of the mixing model results.
In terms of evaluating the conceptual model, in some cases, δ 2 H and δ 18 O of precipitation may not be representative of groundwater recharge. Two examples are rainfall evapoconcentration prior to infiltration and recharge (e.g., Barnes and Turner 1998;Ingraham et al. 1998) and evolution of snowpack through melt and sublimation (Cooper 1998;Earman et al. 2006), both resulting in a more isotopically enriched recharge compared to the original precipitation. Careful consideration of such secondary processes is prudent in end-member selection.
This study investigated δ 2 H and δ 18 O in precipitation, surface water, and groundwater to quantify recharge sources to aquifers on the South Rim of the Grand Canyon, Arizona, USA. Within the study area, monsoon events, characterized by high rain rates, can account for the majority of annual precipitation. However, the prevailing conceptual model for recharge in previous investigations by Monroe et al. (2005) and Springer et al. (2017), assumes that very little recharge occurs from lower-elevation recharge areas or monsoonal precipitation. The two main physical-based justifications have been (1) physical limitations on infiltration where the low hydraulic conductivity of dry soils results in infiltration excess and surface runoff, and (2) dominance of evaporation and transpiration (ET) fluxes over transport to the deeper groundwater system once water has infiltrated. However, other investigations in arid and semiarid drainages have indicated that intense summer precipitation events can result in significant recharge in washes and ephemeral stream beds Newman et al. 2006;Meredith et al. 2015;Yang et al. 2017). As a point of emphasis, an estimated peak run-off of 70 m 3 /s from a high-intensity monsoonal storm on the Coconino Plateau (in the study area) was observed to completely infiltrate in less than a few miles (USDA 1986). Likewise, ET fluxes may come from a distinct and possibly even physically separated water source relative to bulk soil water and groundwater (Evaristo et al. 2015;Good et al. 2015) and preferential flow paths might even bypass the root zone altogether (Walvoord and Scanlon 2003), suggesting that ET could have a relatively smaller impact on annual recharge volumes than previously assumed. Understanding the contribution of surface-water run-off and summer monsoons to groundwater recharge is critical for understanding long-term water resource protection. Specifically, concentration of land-surface contaminants in runoff and focused recharge from channel bottoms and preferential flow paths could have an out-sized impact on groundwater quality.
A second goal of the study is critical evaluation of isotopic end-members. Seasonal variation of δ 2 H and δ 18 O in Arizona has been well documented (e.g., Simpson et al. 1972;Eastoe and Dettman 2016). Winter precipitation is produced by midlatitude frontal systems originating from the Pacific Ocean (Sheppard et al. 2002) with mainly Rayleigh distillation leading to depleted δ 2 H and δ 18 O. Summer rainfall occurs predominantly as localized high-intensity convective storms associated with the North American Monsoon (NAM; see Carleton 1985), and is characterized by enriched δ 2 H and δ 18 O. This enrichment is in part because the summer precipitation may contain more 'recycled' water from land-surface evaporation and raindrops often undergo secondary evaporation before reaching land surface, a process termed 'subcloud evaporation'. The result being that summer precipitation is distinctly heavier than winter precipitation providing a strong basis for defining mixing model end-members. In this study, available data was critically examined and a framework was provided for conceptualizing and supporting a δ 2 H and δ 18 O based groundwater mixing model for the South Rim of the Grand Canyon. Results of this study show that recharge timing (i.e., modern versus paleo) and elevation do not provide sufficient separation in end-members such that a mixing fraction can be calculated with reasonable uncertainty. Recharge source is critical to understanding the groundwater flow system and how system-wide changes such as climate change and land-use practices, potentially affect groundwater quantity and quality. Stable isotope mixing models provide relatively robust methods for quantify recharge source. Specifically, identification of primary recharge areas and sources will enable managers of South Rim groundwater resources to better identify potential threats (e.g., groundwater extraction and mining activities) to the groundwater resources of the area.

Study area
This study is focused on groundwater along the South Rim of the Grand Canyon (Fig. 1). The study area encompasses the Coconino Plateau north to south from the South Rim to the San Francisco Peaks (SF Peaks), respectively, and east to west from the Little Colorado River to Mohawk Canyon, respectively, where surface elevation ranges from approximately 530-3,850 m (m) above North American Vertical Datum of 1988 (NAVD88). The primary units of South Rim hydrogeology are layered Paleozoic sedimentary rock intersected by a complex system of faults and fractures (Haynes and Hackman 1978;Billingsley and Hampton 2000;Billingsley et al. 2006;Billingsley et al. 2007). Most groundwater is found in the shallow sandstone Coconino aquifer and the deep carbonate Redwall-Muav aquifer, separated by the thick Supai Group of fine-grained sandstone and mudstone. A minor aquifer is present in the underlying Proterozoic rock (Metzger 1961;Cooley 1976). Groundwater generally flows from south to north and discharges at springs along the South Rim or directly to the Colorado River (Hart et al. 2002;Bills et al. 2007).
A recharge area was estimated to constrain δ 2 H and δ 18 O of precipitation data representative of recharge to South Rim springs and wells (Fig. 1). Without detailed maps of groundwater levels, the recharge area was based on surface watersheds (Steeves and Nebert 1994), land-surface elevation, regional geology, ground-based knowledge of spring location and geologic formations, and previously reported groundwater potentiometric surfaces (Hart et al. 2002;Bills et al. 2007). The entire Little Colorado River drainage basin was not included in the recharge area. Two groundwater sites for which this is most relevant (Blue Spring and GC-1 well) were located on the west side of the river, where substantially more precipitation occurs, implying limited recharge from east of the river would be captured. Further, the upper head waters of the Little Colorado River cover a vast area that is likely not representative of the recharge captured at the majority of springs and wells, and inclusion would have likely biased results of this work. More detailed delineation of recharge areas for individual springs is unwarranted given the available data.

Climate and precipitation data
Climatic data were compiled for weather stations- Fig. 1 and Table S1 of the electronic supplemental material (ESM1)-as 30-year normal (1981-2010) monthly total precipitation and snowfall (National Climatic Data Center;NOAA 2018). An approximation of the average snow water equivalent (SWE; depth of water (mm) if snow was completely melted) was estimated as follows. The 'observed' SWE was calculated as monthly total precipitation divided by monthly snowfall for the three coldest months (Dec, Jan, Feb). Sites with an unreasonable average 'observed' SWE (>50%) for the 3-month period or did not have snowfall data (WNM, PS, PR, GRCAA, and DMR; Fig. 1) were excluded from average SWE. Snowfall in the warmer winter months (May, Apr, Nov) likely has a higher SWE meaning the contribution of snowfall to total precipitation might be underestimated for those months. The fraction of winter (Nov-Apr) precipitation was calculated as the contribution of precipitation during the winter months to the total annual precipitation amount.
Observed δ 2 H and δ 18 O in precipitation and precipitation amount data (Table S2 of the ESM1) were compiled from the Global Network of Isotopes in Precipitation (GNIP; IAEA 2018), Waterisotopes Database (WID 2019) and from Beisner et al. (2016). Evaporative effects on observed δ 2 H and δ 18 O were investigated through deuterium excess values (dexcess = δ 2 H -8 × δ 18 O; Dansgaard 1964). Grids of predicted mean monthly δ 2 H and δ 18 O in precipitation (Bowen 2018) were precipitation weighted using 30-year (1981-2010) mean monthly precipitation (PRISM Climate Group 2012) to account for the spatial variability of precipitation across the study area. δ 2 H and δ 18 O grids were resampled to match the resolution and cell extent of the precipitation dataset prior to the calculation. Precipitation weighting of gridded δ 2 H and δ 18 O was conducted on a cell-by-cell basis. Gridded data were subset using the estimated recharge area boundaries to calculate the respective mean modeled δ 2 H and δ 18 O monthly and seasonal end-members. Mixing end-member δ 2 H and δ 18 O were calculated for observed and modeled data on a precipitation-amount-weighted basis for elevation and seasonal groups as: where δ wgt is the weighted mean isotopic value for the respective grouping, δ i is the individual isotopic ratio, p i is the related precipitation amount, and p t is the total precipitation amount for the respective grouping. A pooled standard deviation was calculated for each respective weighted mean. The definition of seasons, for winter as November through April and summer as May through October were based on the climatic data compiled for this study, seasonal weather patterns in northern Arizona (group 3 in Figs For the purpose of this study, high and low elevation groups were separated by collection site elevation above or below 2,000 m above NAVD88, respectively. This elevation breakpoint was chosen based on clear differences in observed long-term precipitation data above and below 2,000 m.

Groundwater and surface-water data
Groundwater data were collected for this study between 2016 and 2018 by US Geological Survey (USGS) and National Park Service (NPS) staff following standard procedures (Gibs et al. 2012;Radtke et al. 2002;Ritz and Collins 2008;Rounds and Wilde 2012;Rounds et al. 2013;Skrobialowski 2016;USGS 2006USGS , 2019aWilde et al. 2014;Wilde 2002Wilde , 2004Wilde , 2006 Table S1 of the ESM1. Grand Canyon National Park (GRCA) boundary identified. Land-surface elevation (USGS 2017) from~500 m (green) to~3,900 m (white) indicated. Latitude and longitude (North American Datum 1983) indicated by tick marks and labels groundwater δ 2 H and δ 18 O and expand the analysis of South Rim groundwater. Surface-water data were compiled from Wagner (1987), Waterisotopes Database (WID 2019), and NWIS (USGS 2019b) to constrain an end-member that might better represent recharge from surface run-off (Table S4 of the  ESM1). Surface-water sites clearly and mainly derived from groundwater discharge were excluded.

Statistical methods
Statistical analysis was used to identify differences in mean values, verify the relationship between δ 2 H and δ 18 O and explanatory variables, and evaluate trends in δ 2 H and δ 18 O to refine the conceptual model. Statistical difference in means were evaluated using nonparametric Wilcoxon test assuming unequal variance for nonweighted δ 2 H and δ 18 O arithmetic mean end-members and a Welsh test assuming nonnormal distribution for weighted end-members. Monotonic correlations between δ 2 H and δ 18 O and elevation, precipitation amount, and longitude were tested using Spearman's ρ. The statistical significance of correlation and difference in means values (H 0 = difference in means is equal to 0) was tested at the 95% confidence interval (α = 0.05). Statistical analysis was conducted in R programming language (R Core Team 2018) using the base 'stats' package for hypothesis testing as wilcox.test and calculating correlation as cor.test(method = "spearman") and the 'weights' package (Pasek et al. 2018) for the Welsh test as wtd.t.test(bootse = TRUE). Isotopic ratios in precipitation and groundwater were compared to independent explanatory variables (precipitation amount, elevation, longitude, groundwater age) using an ordinary least square regression (OLSR). Relationships between individual and seasonal weighted δ 2 H and δ 18 O and elevation were assessed in R as lm(weights = precipitation amount). Goodness of model fits were assessed using the square of Pearson's correlation coefficient (R 2 ). Statistical significance of regression fit and/ or slope was evaluated at the 95% confidence level (α = 0.05). A local meteoric water line (LMWL), groundwater line (GWL), and surface-water line (SWL) were calculated by reduced major axis (RMA) regression (Crawford et al. 2014) as appropriate for this analysis since both variables are independent and have associated error. Precipitation weighting of the LMWL was not conducted as there was only a very weak observed relationship between δ 2 H and δ 18 O and precipitation amount (see section 'Precipitation amount'). A total of 12 surface-water samples of δ 2 H and δ 18 O were available (Table S4 of the ESM1)-Lake Mary and Lake Marshall southeast of the San Francisco Peaks near WCNM weather station, Volunteer Wash near the southwest base of the San Francisco Peaks, and the Little Colorado at Cameron approximately located between WNM and Blue Spring ( Fig. 1)-were used to calculate the SWL.

Mixing model
The fraction of recharge sourced from winter precipitation (F win ; fraction winter) for a given groundwater sample was calculated using a precipitation weighted isotopic mass balance model described by Jasechko et al. (2014) as: where δ gw is the measured isotopic ratio in groundwater, δ s is the precipitation weighted mean isotopic ratio of summer precipitation or surface water, and δ w is the precipitationweighted-mean-isotopic ratio of winter precipitation. F win was calculated for both δ 2 H and δ 18 O. Uncertainty in F win was calculated following Phillips and Gregg (2001) as: where σ is the respective standard deviation for each component.
The σ of groundwater is taken to be the analytical uncertainty (1‰ for δ 2 H, 0.2‰ for δ 18 O; Révész and Coplen 2008a, b). This is an underestimate of analytical uncertainty for groundwater data reported by Zukosky (1995). The σ of weighted seasonal end-members (δ s and δ w ) were taken to be the standard deviation of the nonweighted values of δ 2 H and δ 18 O in precipitation for the given season. The method of uncertainty estimation described by Genereux (1998) provided similar results; Eq. (3) was adopted in this study for clarity in notation. Mixing model inputs and complete model outputs are provided in Table 1 and in ESM1. Table S3 of ESM1 is also reported in a USGS ScienceBase Data Release (Solder 2020). Two important caveats to the δ 2 H and δ 18 O mixing model is the potential error in end-members associated with (1) evaporative concentration of isotopes in precipitation prior to recharge (i.e., enrichment ;Stewart 1975) and (2) groundwater recharged under a different climatic regime where δ 2 H and δ 18 O in precipitation differed from modern values. In this study, the effects of evapoconcentration and paleoclimate cannot be disentangled from the seasonal signal on a sample-bysample basis; yet, the drivers of the variations are central to identifying recharge sources in the study area.
To constrain the effect of evaporated waters on estimates of F win , a mean and weighted mean δ 2 H and δ 18 O of surface water and summer precipitation, respectively, were calculated. The mean δ 2 H and δ 18 O of surface water was calculated by taking the average of three measurements reported by Wagner (1987), collected close in space and time, as a single value to include in the average of the remaining values to avoid biasing the limited surface-water dataset (n = 5). As summer precipitation was less influenced by secondary evaporation, calculated values of F win using the precipitation end-member represent a minimum value.
The effect of paleoclimatic variation was investigated by plotting δ 2 H and δ 18 O versus mean ages reported by Solder et al. (2020) for select sites. Site selection criteria included (1) estimated groundwater ages greater than a few thousand years and not indicated to have a large component of modern water (i.e., low tritium ( 3 H) and high terrigenic He ( 4 He terr ); Solder et al. 2020), (2) locations can be used to construct a hypothetical flow path (i.e., not necessarily physically valid but representative of increasing groundwater age down hydraulic gradient), and (3) sites were not likely to have captured substantial groundwater from shallow aquifers. Indications of mixtures of modern (< a few 100 years) and paleo-recharge (>10,000 years) is the most common factor limiting data that could be used for the comparison (see Solder et al. 2020). Statistical analysis of correlation and fitted OLSRs were used to evaluate the relationship between δ 2 H and δ 18 O and groundwater age.

Climate
Climatic data compiled for long-term weather stations show precipitation falls entirely as rain between May and November and there are distinct seasonal peaks in total precipitation (Fig. 2a) associated with winter low pressure systems and the summer North American Monsoon. Winter season precipitation does not fall entirely as snow in the study area with SWE totals tending to be less than total precipitation (Fig. 2a). Total annual precipitation and snowfall are correlated with elevation ( Fig. 2b) where higher elevations receive more precipitation. The fraction of total annual precipitation that falls during winter months does not change significantly with elevation (Fig. 2b), so although higher elevations receive more precipitation, overall there is not an elevational bias to seasonal precipitation amounts.

Isotopes in precipitation
Observed δ 2 H and δ 18 O in precipitation ranged from −158.2 to 11.2‰ and −20.4 to 7.6‰, respectively, and plot along the global meteoric water line (GMWL; Craig 1961) except for some data falling below the line (i.e., low d-excess), which tend to occur for samples with high values of δ 2 H and δ 18 O (Fig. 3, Table S2 of the ESM1). Mean monthly and seasonal modeled precipitation data show a similar trend to the observed data but fall closer to the GMWL than the LMWL with high d-excess observational data being the source of disagreement (Fig. 3).
Seasonality δ 2 H and δ 18 O in precipitation is at a minimum during winter and a maximum during summer (Fig. 3), consistent with most midlatitude sites. Individual observed samples in each season significantly overlap but seasonal weighted means are distinct (Fig. 3). Weighted means are similar for modeled and observed summer precipitation (not shown in Fig. 3), while weighted means for modeled winter precipitation are considerably more isotopically depleted than the observed winter precipitation ( Fig. 3; Table 1). A statistically significant difference is found between seasons for nonweighted seasonal means (not shown in Fig. 3), in both the observed and modeled data (Table 2). Weighted seasonal means are statistically different between seasons with a high level of confidence for both observed and modeled data ( Fig. 3; Table 1).

Precipitation amount
Variability in δ 2 H and δ 18 O is related to the observed precipitation amount with greater variability observed for smaller precipitation amounts (Fig. 4a). There is a statistically significant but extremely weak trend between summer δ 2 H and precipitation amount and between all groupings of δ 18 O and precipitation amount ( Table 2). The weak correlations indicate that total precipitation amount is a secondary effect and not a useful predictor of δ 2 H and δ 18 O, thus precipitation weighting was not used for calculating the LMWL.

Elevation
Observed δ 2 H and δ 18 O have high temporal variability at all elevations (Fig. 4b); however, there is a statistically significant relationship of unweighted means of δ 2 H and δ 18 O with elevation (Table 2) though it explains little variance in the isotopic data (R 2 ≤ 0.02). Based on the strong correlation between precipitation amount and elevation (Fig. 2)

Isotopes in groundwater and surface water
In total, individual measurements of δ 2 H and δ 18 O (n = 167) were available for 50 groundwater (springs and wells) sample locations in the study area (Table 3 and Table S3 of the ESM1; Fig. 3 p-value <0.001, n = 7). The long period of record over which samples were collected at these sites (24 and 22 years, respectively) and the limited sample sizes mean the correlation has little statistical power but does provide context for mixing model results for those sites. Groundwater δ 2 H and δ 18 O generally cluster together and fall along the GMWL, with a clear evaporative signal (values farther falling to the right of the GMWL) at more enriched values of δ 2 H and δ 18 O (Fig. 3). Surface-water samples δ 2 H ranged from 0.5 to −11.8‰ with a mean and standard deviation of −8 ± 4.1‰ and δ 18 O ranged from −23.1 to −86.8 with a mean and standard deviation of −65 ± 20‰ (n = 12; Fig. 3; Table S4 of the ESM1). Mean δ 2 H and δ 18 O in surface water after correction for sampling bias used for calculating F win was −38.3 ± 18.03 and − 2.4 ± 1.08, respectively (Fig. 3). Observed summer precipitation and surface-water means and the modeled winter precipitation means constrain the

Longitudinal trend
Groundwater δ 2 H and δ 18 O for select sites near the South Rim generally increase, becoming more enriched, moving from east to west across the study area ( Fig. 5; Table S3 of the ESM1; Monroe et al. 2005). Excluded sites are close to the San Francisco Peaks where longitude is not related to distance from recharge area. A statistically significant linear trend between δ 18 O and longitude (also δ 2 H and longitude not shown on   Table 2. δ 2 H has a similar pattern and is not shown for clarity

Paleo-recharge
In assessing paleo-recharge, five sites strictly met the selection criteria (Bar Four, Canyon Mine Observation, Canyon Mine, Patch Karr, and Sunset Crater wells) and a statistically significant correlation was found between δ 2 H and mean age (Spearman's ρ = −0.74, p-value = 0.046), but not for δ 18 O and mean age. The linear trend between δ 2 H and mean age is clearly driven by a single sample from Bar Four well (Fig. S2 of the ESM1). Inclusion of the large discharge springs of study area (Blue, Havasu, and Indian Garden Springs), which are largely old water, indicates there is no statistically significant relationship between δ 2 H and δ 18 O and mean age ( Table 2). As such, modern δ 2 H and δ 18 O of precipitation and surface water were assumed to be representative end-members for the groundwater water ages encountered in this study.

Mixing model
Calculated values of F win differ based on use of δ 2 H or δ 18 O and on mixing model end-member selection (Figs. 6 and 7; Table 3, and Table S3 of the ESM1). Secondary evaporation of precipitation is readily apparent in the study area with data points falling below the meteoric water lines (Fig. 3). F win based on δ 2 H are almost always less than those calculated using δ 18 O (Fig. 6), indicative of the secondary evaporative processes having a disproportionate effect on δ 2 H. As δ 18 O is less altered by these processes, and thus more representative of recharge from precipitation, values of F win based on δ 18 O are considered more reliable for identifying recharge source. F win is generally larger using surface water than weighted observed precipitation as the summer end-member (Fig. 6). Calculated F win for both summer precipitation and surface-water end-members ranges from 0.25 to greater than 1, with respective means and standard deviations of 0.95 ± 0.17 and 0.97 ± 0.09 (Table 3 and Table S3 of the ESM1). Uncertainty in F win is generally large overall and higher for summer precipitation end-member (average 43% error) than the surfacewater end-member (average 20% error) reflective of the uncertainty and respective separation in the seasonal end-members. While F win based on summer surface water is more consistent with the observed evaporative signal in groundwater and the conceptualized summer recharge mechanism of runoff infiltration, limited data availability of surface-water δ 18 O suggests a high level of uncertainty. Calculated F win based on summer precipitation and summer surface-water end-members are likely better considered as reasonable constraints on the 'true' value of F win . Values of F win greater than 1 indicate spatial aggregation error in the use of study area-wide winter end-member definition at those sites that likely have a smaller recharge area with a different δ 18 O in recharge. Coherent spatial patterns of F win are apparent with a general reduction of F win from east to west and from near the South Rim to further toward the inner canyon (Fig. 7).

Discussion
In modeling the groundwater δ 2 H and δ 18 O system of the South Rim, precipitation, groundwater, and surface-water data were evaluated to develop a conceptual model and appropriately define end-member separations. Selection of appropriate end-members required evaluation of δ 2 H and δ 18 O long-term stability in precipitation, variability in precipitation with respect to elevation and season, and potential alterations of precipitation prior to recharge. Understanding the relative  Table 2. Map ID numbers listed in Table 3.
Color groups refer to specific subregions discussed in the text. δ 2 H is not displayed here for clarity but has a similar pattern influence of each control on end-member values provided context for interpreting groundwater δ 2 H and δ 18 O mixing model results, and a framework for testing and revising the conceptual model. Long-term variations in δ 2 H and δ 18 O of precipitation (i.e., change in climatic regime), if over the same time scale as groundwater age in the study area, could have resulted in a condition where modern precipitation was not representative of paleo-recharge. Results of this study indicate δ 2 H and δ 18 O in precipitation was likely stable over the long-term with no statistically significant relationship between groundwater δ 2 H and δ 18 O and groundwater age (Fig. S2 of the ESM2; Table 2). This finding was consistent with isotopic analysis of Cataract Canyon travertine (O'Brien 2006) and southern Nevada groundwater (Davisson et al. 1999). In contrast, similar groundwater investigations in far northeastern Arizona and northern New Mexico found groundwater older than 7,000 years to be comparatively depleted, suggestive of recharge during cooler, relative to modern, climate conditions (Phillips et al. 1986;Zhu et al. 1996). The inconsistency between studies suggested that local results were likely the best indicator of the influence of changing climate on precipitation. If a shift in paleo-precipitation δ 18 O is assumed similar to that previously reported (average 0.13‰/1,000 years; Phillips et al. 1986;Zhu et al. 1996), calculated values of F win (using the surface-water summer end-member) decrease~0.1/ 7500 years in mean age. Further accounting for the~1‰ decrease in the ocean δ 18 O, the ultimate source of precipitation, between the late-Pleistocene and late-Holocene (Schrag et al. 1996) would only further decrease calculated F win in paleo-groundwater. Such corrections were not applied in this study given no relationship was apparent between groundwater isotopic ratios and mean age in the study area, and uncertainty in modern and premodern groundwater mixing ratios ) would further reduce certainty in such 'corrected' results. If such a climatic shift did occur in the study area and was captured in the groundwater, reported values of F win are a likely maximum, further supporting the conclusion of summer recharge being an important contribution to South Rim groundwater.
Isotope end-member separation by elevation is initially appealing along the South Rim as the San Francisco Peaks are an obvious conceptual choice as the dominant recharge zone. Within the study area, data from higher elevation weather stations showing increased annual precipitation (Fig. 2b) in combination with the large difference in elevation between most of the study area and the San Francisco Peaks (Fig. 1), resulting in isotopic fractionation (e.g., Poage and Chamberlain 2001), provides a strong basis for an elevation-based separation. But closer inspection of available δ 2 H and δ 18 O in precipitation data shows that elevation poorly describes the variation in unweighted δ 2 H and δ 18 O in precipitation (R 2 ≤ 0.05, Table 2) and no statistically significant relationship was found between elevation and weighted δ 2 H and δ 18 O in precipitation ( Fig. 4b; Table 2). Beisner et al. (2016) similarly reported no relationship between elevation and δ 2 H and δ 18 O of precipitation for the Mogollan Rim, just south of the study area, except when separated by season and physical location of transect. It is important to note that limited availability of observed data of δ 2 H and δ 18 O in precipitation within/near the study area, particularly at high elevation, is a major limitation. The present analysis is insufficient to conclude that no elevational gradient in δ 2 H and δ 18 O exists in the study area, but it does show that elevation end-members based on available data are not sufficiently different from one another (Table 1), resulting in unacceptable uncertainty in the calculated mixing fraction. Use of elevation-isotope relationships, which are either not statistically supportable (e.g., two data points used to define  Table S3 of the ESM1) not shown linear relationship) or seasonally based when groundwater is a mixture of seasonal recharge sources (e.g., Blasch and Bryson 2007;Springer et al. 2017), introduce unquantified uncertainty making evaluation of such work difficult. Further empirical data collection may validate an elevation separation, but current data are inadequate for such an approach in determining South Rim groundwater recharge sources.
Bi-modal distribution of annual precipitation (Fig. 2), differences in seasonal controls and vapor sources of δ 2 H and δ 18 O in precipitation (Fig. 3), and conceptual understanding of recharge processes indicated that seasonal-based δ 2 H and δ 18 O separation may be appropriate. Further investigation of observed precipitation data showed (1) high spatial variability in precipitation amount (Fig. 2b) and increased variability of δ 2 H and δ 18 O in precipitation for lower amount events (Fig.  4a), which indicated precipitation weighting of δ 2 H and δ 18 O precipitation data is necessary; (2) while total precipitation amount varied with elevation, the relative seasonal contribution to total precipitation (i.e., fraction winter precipitation; Fig. 2b) was relatively invariant which, in conjunction with the lack of a relationship between isotopes and elevation, indicated that elevation could reasonably be controlled for in a seasonal separation; and (3) although individual measurements of δ 2 H and δ 18 O for a given season had significant overlap, the respective seasonal means of δ 2 H and δ 18 O were statically different from one another ( Fig. 3; Table 1) all supporting use of seasonal precipitation end-members to identify groundwater recharge sources.
An important check of end-member selection was to plot them against the mixed samples (i.e., groundwater) to determine if they provided reasonable constraints (Phillips et al. 2014). In the case of South Rim groundwater, the weighted δ 2 H and δ 18 O winter end-member calculated from observed precipitation data did not constrain many of the samples (Fig.  3), likely a result of the limited empirical precipitation data. In this study, grids of predicted δ 2 H and δ 18 O in precipitation (Bowen 2018) were used to expand the temporal and spatial data coverage, specifically in the coldest winter months (Dec, Jan, Feb) and at high elevations. Precipitation weighted modeled winter δ 2 H and δ 18 O in precipitation better constrained the groundwater data (Fig. 3). After precipitation weighting, modeled and observed summer δ 2 H and δ 18 O did not significantly differ (Table 1). Inspection of groundwater data showed an increasing evaporative signal (i.e., deviation from the GMWL) for the relatively more depleted groundwater samples (Fig. 3) suggestive of an evapo-concentrated recharge source. Based on other investigations of summer runoff infiltration in channel bottoms in the desert southwest Newman et al. 2006;Meredith et al. 2015;Yang et al. 2017) and historical observation of such events on the Coconino Plateau (USDA 1986), data for δ 2 H and δ 18 O in surface water were compiled and used to define a second summer end-member (Table S4 of the ESM1) representing focused recharge of evapo-concentrated rainfall from high intensity monsoon events. The summer endmembers based on observed precipitation and surface-water measurements provided some constraint (i.e., visually represented by the GWL passing between the two end-members, Fig. 3) on the uncertainty in F win associated with the potential effect of evapoconcentration before infiltration.
Groundwater δ 2 H and δ 18 O were relatively invariant with respect to precipitation (Fig. 3) and were temporally stable; repeat sampling indicates δ 2 H and δ 18 O varied by less than 1.1% on average at a given site (Table S3 of the ESM1). The implication of these findings is that long-term South Rim recharge sources were fairly consistent over time, and groundwater was sufficiently mixed such that timing of sample collection did not affect δ 2 H and δ 18 O mixing calculations at most sites. Seasonality in age tracers and trace element chemistry, which are more sensitive to shorter time-scale recharge variations, was readily apparent Beisner et al. 2019), suggesting the overall long-term stability of recharge source may be overlain by subdecadal or inter-annual variations in recharge amount and location. F win was calculated for both δ 2 H and δ 18 O using two summer end-members (weighted observations of precipitation and surface-water observations) such that four values of F win are available for each groundwater sample. Values of F win based on δ 18 O were considered more reliable than δ 2 H, as the oxygen isotopic ratio is less influenced by subcloud evaporation observed in the precipitation data (Fig. 3) and secondary evapoconcentration associated with (1) summer ET (e.g., Barnes and Turner 1998;Ingraham et al. 1998) or (2) winter snowpack melt and sublimation (e.g., Cooper 1998;Earman et al. 2006). Indeed, values of F win calculated using δ 2 H were consistently less than those using δ 18 O (Fig. 6), suggesting a possible systematic bias. The high amount of overlap between individual measurements of δ 18 O in precipitation for summer and winter months ( Fig. 3 and Table S2 of the ESM1) translated into a high standard error in end-members, and thus high uncertainty in F win (Tables 2 and 3 and Table S3 of the ESM1). As pointed out by Genereux (1998), greater separation between the winter precipitation and summer surface-water end-members (Table 2; Fig. 3) resulted in reduced uncertainty for the respective values of F win .
Samples with F win > 1 were indicative of the challenge in assigning a single end-member for all sample locations based on precipitation across a large study area. This was exemplified in the sample from the San Francisco Peaks well where only recharge from the high elevations could be captured and F win > 1.26 (Table 3). While the recharge area for this well could have been reasonably constrained based on topography, the recharge areas for other springs were not so clearly defined. Important to note, for every sample where F win was greater than 1, the respective standard errors did capture 1 except for samples from Bar Four and San Francisco Peaks wells. Additional collection of isotope data in precipitation and ancillary data for understanding the flow system would improve future similar studies of recharge source in the area.
Important implications of the F win results ( Table 3; Table S3 of the ESM1; Fig. 7) are (1) summer recharge is an important source of recharge to South Rim aquifers, (2) coherent spatial patterns of F win (and δ 18 O, Fig. 5) indicate a second major recharge area along the South Rim at the eastern extent of the Coconino Plateau, and (3) groundwater recharged at the San Francisco Peaks could be a smaller fraction of total discharge than previously thought. Even accounting for the full range of the standard deviation on F win (Table 3; Table S3 of the ESM1), a quantifiable amount of summer recharge was captured in Coconino aquifer wells (Canyon Mine Observation, Waputaki HQ) and springs relatively near the canyon rim or possibly disconnected from the regional groundwater system (JT, Sam Magee, Grapevine East; Fig. 7). Supporting the overall findings of the mixing analysis were spatial patterns of decreasing F win and increasing δ 18 O moving east to west across the study area (Figs. 5 and 7), indicating increased capture of summer recharge. Importantly, the distance between San Francisco Peaks and each of the South Rim sample sites near Grand Canyon Village (Figs. 1 and 7), are not substantially different and likely did not fully explain the observed changes in F win and δ 18 O. Instead, a more likely explanation is the presence of a secondary recharge zone located on the eastern extent of the Coconino Plateau near the South Rim. This new conceptual recharge source was further supported by relatively high annual precipitation in the subregion (PRISM 2018) and the presence of modern tracers (tritium, 3 H) in samples from many South Rim springs necessitating a relatively near-by recharge source . The overall implication of the results presented here is that recharge to South Rim groundwater likely occurs in multiple locations and across the entirety of the year. Previous conceptual models that identified snowmelt from the San Francisco Peaks as the singular recharge source need to be revisited.

Conclusion
Stable isotope mixing models are frequently used to identify and quantify groundwater recharge sources, but the exercise is nontrivial because the models are frequently data-limited and multiple processes/effects control water isotopic ratios. Critical evaluation of such models requires reporting the specific considerations in building the conceptual model, methods used for identifying end-members, and the uncertainty in calculated relative contribution from multiple recharge sources. In this study, recharge sources to groundwater aquifers south of the Grand Canyon were investigated using δ 2 H and δ 18 O. Limited availability of empirical δ 2 H and δ 18 O data in precipitation was addressed by leveraging modeled data describing δ 2 H and δ 18 O in precipitation across the study area and critical examination of the separation method used to define end-members.
A seasonal separation, based on the bi-modal nature of annual precipitation amount and a distinct difference in vapor sources and atmospheric processes, was used to define winter (Nov-Apr) and summer (May-Oct) precipitation endmembers and calculate the fraction of recharge from winter precipitation (F win ). Investigation of δ 2 H and δ 18 O in precipitation and groundwater showed that separation based on recharge timing (i.e., modern versus paleo-recharge) or elevation (i.e., low elevation versus high elevation recharge) was not supported by the available data, resulting in very high uncertainty in mixing results based on those end-members. δ 2 H and δ 18 O in groundwater showed an increasing evaporative signal at more enriched values, suggesting a summer mixing end-member that was slightly more isotopically depleted than the observed precipitation. Process-based understanding of arid zone recharge mechanisms identified infiltration of run-off from high intensity precipitation events as a possible recharge source explaining the evaporative signal. Summer run-off infiltration was characterized in this study by surface-water δ 2 H and δ 18 O data to estimate a second summer end-member and constrain the range on F win values. Specific to the South Rim, findings of this study indicate summer precipitation is an important component of groundwater recharge and, in addition to the previous identified San Francisco Peaks recharge area, the eastern extent of the Coconino Plateau was identified as a potential major recharge area.
Repeat measurements of δ 2 H and δ 18 O indicated relative stability in recharge source over multiple decades but observed variability in age tracer and trace element concentrations Beisner et al. 2019) suggest recharge amount and location could be changing in shorter time periods. In terms of resource management, the findings of this work indicate (1) contaminants, either from land-surface or subsurface sources, are likely to be transported into the deep aquifer, which is the primary source of South Rim springs and drinking water wells, (2) redistribution of run-off could significantly alter the net recharge, and thus discharge, and (3) current understanding of South Rim groundwater is incomplete, presenting a challenge to science-based decision making.
Funding information This work was supported by the National Park Service Water Resources Division and the US Geological Survey Toxic Substance Hydrology Program. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the US Government.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.