Introduction

Better characterization of the water sources in Grand Canyon National Park (GRCA), Arizona (USA) (Fig. 1), is a priority for park managers in order to continue meeting human and ecological needs. Climate change, possible decreases in spring discharge throughout the park, and vulnerability of groundwater to contamination all drive the need to estimate water quantity and spatiotemporal trends of spring flow to better inform decisions regarding water intake and use. The spatiotemporal analyses of spring hydrographs presented here are simple and broadly applicable methods of aquifer characterization in a data-poor system where site access is difficult and standard methods of aquifer characterization, like borehole logs and geophysical measurements, are largely unavailable. Similar hydrograph analyses have been used successfully in other karst regions for aquifer characterization (Fiorillo et al. 2021; Sarker and Fryar 2022).

Fig. 1
figure 1

a Map of North America with the Colorado River and Grand Canyon region highlighted. b Location of Grand Canyon National Park boundary relative to the Arizona, Utah, and Nevada state boundaries. c Study area map showing the R aquifer springs analyzed here: Roaring, Emmett, At Last, Tapeats, and Abyss River springs. The study area map also shows the locations of the Bright Angel Creek (BAC) watershed, the Roaring Springs fault, and the Transcanyon Pipeline carrying water from Roaring Springs to the north and south rims

Climate models predict hotter and drier conditions with aridification in the western US (Ahmadalipour et al. 2017) and a transition from snowpack-driven to rain-dominated regimes for groundwater recharge (Godsey et al. 2014; Taylor et al. 2013). A recent study suggests rising temperature and reduced groundwater infiltration in the Grand Canyon subregion over the next century (Tillman et al. 2020). Recharge of the Redwall-Muav aquifer (R) aquifer in the Grand Canyon region is therefore at risk of decreasing if climate regimes transition as forecasted. Already, linear regression analysis of a century of snow depth data on the Kaibab Plateau, a region of recharge for the R aquifer, indicates that snowmelt is occurring about a month earlier than it was a century ago (Fig. 2). Previous baseflow analysis of Roaring Springs on the north rim of GRCA even suggests that flow may run dry in ~40 years, assuming no additional recharge (Jones 2017). Vasey’s Paradise Spring, a particularly significant R aquifer spring to indigenous tribes in the region and a popular stop for river recreation, is intermittently dry despite being historically classified as a perennial spring. The last time this spring was documented to have no flow was in 1977 (L. Stevens, Spring Stewardship Institute, personal communication, 2023). Lastly, long-term trend analyses of spring discharge and groundwater modelling from Grapevine Spring on the South Rim of GRCA already indicate a decrease in discharge influenced by the past 30 years of dry climate (Nuyttens 2022). Continuing to monitor spring flow to better understand groundwater processes will be critical for the park to sustainably manage water sources in the future.

Fig. 2
figure 2

Timing of first recorded zero snowpack from NOAA at the Bright Angel Ranger Station on the North Rim of GRCA. Years missing more than 10% of daily records were removed from analysis. The statistically significant but highly variable trend shows a loss of nearly 30 days of snowpack in the last century

In addition to climate change threats to water supply, R aquifer springs may also be vulnerable to contamination. Hydrographs, dye trace studies, temperature data, and geochemical characterizations of groundwater discharge at GRCA indicate a complexity of flow paths, rates, and residence times in the R aquifer, ranging from days to thousands of years (Beisner et al. 2017; Crossey et al. 2006; Jones et al. 2018; Tobin et al. 2018, 2021). Evidence of short residence times indicates that the aquifers and springs are vulnerable to possible contamination from treated wastewater effluent (Beisner et al. 2022), wallowing cattle and bison, accidents or spills on roads or developed areas, and other contaminants that could infiltrate through sinkholes (Jones et al. 2019). Complex flow paths that may not necessarily follow structural contours make forecasting the impacts of climate change and potential contamination at specific springs difficult.

Currently, the drinking water supply at GRCA is sourced entirely from Roaring Springs, which discharges from the R aquifer on the north side of the canyon and into the perennial Bright Angel Creek (BAC) (Fig. 1). Water from Roaring Springs is diverted over 28 km by a trans-canyon pipeline to the north and south canyon rim developed areas. Several other perennial springs contribute to BAC flow, including Emmett Spring, located 2 km northeast of Roaring Springs. GRCA management is currently developing plans for water intake infrastructure in BAC near its confluence with the Colorado River. It is therefore imperative to characterize the current water supply and forecast future water security for Roaring Springs, Emmett Spring, and other R aquifer springs.

Sufficient data have only recently been available for the examination of hydrographs, snowmelt recession timing and calculations, and trend analyses of flow statistics and recession coefficients for multiple R aquifer springs (Fig. 1). Evidence from dye trace studies, sinkhole modeling, and aquifer vulnerability modeling now also allow for the initial delineation of a recharge area and water budgets for Roaring Springs (Jones et al. 2018, 2019; Tobin et al. 2021). These analyses of quality spring discharge data to characterize the R aquifer are more broadly applicable to other data-poor systems. The purpose of this study was to quantify R aquifer storage, inflow, and outflow to eventually develop a numerical groundwater flow model.

Study Area

The study area is in the southwestern United States and includes GRCA and the Kaibab Plateau, which is located on the north rim of the Grand Canyon and extends north past the national park boundary (Fig. 1). The Kaibab Plateau is an uplifted region with a snow-dominated, temperate forest climate and extensive karst terrain (Huntoon 1974; Jones et al. 2018). Precipitation on the Kaibab Plateau is bimodal, mainly occurring in the winter as snow and in the summer as flashy monsoon rainstorms. Initial spring hydrograph analyses of the regional R aquifer on the north side of GRCA suggest that similar to other high-elevation snow-dominated systems (Tobin and Schwartz 2012), snowmelt events on the Kaibab Plateau recharge baseflow to the GRCA aquifer, but rainfall events during the summer monsoon do not (Jones et al. 2018; Tobin et al. 2018). This recharge occurs by infiltration through sinkholes and diffuse recharge to the matrix storage (Brown and Macy 2012; Groves 2007; Huntoon 2000; Jones et al. 2018).

Grand Canyon National Park is characterized by a semiarid to arid environment with up to 1,600 m of vertical relief, primarily by downcutting of the Colorado River, creating extreme vertical topography, climate gradients, and exposure of springs discharging from the Redwall and Muav carbonate formations. This deeper R aquifer is a regional karst aquifer extending across much of the Colorado Plateau physiographic province, which is separated from the shallower Coconino (C) aquifer by impermeable confining layers of Hermit Shale and Supai Group rocks (Fig. 3; Huntoon 1974). These aquifers make up a stacked, karst aquifer system vertically connected by faults that have propagated through these confining layers, establishing zones of secondary porosity through which groundwater may travel (Huntoon 1974). Many R aquifer springs are sacred to Native American tribes in the region, are critical for park infrastructure and backcountry visitors, and are exceptionally diverse, supporting 44% of flora taxa, while covering only <0.01% of the total GRCA landscape (Sinclair 2018).

Fig. 3
figure 3

Cross-section of Grand Canyon hydrolithostratigraphy below the Kaibab Plateau, with the C and R aquifers labeled. The C aquifer extends to about 250 m depth, and the R aquifer extends to about 1,000 m depth

Materials and Methods

Spring Data Collection

Karst-spring hydrograph recession analysis is a common and effective tool to characterize aquifer storage, structural characteristics, and discharge response to recharge events (Ford and Williams 2007; Fiorillo et al. 2012). It is especially helpful in remote areas and for deep aquifers where geophysical surveys, boreholes, and cave mapping are significantly more difficult. There are currently six gage sites at R aquifer springs below the north rim: Roaring Springs, Emmett Spring, At Last Spring, Tapeats Spring, Abyss River Spring, and Vasey’s Paradise Spring. Access to these R aquifer springs in GRCA is difficult, time-consuming, season-limited, and typically requires several personnel with technical climbing and backcountry skills. These access limitations result in the lack of data at potentially important springs and in data gaps that decrease the quality and reliability of hydrographs—for example, the Emmett Spring hydrograph is almost a decade in the making but includes only five water years of usable data, due to infrequent monitoring. Tapeats and Abyss River Springs each require multiday backcountry trips and can only be visited once a year with experienced personnel, resulting in several years of missing data. Vasey’s Paradise Spring requires access by traditional climbing or rappelling. Insufficient discharge measurements have been collected in Vasey’s Paradise and At Last Spring to create a hydrograph for these sites.

Spring Hydrographs

According to US Geological Survey (USGS) protocol (Sauer 2002) and consistent with methods used in Jones et al. (2018), new rating curves based on semilogarithmic stage-discharge relations were calculated for Roaring, Emmett, Tapeats, and Abyss River springs (Fig. 4). For this rating type, gage height, i.e. stage, is the independent variable (scaled linearly), and discharge is the dependent variable (scaled logarithmically). Gage height is measured by In-Situ Level Troll 500 5-psi vented pressure transducers that record water depth at 15-min intervals (referred to here as loggers), and discharge is calculated by the standard current-meter method (Rantz 1982a). The Roaring Springs rating curve was updated from the previous curve in Jones et al. (2018), with an additional corrected discharge measurement from 14 March 2017 and new stage and discharge measurements from 22 September 2017 to 3 April 2021. The Emmett Spring, Tapeats Spring, and Abyss River Spring rating curves use newly published stage and discharge data (Aquatic Informatics 2022). Due to unsuitable channel geometry at the Emmett and Tapeats spring gage sites, discharge measurements for the Emmett Spring rating curve are recorded 1 km downstream of the logger location, and discharge measurements for the Tapeats Spring rating curve are recorded 3 km downstream of the logger location. All rating curves are available in the public Aquarius WebPortal (Aquatic Informatics 2022). These rating curves were used to interpolate discharge from water depth values. Daily-averaged discharge values were then calculated to create smoothed hydrographs for all four springs (Fig. 5).

Fig. 4
figure 4

Semi-logarithmic rating curves in red of discharge versus stage at a Roaring Springs, b Emmett Spring, c Tapeats Spring, and d Abyss River Spring. The 31 discharge measurements used for the a Roaring Springs curve were taken from 18 March 2015 to 2 July 2021. The eight discharge measurements used for the b Emmett Spring curve were taken from 15 March 2016 to 4 April 2021. The two discharge measurements used for the c Tapeats Spring curve were taken from 20 October 2021 to 19 April 2022. The four discharge measurements used for the d Abyss River Spring curve were taken from 8 November 2018 to 16 April 2022. The equations of each curve, number of discharge measurements (n), and the strength (r2) of each relationship are all shown on each plot. All these curves have p values <0.05

Fig. 5
figure 5

Daily-averaged hydrographs of a Roaring Springs, b Emmett Spring, c Tapeats Spring, and d Abyss River Spring

The Roaring Springs hydrograph is interpolated from a robust rating curve with a strong linear relationship, and it contains no gaps. Discharge values from 22 February 2012 to 6 February 2015 were interpolated using the best-fit relationship between overlapping hydrographs from a gage site in the cave close to the entrance, referred to as the “old site”, and from a gage site deeper within the cave, referred to as the “new site” (Fig. 6). The new site was installed in 2015 to capture a more accurate total discharge of Roaring Springs, and it is the site at which discharge measurements were taken to calculate the Roaring Springs rating curve.

Fig. 6
figure 6

Polynomial regression model relating discharge between two gage sites in Roaring Springs cave, where the black line indicates the best fit model. The model relates daily averaged discharges from 8 February 2015 to 11 February 2022 between a new gage site deeper in Roaring Springs cave versus an old gage site close to the entrance in Roaring Springs cave

The Emmett Spring hydrograph is also interpolated from a robust rating curve with a strong linear relationship; however, it includes water years with multiple consecutive months of missing data due to logger failure and infrequent monitoring visits. Peaks in discharge for 2012 and 2017 cannot be confidently calculated; per USGS protocols, the stage-discharge rating curve cannot be extended past twice the magnitude of the largest measured discharge value (Rantz 1982b; Sauer 2002). Lastly, hydrograph data from 26 January 2012 to 21 September 2013 required offset correction to align with discharge measurements taken in the field.

The hydrographs for Tapeats and Abyss River Springs are interpolated from rating curves with strong linear relationships, but these curves were calculated from only n = 2 and n = 4 discharge measurements, respectively. While Rantz (1982b) suggest a minimum of 10 discharge measurements for a rating curve, this requirement was intended for alluvial streambeds that are subject to channel reorganization. The Tapeats and Abyss River gage sites are in caves with fixed channels, so the bedrock is not subject to change on this timescale. Additionally, there are multiple consecutive years of missing data at both springs. These gaps are due to changes in the type of logger used to record water depth at the site and the lack of personnel to maintain loggers over time.

Hydrograph Recession Analyses

Following the hydrograph recession methods of Fiorillo (2012) and Jones et al. (2018), the recession limbs of each snowmelt-related event for these four springs extends from the last discharge peak to the beginning of the next rising limb. Only continuous recessions (i.e., uninterrupted by additional discharge peaks) were analyzed using a modified Maillet (1905) equation, solving for the recession coefficient:

$${\alpha}_n=\frac{\log\ \left({Q}_n/{Q}_{n+1}\right)}{0.4343\ x\ \left({t}_{n+1}-{t}_n\right)}$$
(1)

where Qn = discharge in m3/s, t = time in days, 0.4343 is a constant conversion factor relating Q and t in their respective units, n = microregime being evaluated, and αn = recession coefficient in days–1.

The recessions analyzed did not start at the highest peak in discharge (representing snowmelt), but at the last discharge peak preceding continuous recession. By not including the initial snowmelt pulse through the system, these hydrograph recession analyses can provide evidence of the storage and structural characteristics of the aquifer system. Up to three curves were fit to each recession event to model three microregimes, or flow regimes, through the aquifer. These three microregimes are inferred to indicate three different types of aquifer flow: conduit, fault/fracture, and matrix porosity (see Appendix Figs. A1 and A2) . For example, baseflow is inferred to be comprised of matrix porosity flow, while high-flow discharge peaks are considered to activate conduit flow, and intermediate flows activate fracture flow. When the recession curves are plotted in semilogarithmic space, the recession coefficient α is the slope of the linear relationship between log(Q) and t for each microregime, or the time it takes for discharge to drop an order of magnitude. Conduit flow is represented by α1, fracture flow is represented by α2, and matrix/baseflow is represented by α3. Although the Maillet (1905) equation was formulated for use in more homogenous aquifers with high porosity and was designed to be applied to recessions as a whole, it can be applied as “microregime analysis” to complex aquifer systems with heterogeneous storage and discharge flow paths (Jones et al. 2018; Kresic and Stevanovic 2010).

Spring Flow Statistics, Timing, and Comparison of Sample Means

The mean and standard errors of the timing of recession events, including peak flow, start of the rising limb, start of the recession limb, and return to baseflow (i.e., the number of days into the water year that these occurred) were calculated from Roaring Springs (n = 10 years), Emmett Spring (n = 3–8 years), Tapeats Spring (n = 4–5 years), and Abyss River Spring (n = 5–6 years) hydrographs. While a hydrograph cannot yet be calculated for At Last Spring, it was included in the timing analyses, as these values can still be determined from the water-depth time series. Each spring is treated as a sample, and the sample means for each of these values were compared between the five springs using a Kruskal-Wallis rank sum test for a nonparametric multiple comparison using the stats package in R-Studio. This nonparametric test was chosen because samples were not normally distributed and all sample sizes, or years of data available at each spring, were n < 10. These timing parameters were also analyzed across time by linear regression analyses.

Mean recession coefficients and mean volume discharged from snowmelt recession events at Roaring Springs (n = 10 years), Emmett Spring (n = 6 years), Tapeats Spring (n = 4 years), and Abyss River Spring (n = 5) were calculated for each of the three flow regimes and were compared between the four springs using the Kruskal-Wallis rank sum test. Again, parametric statistical comparisons were not possible due to the small sample sizes and nonnormal sample distributions. Recession coefficients and mean volumes discharged were compared across time by linear regression analyses.

The annual-mean flow and annual-mean flow variability values were calculated for Roaring Springs (n = 9 years), Emmett Springs (n = 2 years), Tapeats Spring (n = 4 years), and Abyss River Spring (n = 5 years) using the USGS Environmental Flow Allocation and Statistics Calculator (Konrad 2011). Flow variability is the log of the ratio of the discharge that is exceeded 90% of the water year and the discharge that is exceeded 10% of the water year. This ratio is a representation of the variability in flow about the annual mean.

$$\mathrm{Flow}\ \mathrm{variability}={\log}_{10}\frac{Q_{90}}{Q_{10}}$$
(2)

where Q90 = discharge that is exceeded 90% of the water year (m3), and Q 90 = discharge that is exceeded 10% of the water year (m3).

A lack of complete years for the Emmett Spring, Tapeats Spring, and Abyss River Spring hydrographs necessitates the inclusion of partial years that encompass at least 90% of the water year. The annual-mean flow and annual-mean flow variability were then compared between springs using the Kruskal-Wallis rank sum test and over time by linear regression analyses.

Due to the small sample sizes, a more conservative significance level of p < 0.05 was used to determine the statistical significance of the results of all sample mean comparisons and linear regression analyses.

Recharge Area Delineation

The recharge area for Roaring Springs (Fig. 7) was delineated in ArcGIS Pro 2.9 using dye trace results (Jones et al. 2018; Tobin et al. 2021), an R aquifer vulnerability model developed from sinkhole density and major fault locations (Jones et al. 2019), and sinkhole catchment areas developed using the watershed tools in ArcGIS Pro 2.9. While dye traces have been successfully used as a method of recharge area delineation in other karst regions (Hiler 2017; Miller 2020; Spangler 2001), the limited dye tracing data and complexity of the Roaring Springs aquifer system requires this broader approach.

Fig. 7
figure 7

a Map of the estimated recharge area for Roaring Springs outlined by the thick black line. This recharge area is measured to be 160 km2. This map includes results of the dye trace study (Jones et al. 2018; Tobin et al. 2021), location of BAC watershed in blue, relevant and known GRCA and KP faults, and the modified Concentration-Overburden-Precipitation (COP) R aquifer vulnerability model (Jones et al. 2019), where a value of “1” indicates high vulnerability and a value of “5” indicates low vulnerability. Note that Vasey’s Paradise Spring is symbolized to show that three dyes were received at this site. b Same map showing the sinkhole density model instead of the modified COP R aquifer vulnerability model

For dye trace-based recharge area delineations, any sinkhole that can be positively traced to the target spring (in this case, Roaring Springs), should be included in the recharge area estimate. Any regions that flow into the sinkhole, known as the sinkhole catchment area, are also included within the recharge area. In 2017, Uranine dye injected at a sinkhole near the southeastern edge of the Kaibab Plateau was detected at Roaring Springs (Fig. 7); therefore, the sinkhole and its capture area are included in the Roaring Springs recharge area.

Regions of high sinkhole densities on the Kaibab Plateau correlate with the location of major faults (Jones et al. 2018). This correlation indicates that major faults may act as concentrated flow paths and conduits, consequently driving sinkhole formation and increasing sinkhole density in these regions. The major faults on the Kaibab Plateau are hypothesized to have high connectivity to the R aquifer (Huntoon 2000). Consequently, the regions that have high sinkhole density and are in close proximity to major faults are likely to have greater R aquifer connectivity and higher sensitivity to recharge (Jones et al. 2019). A model representing vulnerability to contamination, which indicates sensitivity to recharge, was mapped and divided into five categories with “one” being the most sensitive and “five” being the least sensitive to recharge. These regions of high sensitivity are used as a guide to help approximate the Roaring Springs recharge area, where regions of high sensitivity (categories 1–3) surrounding the sinkhole positively connected to Roaring Springs are included in the recharge area.

Lastly, the sinkhole catchment areas for these regions of high sensitivity were used to further refine the boundary of the Roaring Springs recharge area. This method of catchment delineation operates under the assumption that when precipitation falling within these sinkhole catchment areas produces sufficient runoff, i.e. during snowmelt, water may reach concentrated flow paths on the surface to sinkholes and in the subsurface to Roaring Springs.

The boundaries of these high-sensitivity regions and their catchment areas were merged with the merge tool in ArcGIS Pro, and the resulting area was calculated to be 160 km2. This estimated recharge area for Roaring Springs is close to the area of 207 km2 estimated by master recession curve analyses in Jones (2017).

Water Budgets

A modified water balance equation to quantify groundwater budgets over the 5 years of complete and available data for Roaring Springs and its estimated recharge area was calculated, based on the equation outlined in Ford and Williams (2007).

$${Q}_{\mathrm{Roaring}}=P\hbox{--} \mathrm{ET}\hbox{--} {Q}_{\mathrm{other}\ \mathrm{springs}}+\kern0.5em \Delta S+e$$
(3)

where QRoaring = total Roaring Springs water volume (m3), P = precipitation (m3), ET = evapotranspiration (m3), Qother springs = total assumed water volume at other springs sourced by the recharge area (m3), ΔS = change in aquifer storage (m3), and e = sum of all parameter errors.

This method of water balance calculation for an entire basin has been used with success in another high-elevation semiarid region with a deep carbonate aquifer (Mohammadi et al. 2014) as well as for different climatic and geologic settings (Cochand et al. 2019; Lee et al. 2006). This study assumes that the only inflow parameter is infiltration of precipitation, and the only outflow parameters are evapotranspiration on the Kaibab Plateau and spring discharge at Roaring Springs and other C and R aquifer springs that may also be sourced from this recharge area. Sublimation of snowpack was not included in this equation, as this parameter has not yet been constrained. The change in storage, ΔS, is assumed to be zero for this time period, which is supported by no significant change in Roaring baseflow (α3) over the period analyzed. Error, e, was calculated by summing the estimated model errors for P (36% of the modeled value) and ET (31.6 mm/month).

Monthly P and ET volumes were collected from the OpenET (2022) public database. The “depths” of P and ET were extracted (in mm) from their respective databases, summed annually for water years 2017 to 2021 (in m), averaged for the delineated Roaring Springs recharge area (in m) and then converted to volumes (in m3) by multiplying that spatially averaged annual sum by the 160 km2 recharge area (converted to m2). The total volume discharged at Roaring Springs for each water year was determined by integrating the area under the daily averaged hydrograph.

The precipitation data on OpenET were compiled from a gridMET dataset, which blends spatial meteorological data from PRISM (PRISM Climate Group 2014) with temporal attributes from NLDAS-2, and is validated by networks of weather stations, including RAWS, AgriMet, AgWeatherNet and USHCN-2 (Abatzoglou 2013). This dataset has a resolution of 4 km. The PRISM dataset models large-scale precipitation processes, so interpolation to locations beyond recorded input data assume those locations have similar physiographic and climatic properties. Therefore, the larger the elevation difference between the input station and the PRISM cell within which it is located, the greater the difference between modeled and measured precipitation (Jeton et al. 2005). The closest land surface station within this recharge area—Bright Angel Ranger Station, on the Global Historical Climatology Network, National Oceanic and Atmospheric Administration (NOAA)—is not included in the PRISM model according to the published list of input station data (PRISM Climate Group 2014). A simple conversion of annual measured precipitation from this station to a precipitation volume for the recharge area (assuming precipitation is equal throughout this area) shows measured values to be anywhere between 5 and 36% of the PRISM-modeled precipitation volumes. The difference between modeled and measured values in this region may therefore be higher than other regions, which have been shown to match closely to ground-truthed datasets on mean within 5% of measured values (Daly et al. 2017).

The ET data from OpenET used in these analyses contains weighted mean ETs from six Landsat-based models with a weighted mean absolute error (MAE) of 13.6–21.6 mm/month (Melton et al. 2021). This ensemble ET dataset has undergone two phases of accuracy assessment in which modeled values were compared to measured ET from 139 eddy covariance flux towers, and model outliers were removed. These accuracy assessments show that the ensemble value had the highest overall correlation between measured and modeled ET values and the lowest overall MAE compared to the individual models. This ensemble ET dataset was therefore used in the water budget calculations. However, the closest eddy covariance flux tower used in this ET accuracy assessment is about 130 km from the recharge region on the Kaibab Plateau and about 500 m lower in elevation (OpenET 2021), which may raise accuracy concerns with this modeled ET.

Results

Assuming a significance level at p < 0.05, the Kruskal-Wallis rank sum tests show no significant differences in the timing of peak flow, the start of the rising limbs, start of the recession limbs, nor the end of the recession limbs, i.e. return to baseflow, between Roaring Springs, Emmett Spring, At Last Spring, Tapeats Spring, and Abyss River Spring (Fig. 8). The statistical similarity in the timing parameters of multiple snowmelt recession events at these five springs indicate similar timing of response to snowmelt events on the Kaibab Plateau across the R aquifer system.

Fig. 8
figure 8

Mean and standard error bar comparison between Roaring, Emmett, At Last, Tapeats, and Abyss River springs of mean a peak flow, b start of the rising limb of snowmelt discharge, c start of the snowmelt recession limb, and d return to baseflow, i.e. the end of the snowmelt recession limb. Timing is measured by the number of days into the water year (WY) that each parameter occurs

The Kruskal-Wallis rank sum tests also show no significant differences in mean conduit (α1), fracture (α2), or matrix (α3) recession coefficients between Roaring Springs, Emmett Spring, Tapeats Spring, and Abyss River Spring (Fig. 9). However, the Kruskal-Wallis tests do show that the volume discharged during snowmelt recession is significantly different between the four springs, except for Roaring Springs and Abyss River Spring. Roaring Springs and Abyss River Spring discharge similar volumes during snowmelt recession. Tapeats Spring discharges 1.8 × 108 m3, or four times more water during snowmelt than Roaring Springs and Abyss River Spring. Roaring Springs and Abyss River Spring discharge 5.5 × 107 m3, or 34 times more water during snowmelt than Emmett Spring. Similar recession coefficients but considerably different snowmelt recession volumes suggest that while aquifer flow regimes and structural characteristics are similar at these R aquifer springs, their recharge regions are different.

Fig. 9
figure 9

Mean and standard error bar comparison between Roaring, Emmett, Tapeats, and Abyss River springs of mean recession coefficients for the a conduit flow regime, i.e. α1, b fracture flow regime, i.e. α2, and c matrix flow regime, i.e. α3 (all in days–1), and d the mean volume discharged during snowmelt recessions in m3

There is no significant difference in mean flow variability between Roaring Springs, Emmett Spring, Tapeats Spring, and Abyss River Spring (Fig. 10). However, there is a significant difference in mean annual flow between the four springs, which is expected due to their differences in volume discharged during snowmelt recession; nevertheless, the only differences are between Tapeats Spring and Roaring Springs and between Tapeats Spring and Abyss River Spring. It is probable that the extremely low sample size for Emmett Spring (n = 2 years) prevented an accurate comparison with Roaring, Tapeats, and Abyss River springs for this parameter.

Fig. 10
figure 10

Mean and standard error bar comparison between Roaring, Emmett, Tapeats, and Abyss River springs of a annual mean flow in m3/s and b annual mean flow variability in m3/s

The matrix flow recession coefficient (α3) did get smaller at Tapeats Spring, meaning baseflow dropped more slowly over time. However, there are no strong or statistically significant temporal trends for any of the other parameters at Tapeats Spring, nor for any of the other four springs (Figs. 11, 12, 13, 14 and 15). This lack of a significant change in these parameters over time is likely due to small sample sizes (n = 4 – 11 years).

Fig. 11
figure 11

Linear regressions across multiple years for each of the flow variables at Roaring Springs, where plots ad are the timing of snowmelt recessions measured in days into the water year, plots eg are the recession coefficients measured in days–1, plot h is the volume discharged during snowmelt recession, and plots ij are flow statistics measured in m3/s. The strength of the relationships (r2) and their significance (p) are indicated on these plots. None of the trends in flow data at Roaring Springs are strong or significant

Fig. 12
figure 12

Linear regressions across multiple years for each of the flow variables at Emmett Spring, where plots ad are the timing of snowmelt recessions measured in days into the water year, plots eg are the recession coefficients measured in days-1, plot h is the volume discharged during snowmelt recession, and plots ij are flow statistics measured in m3/s. The strength of the relationships (r2) and their significance (p) are indicated on these plots. The timing of return to baseflow, i.e. end of the snowmelt recession, is occurring later into the water year (d)

Fig. 13
figure 13

Linear regressions across multiple years for each of the flow variables at At Last Spring, where plots ad are the timing of snowmelt recessions measured in days into the water year. The strength of the relationships (r2) and their significance (p) are indicated on these plots. None of the trends in flow data at At Last Spring are strong or significant

Fig. 14
figure 14

Linear regressions across multiple years for each of the flow variables at Tapeats Spring, where plots ad are the timing of snowmelt recessions measured in days into the water year, plots eg are the recession coefficients measured in days-1, plot h is the volume discharged during snowmelt recession, and plots ij are flow statistics measured in m3/s. The strength of the relationships (r2) and their significance (p) are indicated on these plots. The matrix recession coefficients at Tapeats Spring, i.e. α1, are getting smaller over time

Fig. 15
figure 15

Linear regressions across multiple years for each of the flow variables at Abyss River Spring, where plots ad are the timing of snowmelt recessions measured in days into the water year, plots eg are the recession coefficients measured in days-1, plot h is the volume discharged during snowmelt recession, and plots ij are flow statistics measured in m3/s. The strength of the relationships (r2) and their significance (p) are indicated on these plots. None of the trends in flow data at Abyss River Spring are strong or significant

The annual water budgets from 2017–2021 for Roaring Springs show modeled ET volumes to be higher than modeled P volumes for all years, and the estimated error for these parameters is also high (Table 1). When considering the range of error, water years 2017 and 2019 do result in potentially balanced water budgets, i.e. P – ET – (QRoaring) = 0. Otherwise, the annual water budgets are imbalanced.

Table 1 Annual water budgets for Roaring Springs based on delineated recharge area, annual water volume from the spring, and modeled evapotranspiration and precipitation volumes

Discussion

Nonparametric comparisons of snowmelt timing between all five springs, and comparisons of flow statistics between Roaring Springs, Emmett Spring, Tapeats Spring, and Abyss River Spring suggest similar flow regimes and timing of spring response to snowmelt across 50 km of horizontal distance and 1,000 m of vertical distance through the aquifer system. However, the significantly different water volumes discharged between Roaring, Emmett, Tapeats, and Abyss River springs suggests differences in their respective recharge areas, which supports previous estimations of different groundwater basin areas for these springs (Huntoon 1974). Tapeats likely has the largest recharge region on the Kaibab Plateau, Roaring and Abyss River the second largest, and Emmett Spring the smallest. Despite Roaring and Emmett springs being within only 2 km (horizontally) of each other, Emmett Spring discharges from the upper Muav limestone almost 100 m higher in elevation than Roaring Springs, which discharges from the lower Muav limestone (Huntoon 1974). Their annual mean flow and mean volumes discharged during snowmelt are an order of magnitude different. These results further indicate the complexity of this aquifer system, as spatial proximity does not necessarily indicate similar spring discharge magnitudes and may indicate different recharge areas and/or flow paths in this case. The timing of spring response to snowmelt events appears to be similar across a large horizontal distance.

The lack of significant changes over the 3–10 years in recession coefficients, the timing of snowmelt recessions, the total volume discharged during snowmelt, annual mean flow, and flow variation suggest that flow characteristics of Roaring, Emmett, Tapeats, and Abyss River springs have remained relatively stable over the time period analyzed, compared to smaller springs on the South Rim which have smaller catchments and lower mean discharge (Nuyttens 2022). However, the decrease in matrix, or baseflow, recession coefficients at Tapeats Spring (Fig. 14g) may indicate some change in aquifer characteristics. Namely, Tapeats Spring may be experiencing less change in baseflow between recession events. It should be noted that there are only 2–6 useable water years’ data from Emmett Spring, At Last Spring, Tapeats Spring, and Abyss River Spring, making trend analyses less robust than those from Roaring Springs. Furthermore, evaluating climate-related trends in spring discharge in this region may require multidecadal time scales, as regionally relevant climate cycles like the El Niño-Southern Oscillation (ENSO) operate on a multiannual time scale. These limitations therefore necessitate trend analyses of several additional years of flow data at Roaring Springs, Emmett Spring, At Last Spring, Tapeats Spring, and Abyss River Spring to monitor climate-related impacts and spatial trends in spring flow. Multidecadal trend analyses can provide information on R aquifer connectivity and spring responses to changes in climate and have been used successfully in similar karst regions (Sarkar and Fryar 2022). These analyses will be useful for GRCA management to determine if and when the system responds to reduced aquifer recharge as a possible result of climate change.

Annual water budgets at Roaring Springs for three of the five water years are imbalanced, resulting in negative water volumes on the scale of tens of millions of cubic meters of water annually. However, the high propagated error from the modeled P and ET values did result in a range of water volumes from –6.4 × 107 to 4.3 × 107 m3 for 2017 and 2019, which are the 2 years with the highest amount of winter precipitation in the Kaibab Plateau. These results may indicate that unless there is sufficient winter precipitation in the recharge area, there may be a loss of aquifer storage. Therefore, the negative water budgets in the other years may indicate this loss of aquifer storage. However, the lack of significant change in matrix flow, or baseflow, at Roaring Springs (Fig. 11g) and modeled ET values greater than modeled P values suggest that these imbalances may be a result either of inaccurate inflow data, i.e. P and ET, and/or an inaccurate recharge area based on the available evidence.

There are limitations to using the R aquifer vulnerability model in delineating recharge area. Data on faults are limited, and areas of high sinkhole density on the northern portion of the Kaibab Plateau may indicate southwest–northeast-trending faults that are not mapped and therefore not included in the fault data (Fig. 7b). Additionally, overland flow on the Kaibab Plateau is rare, and therefore the sinkhole watershed areas may not be useful in the delineated recharge area. It is also likely that the modeled ET and/or P data are still inaccurate for this region, evidenced by their high error. When modeled ET was subtracted from measured P volume for both the estimated recharge area of 160 km2, as well as for a test area of 0.5 km2 area around the Bright Angel Ranger Station, negative water volumes and imbalanced water budgets still resulted, strongly suggesting that P and ET for this region require additional ground truthing.

Additionally, the absence of data on sublimation volume as an outflow parameter presents an unknown error but is likely to be significant in a semiarid, high-altitude region like the Kaibab Plateau. A study simulating sublimation for a high-elevation, semiarid catchment in the Andes suggested that the ratio of sublimation to snowmelt was anywhere from 10–55% of the total snow ablation in the region (Reveillet et al. 2020). Sublimation may therefore be an important component of the water budget for the Kaibab Plateau that should be measured and included in future water budgets and modeling.

Finally, the volume of water discharged from other C and R aquifer springs sourced from this delineated recharge area are incomplete. Despite being within 2 km of Roaring Springs, the dye trace studies found no direct connection between any of the sinkholes and Emmett Spring. The sinkhole directly connected to Roaring Springs through dye trace is also directly connected to At Last Spring, but a hydrograph for At Last Spring is not yet possible. Additional tracer tests are therefore necessary to identify other springs sourced from this recharge area, as an unknown outflow volume from additional springs may alter the water budgets.

Conclusions

While the available gage data for R aquifer springs is limited, the evaluation of timing, recession, and flow statistics at multiple R aquifer springs further evidence the complexity of this aquifer system. Spatial proximity in this case does not necessarily indicate similar discharged water volumes nor recharge areas, though the timing of spring response to snowmelt events appears to be similar across a large horizontal distance. Hydrographs and recession analyses should be calculated for the remaining gaged R aquifer springs, At Last Spring and Vasey’s Paradise Spring for continued evaluation of spatial and temporal trends in flow. Additional springs in the BAC system should be gaged and additional dye trace studies should be conducted to delineate a recharge area and calculate water budgets for the entire BAC system, as water from BAC will be the park’s main drinking water supply by 2025. By these analyses, the park will be able to track climate-change-related effects at critical spring sites, especially those within the BAC watershed.

These initial trend and water budget analyses highlight significant gaps in data that relevant land management agencies should consider filling. Estimated groundwater budgets are high in error and imbalanced; modeled ET and P data from the best available sources will require ground-truthing on the Kaibab Plateau. Specifically, the park should consider collaboration with the PRISM Climate Group to include measured P data from the Bright Angel Ranger station in the PRISM dataset. The park, and possibly the Kaibab National Forest, should also consider installing eddy covariance (EC) instrumentation on the Kaibab Plateau to provide in situ estimates of ET to better constrain this parameter. ET rates are significantly variable across the landscape (Melton et al. 2021), and the Roaring Springs water budgets presented here suggest interpolated ET from the best available models is not sufficient for this high-elevation semiarid region. Additionally, the volume of snowpack sublimation versus snowmelt should be measured or modeled as a component of future BAC water budgets. Without these steps, annual water budgets at the level of accuracy needed for management decisions and policy-making will not be possible. Additionally, the recharge area for Roaring Springs, Emmett Spring, and eventually the entire BAC watershed, should be better constrained through a follow-up dye trace study to identify additional sinkhole-to-spring connections. Collecting these data and conducting another dye trace study will provide a more accurate and well constrained baseline for numerical groundwater modeling of the R aquifer.

These analyses of R aquifer spring flow and trends underline aquifer complexity and should be used as a baseline for continued spatiotemporal characterization. Park management can use these results and follow these recommendations to ensure well-informed decision-making and planning for any forecasted changes in R aquifer characteristics that may affect the spring sources on which humans and wildlife depend.