Background

Streams and rivers are dynamic systems that are characterized by temporally and spatially varying conditions. The complexity of these systems produces a wide range of geomorphic features and habitats that support diverse ecological communities (Maddock et al. 2013), and a natural streamflow regime serves an important role in maintaining biological diversity and ecological integrity (Dunne and Leopold 1978; Karr 1991; Richter et al. 1996; Poff et al. 1997). Streamflow characteristics may affect aquatic life directly or indirectly through interconnections with stream habitat, channel substrate, nutrient flux, and connectivity (Richter et al. 1997; Bunn and Arthington 2002; Annear et al. 2004; Poff and Zimmerman 2010; Kennen et al. 2013). Furthermore, streamflow is often considered a “master variable” that affects water chemistry and quality and limits the distribution and abundance of riverine species (Resh et al. 1988; Power et al. 1995), and regulates the ecological integrity of flowing water systems (Poff et al. 1997).

Streamflow patterns vary seasonally and throughout larger timescales, and patterns can be described by five components of hydrologic condition: magnitude, frequency, duration, timing, and rate of change. Definitions of these components are provided in Poff et al. (1997), and these components are used to characterize the range of flows that shape river ecosystems (Richter et al. 1996; Poff et al. 1997; Olden and Poff 2003). Aquatic ecosystems are sensitive to alterations in streamflows, and alterations in the frequency, timing, and rate of change of streamflow can affect aquatic ecosystems as much as the change in overall magnitudes of flow (Richter et al. 1996; Poff et al. 1997; Olden and Poff 2003). Hydrologic metrics typically computed from measured or modeled daily streamflow records are often used to quantify aspects of the flow regime and study altered hydrology (Richter et al. 1996; Poff et al. 1997; Olden and Poff 2003).

Previous studies have evaluated the extent of hydrologic alteration in Minnesota and demonstrate that streamflows in many rivers in Minnesota have been altered substantially (Novotny and Stefan 2007; Lenhart et al. 2011; Peterson et al. 2011; Schottler et al. 2013; Ziegeweid et al. 2015). Anthropogenic activities that alter streamflows in Minnesota include but are not limited to (1) withdrawal of water for agricultural or municipal uses; (2) installation of subsurface tile drains in agricultural areas; (3) creation of more impervious surface in urban areas; (4) operation of dams, and (5) discharge of treated wastewater effluent into streams (Novotny and Stefan 2007; Lenhart et al. 2011; Peterson et al. 2011; Schottler et al. 2013). Trends in streamflow have been observed throughout Minnesota (Novotny and Stefan 2007; Peterson et al. 2011; Ziegeweid et al. 2015; Krall et al. 2019), and directions and magnitudes of trends vary based on the hydrologic landscape unit (HLU) in which the stream is located (Wolock et al. 2004; Ziegeweid et al. 2015; Lorenz and Ziegeweid 2016). Periodicities in streamflow trends have been observed in the Mississippi, Minnesota, and Red River of the North Basins, with the amplitudes of periodicities becoming stronger after 1980 (Novotny and Stefan 2007). In addition, streamflow magnitudes increased in agricultural watersheds after 1980, with possible links to increased subsurface tile drainage (Lenhart et al. 2011). Increases in streamflow magnitudes likely are the result of shifts from small grains and forage crops to intensive row crop agriculture (corn and soybeans) in the late 1970s, with a corresponding increase in subsurface tile drains in agricultural areas (Lenhart et al. 2011; Schottler et al. 2013). Land cover and land management had a greater effect on hydrologic variability than variation in annual precipitation (Lenhart et al. 2011; Peterson et al. 2011; Schottler et al. 2013).

Additional studies have examined the effects of variability in landscape and hydrologic metrics on biological responses in specific watersheds of Minnesota. A large interagency study of streams within the Lake Superior Basin of Minnesota developed a classification system (Cai et al. 2015) that was used to compute environmental flow statistics (Herb et al. 2015a) and develop models (Herb et al. 2015b) to examine changes to stream communities based on predicted climate and landscape changes in Minnesota (Herb et al. 2015c). In addition, McKay et al. (2019) used modeled flow data to develop flow–ecology relations for streams in the Minnesota River Basin and applied relations to six future land-use scenarios. Finally, Poff and Allan (1995) found significant relations between hydrologic factors computed from measured streamflow data and fish assemblage data collected in Minnesota and Wisconsin that could not be explained by zoogeographic constraints; of the nine sites in Minnesota, seven were part of the Minnesota River Basin in southern Minnesota. However, previous studies do not specifically examine the effects of streamflow alteration on fish communities throughout Minnesota, and quantitative relations between streamflow alteration and differences among fish communities throughout the diverse HLUs of Minnesota (Fig. 1A–F; Ziegeweid et al. 2015; Lorenz and Ziegeweid 2016) have not been developed.

Fig. 1 Map illustrating paired U.S. Geological Survey streamgages and Minnesota Pollution Control Agency biological sampling sites across five hydrologic landscape units previously identified for Minnesota (Ziegeweid et al. 2015); numbers assigned to sites represent map numbers presented in a published companion data release (Krall et al. 2022)

The Minnesota Pollution Control Agency (MPCA) uses macroinvertebrate- and fish-based indices of biotic integrity (IBIs) to assess the biological condition of stream reaches against biological criteria in state water-quality standards and identify those below the criteria as impaired. A stressor identification process is then completed to identify stressors to be addressed through the state’s watershed approach (MPCA 2014a, 2014b). Streamflow alteration has been identified as a key stressor on aquatic life in many streams, but there has been limited evaluation of what aspects of flow alteration potentially affect fish and macroinvertebrate conditions in Minnesota. The presence of an extensive biological monitoring database and long-term streamflow data from streamgages in Minnesota provides the opportunity to evaluate the extent of flow alterations in Minnesota rivers and streams using hydrologic indices and to identify streamflow-sensitive metrics of aquatic-life condition.

Several software packages and synthesis approaches have been developed to calculate hydrologic metrics, identify and select biologically relevant hydrologic parameters, develop streamflow–ecology relations, and apply streamflow–ecology relations in water-resources management. The Indicators of Hydrologic Alteration (IHA) was an early software package developed by The Nature Conservancy (Richter et al. 1996). Thirty-two hydrologic parameters were selected as ecologically relevant for use in the IHA to evaluate the hydrologic condition of pre- and post-effect periods for a hydrologic data series. Measures of central tendency (mean, median) and dispersion (standard deviation) are calculated for each parameter to provide 64 statistics with which to evaluate differences in flows between periods. The Hydrologic Index Tool was developed by the U.S. Geological Survey (USGS) as part of the Hydroecological Integrity Assessment Process for use in characterizing streamflows and assessing hydrologic alteration using 171 ecologically relevant indices (Henriksen et al. 2006; Kennen et al. 2007).

More recently, the USGS developed the R package EflowStats to compute the 171 ecologically relevant hydrologic indices using the open-source R software environment to simplify the process (Henriksen et al. 2006; Thompson et al. 2013). The 171 ecologically relevant hydrologic indices represent the five components of a flow regime (magnitude, timing, duration, frequency, and rate of change), and these components are split into subcategories in EflowStats. Magnitude and timing metrics are split into three subcategories representing high- (mh and th, respectively), average- (ma and ta, respectively), and low-flow (ml and tl, respectively) conditions. Duration and frequency metrics are split into two subcategories representing high- (dh and fh, respectively) and low-flow (dl and fl, respectively) conditions. Rate-of-change metrics are only based on average-flow conditions (ra). EflowStats includes seven additional indices developed for use at continental spatial scales (Archfield et al. 2014), for a total of 178 indices.
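The subcategory codes described above follow a regular pattern: a letter for the flow-regime component, a letter for the flow condition, and an index number. The short Python sketch below is an illustration only and is not part of EflowStats; the function name classify_metric is hypothetical. It decodes a code such as dl18 using only the conventions stated in the text. The seven additional indices of Archfield et al. (2014), such as lam1, do not follow this pattern and are deliberately rejected.

```python
import re

# First letter = flow-regime component, second letter = flow condition,
# following the subcategory codes listed in the text (mh, ma, ml, th, ...).
COMPONENTS = {"m": "magnitude", "f": "frequency", "d": "duration",
              "t": "timing", "r": "rate of change"}
CONDITIONS = {"a": "average", "l": "low", "h": "high"}


def classify_metric(code):
    """Return (component, flow condition) for a code such as 'dl18'."""
    match = re.fullmatch(r"([mfdtr])([alh])(\d+)", code.lower())
    if match is None:
        raise ValueError(f"unrecognized metric code: {code}")
    component, condition, _index = match.groups()
    # Duration and frequency have only high- and low-flow subcategories.
    if component in ("d", "f") and condition == "a":
        raise ValueError(f"no average-flow subcategory for: {code}")
    # Rate-of-change metrics are based on average-flow conditions only.
    if component == "r" and condition != "a":
        raise ValueError(f"rate of change is average-flow only: {code}")
    return COMPONENTS[component], CONDITIONS[condition]
```

For example, classify_metric("dl18") returns ("duration", "low"), which matches the description of dl18 as a low-flow duration metric.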

The primary objective of this study was to develop statewide flow–biology relations for 134 different biological metrics computed by the MPCA using three different datasets of hydrologic metrics computed using EflowStats (Henriksen et al. 2006; Thompson et al. 2013) that represent total periods of streamflow record and ratios of hydrologic metrics computed from different periods of hydrologic record to estimate long- and short-term changes in hydrology. A secondary objective was to use several methods to evaluate developed regression models and identify subsets of biological metrics in each of the six classes of biological metrics and hydrologic metrics computed from three different hydrologic datasets that demonstrate the strongest flow–biology relations for streams throughout Minnesota. Study results are presented in a way that can be (1) easily interpreted by resource managers; (2) easily incorporated into decision-support frameworks, such as the tiered-aquatic life use (TALU) framework (Yoder 2012) developed for the MPCA or the ecological limits of hydrologic alteration (ELOHA) framework developed by Poff et al. (2009), and (3) easily applied to evaluations of stream restoration projects developed by the Minnesota Department of Natural Resources (MNDNR 2010). The final objective of this study was to develop flow–biology relations using methods that could be easily applied to any watersheds outside of Minnesota with long-term streamgage and fish-community sample data.

Materials and methods

The Materials and methods section is divided into several subsections. The “Biological datasets” subsection describes how fish-community sample data collected by the MPCA were used to compute biological metrics. The “Hydrologic datasets” subsection describes how long-term streamflow records were compiled and used to compute hydrologic metrics. The “Paired site selection” subsection describes how the biological and hydrologic datasets were combined for flow–biology analyses. The “Statistical analysis” subsection describes best subset analyses used to compute regression models for each of the biological metrics in each of the three paired datasets. The “Data synthesis” subsection describes additional methods used to (1) identify the hydrologic metrics that most frequently occur as explanatory variables in computed regression models and (2) identify the strongest flow–biology relations for each category of biological metrics. Detailed definitions of all hydrologic and biological metrics used in analyses, data files for paired biological sites and streamgages, lists of final regression models for all biological metrics in each dataset, and R scripts to run described analyses are published in Krall et al. (2022).

Biological datasets

Biological metrics were computed from fish-community survey data collected during a single visit to each site between mid-June and mid-September (MPCA 2009, 2017) and were retrieved from the MPCA Environmental Data Application (https://www.pca.state.mn.us/environmental-data). Data used in analyses were limited to samples collected from 1996 to 2015 because fish-community samples collected prior to 1996 were collected using different protocols. Fish-community samples were collected using electrofishing surveys according to established agency protocols (MPCA 2009, 2017), and biological metrics representing the categories composition, habitat, life history, reproductive, tolerance, and trophic metrics were calculated from fish-community data according to standardized protocols (MPCA 2014a). A total of 134 biological metrics were used as response variables in statistical analyses. Additional metrics that were not broadly applicable across streams and rivers contained within HLUs were excluded from analyses, such as metrics specific to coldwater trout streams. In addition, metrics based on count data were excluded because similar metrics were available using percent of individuals or percent of taxa in a sample, and distributions of the percent metrics were approximately normal. The final two biological metrics were the composite fish-based index of biotic integrity (FIBI) and FIBI scores normalized to an impairment threshold of the corresponding stream class based on “biological condition gradient 4” (FIBI_BCG4, MPCA 2014a, 2016). The FIBI and FIBI_BCG4 metrics were computed for comparison to a recent study that used stepwise linear regression techniques to develop relations between hydrologic explanatory metrics and biological response metrics computed from macroinvertebrate community samples (Fitzpatrick 2018).

Hydrologic datasets

Continuous, long-term streamgages were identified from the USGS National Water Information System (NWIS, U.S. Geological Survey 2019), and annual records of daily mean streamflows were evaluated to determine suitability for use in this study. Suitable streamgages contained at least one period with a minimum of 10 years of consecutive and complete water years of data (Novak et al. 2016, U.S. Geological Survey 2019, Krall et al. 2022). A water year represents the period from October 1st through the following September 30th and is defined by the year in which it ends. Streamgages were excluded if substantial effects of regulation or diversion were noted in a previous study (Ziegeweid et al. 2015). Periods of streamflow record during the 1945–2015 water years were used in analyses. To be included in the datasets, streamgages also needed a complete water year of record during the year in which the corresponding biological samples were collected.
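The water-year convention and the record-length screen described above can be expressed compactly. The following Python sketch is an illustration only and is not taken from the scripts in Krall et al. (2022); the function names and the list-of-years input format are assumptions.

```python
from datetime import date


def water_year(day):
    """Water year runs 1 October to 30 September and is named for the year it ends."""
    return day.year + 1 if day.month >= 10 else day.year


def longest_consecutive_run(complete_years):
    """Length of the longest run of consecutive complete water years."""
    best = current = 0
    previous = None
    for year in sorted(set(complete_years)):
        current = current + 1 if previous is not None and year == previous + 1 else 1
        best = max(best, current)
        previous = year
    return best


def is_suitable(complete_years, sample_year, min_run=10, first=1945, last=2015):
    """Apply the screens stated in the text to one streamgage record.

    complete_years: water years with a complete daily record (assumed input).
    sample_year: water year of the paired biological sample.
    """
    years = [y for y in complete_years if first <= y <= last]
    # The sample year itself must be a complete water year of record.
    return sample_year in years and longest_consecutive_run(years) >= min_run
```

Under this convention, water_year(date(2014, 10, 1)) and water_year(date(2015, 9, 30)) both return 2015. Exclusion of regulated or diverted streamgages is a separate screen that is not represented here.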

A total of 173 hydrologic metrics were computed with EflowStats (Henriksen et al. 2006; Thompson et al. 2013; Archfield et al. 2014) using complete water years of hydrologic record. Five of the 178 hydrologic metrics typically calculated using EflowStats (Henriksen et al. 2006; Thompson et al. 2013; Archfield et al. 2014) were excluded from the analyses because of a disproportionately high number of zero values that could not be used in ratios representing metrics computed for two different periods, resulting in a total of 173 hydrologic metrics included in the analyses. The excluded metrics represented low-flow duration, timing, and frequency: dl18 (number of zero-flow days), dl19 (variability in number of zero-flow days), dl20 (number of zero-flow months), tl3 (seasonal predictability of low flow), and fl3 (frequency of low pulse spells; Henriksen et al. 2006; Thompson et al. 2013). Also, the lam1 (arithmetic mean streamflow) metric from the seven additional indices developed for use at continental spatial scales (Archfield et al. 2014) represents the same computed value as the ma1 (mean of daily mean flow values for entire flow record) metric from the original 171 ecologically relevant indices (Henriksen et al. 2006; Kennen et al. 2007). Both metrics were computed in EflowStats and included in all hydrologic datasets, but the two metrics were not used together in the analyses of flow–biology relations described later in the Methods section (Krall et al. 2022).

Three different datasets of hydrologic metrics were developed to explore (1) general relations between biological responses and period-of-record hydrologic metrics; (2) biological responses to long-term hydrologic changes, and (3) biological responses to short-term hydrologic changes. For all three hydrologic datasets, missing years of streamflow record were dealt with by calculating hydrologic metrics for all continuous periods and computing weighted averages of the metrics (Helsel et al. 2020) based on the proportion of the entire usable record represented by each continuous period. All streamgages with missing years of record used in analyses had at least one continuous 10-year period of streamflow record.
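The handling of missing years can be illustrated with a short Python sketch. This is a sketch under stated assumptions rather than the published implementation: the helper names are hypothetical, and each metric value is assumed to have been computed separately for each continuous period.

```python
def split_continuous(years):
    """Group water years into continuous periods (runs with no gaps)."""
    runs = []
    for year in sorted(set(years)):
        if runs and year == runs[-1][-1] + 1:
            runs[-1].append(year)
        else:
            runs.append([year])
    return runs


def weighted_metric(period_values):
    """Average a metric across continuous periods, weighted by period length.

    period_values: list of (number_of_years, metric_value) pairs, one pair
    per continuous period. Each weight is that period's share of the
    entire usable record.
    """
    total_years = sum(n for n, _ in period_values)
    return sum(n * value for n, value in period_values) / total_years
```

For a streamgage with a 30-year period and a 10-year period whose metric values are 2.0 and 4.0, weighted_metric([(30, 2.0), (10, 4.0)]) gives 2.5, so the longer period contributes three-quarters of the weight.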

Period-of-record (POR) hydrologic metrics were computed using all available complete water years of hydrologic record starting with the 1945 water year and ending with the water year in which the one-time biological sample was collected. The 1945 water year was selected as a cutoff point for four reasons. First, the records of some streamgages in the dataset started specifically in the 1945 water year (U.S. Geological Survey 2019). Second, the proportion of streamgages suitable for the dataset and with records that extend further back in time than the 1945 water year was small relative to the total number of suitable streamgages. Third, starting with the 1945 water year created similar time periods for evaluating long-term hydrologic change pre/post-1980, and these periods were similar to periods in a previous dataset (Krall 2019). Fourth, starting with the 1945 water year excludes the unusually dry period of the Dust Bowl in the 1930s (Schubert et al. 2004). The full periods of available record for each streamgage used in the analyses can be obtained through the NWIS (U.S. Geological Survey 2019) using the USGS station numbers provided in Krall et al. (2022).

Long-term change (LTC) hydrologic metrics were computed by taking the ratios of the hydrologic metrics computed post- and pre-1980 water year. The 1980 water year was selected as the change point based on previous studies that demonstrated that trends in streamflow not attributed to precipitation began throughout Minnesota around 1980 (Lenhart et al. 2011; Schottler et al. 2013; Ziegeweid et al. 2015). In addition, a shift in agricultural practices was noted between 1975 and 1980, when crop rotation practices began to shift from small grains and forage crops to intensive row crop agriculture (corn and soybeans), with a corresponding increase in artificial drainage (Lenhart et al. 2011; Schottler et al. 2013). The post-1980 period hydrologic metrics were computed using all available complete water years of data starting with the 1981 water year and ending with the water year in which the one-time biological sample was collected, from 1996 through 2015. Pre-1980 hydrologic metrics were computed using all available complete water years of data starting with the 1945 water year and ending with the 1979 water year. A minimum of 10 years of continuous streamflow record in both pre- and post-1980 periods was required for inclusion of a streamgage in the dataset (Krall et al. 2022). The ratios of the post-1980 metrics to the pre-1980 metrics were used as the final hydrologic metrics (explanatory variables) in regression analyses to demonstrate long-term hydrologic changes and relate the long-term changes to biological responses observed in fish communities throughout Minnesota. The 1980 water year was excluded from computation of pre- and post-1980 hydrologic metrics to create even periods of available streamflow record pre- and post-1980. Excluding 1980 as the “change” year also avoided potentially biasing results by either assigning 1980 to one period or including 1980 in both periods.
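The period definitions for the LTC dataset can be summarized in a few lines of Python. The sketch below is illustrative only; the function names are hypothetical, and returning None for a zero-valued pre-1980 metric is an assumption made here to show why metrics dominated by zero values cannot be expressed as ratios.

```python
def ltc_periods(complete_years, sample_year):
    """Split complete water years into pre-1980 and post-1980 periods.

    The 1980 water year is assigned to neither period.
    """
    pre = [y for y in complete_years if 1945 <= y <= 1979]
    post = [y for y in complete_years if 1981 <= y <= sample_year]
    return pre, post


def has_run(years, length=10):
    """True if the years contain `length` consecutive water years."""
    years = set(years)
    return any(all(y + k in years for k in range(length)) for y in years)


def ltc_ratio(post_value, pre_value):
    """Ratio of a post-1980 metric to the same metric pre-1980."""
    if pre_value == 0:
        return None  # ratio undefined (assumed handling, for illustration)
    return post_value / pre_value
```

A streamgage would enter the LTC dataset only if has_run returns True for both periods; a ratio above 1 indicates that the metric increased after 1980.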

Short-term change (STC) hydrologic metrics were computed by calculating hydrologic metrics from the last 10 complete years of hydrologic record and dividing these metric values by the POR metric values calculated for the streamgage as described earlier in this section. Periods of record for streamgages included in the STC dataset ranged from 22 to 71 years (Krall et al. 2022). The final year of the last 10 years of record was the year in which the one-time biological sample was collected for the paired streamgage data and biological samples. The ratios of the hydrologic metrics computed for these two periods represent the final explanatory hydrologic variables in regression analyses to demonstrate short-term hydrologic changes and relate the short-term changes to biological responses observed in fish communities throughout Minnesota.
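As a simplified illustration of the STC ratio, the Python sketch below divides a metric for the last 10 years of record by the same metric for the full period of record. It rests on two assumptions that differ from the study: one value per water year stands in for the daily streamflow record, and the arithmetic mean stands in for an EflowStats metric. The function name stc_ratio is hypothetical.

```python
from statistics import mean


def stc_ratio(annual_values, sample_year, metric=mean, n_recent=10):
    """Ratio of a metric for the last n_recent years to the period-of-record value.

    annual_values: dict mapping water year -> value (assumed input format).
    The recent window overlaps the period of record by design.
    """
    years = sorted(y for y in annual_values if 1945 <= y <= sample_year)
    if len(years) < n_recent or years[-1] != sample_year:
        raise ValueError("record must end in the sample year and cover n_recent years")
    recent = years[-n_recent:]
    recent_value = metric([annual_values[y] for y in recent])
    por_value = metric([annual_values[y] for y in years])
    return recent_value / por_value
```

Because the recent window is part of the period of record, the ratio is damped toward 1 relative to a comparison of two separate periods, and the damping is stronger for shorter records.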

The LTC metrics were based on ratios of metrics calculated from two separate time periods, but the STC metrics were based on metrics calculated from overlapping periods, for several reasons. First, STC metrics had a less well-defined change point because of the variation in the year of fish-community sampling at the biological site. In contrast, LTC metrics had a better-defined change point (1980 water year) and at least 10 complete water years of data during pre- and post-change periods. Second, overlapping the last 10 years of complete water year record with the overall POR (from the 1945 water year through the year of fish-community sample collection) allowed us to include additional sites with long (22–35 years) records that did not have 10 years of record prior to 1980 for inclusion in the LTC dataset. These additional sites included one site in HLU region E, a small corner of southwestern Minnesota that is part of the Missouri River Basin and that was not represented in the LTC dataset (Ziegeweid et al. 2015; Lorenz and Ziegeweid 2016). Finally, the approach using the STC dataset could expand the number of available streamgages with more than 10 complete water years of continuous record as new USGS (U.S. Geological Survey 2019) and Minnesota Department of Natural Resources (MNDNR 2019) streamgages continue to collect new streamflow record, thus facilitating future applications of these study methods to paired streamgages and biological sites in Minnesota.

Paired site selection

Paired USGS streamgage data and MPCA biological samples were selected for use in this study and are shown on the map in Fig. 1. A set of a priori criteria was developed to determine whether streamgage data and biological samples could be paired for analysis. Sites were only paired if the streamgage and biological site were on the same stream, had the same stream order, were within 10 km of each other, had a drainage-area ratio that did not exceed 4:1, and did not have dams, diversions, major tributaries, or natural riverine lakes in between the streamgage and the biological site that was sampled (Lorenz and Ziegeweid 2016). Similar criteria for distance between streamgages and biological sampling locations having the same stream orders were used in Kakouei et al. (2017). These criteria were established to ensure that streamgages and biological sites would respond similarly to precipitation events, variations in climate, and land-use changes. Pairs of biological samples and streamgage data periods were used to compute 134 biological metrics (response variables) and 173 hydrologic metrics (explanatory variables) for use in regression analyses to examine flow–biology relations. The same biological samples collected during 1996–2015 were paired with hydrologic metrics computed for the POR, LTC, and STC hydrologic datasets, resulting in sample sizes (n) of 54, 39, and 48 for flow–biology relations developed using the POR, LTC, and STC hydrologic datasets, respectively. Sample sizes differ among the three datasets because some of the streamgages used to compute POR hydrologic metrics did not meet the previously described criteria for use in the LTC or STC hydrologic datasets.
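The a priori pairing criteria amount to a conjunction of simple tests. The following Python sketch is illustrative only; the Site class, its field names, and the can_pair function are hypothetical. Two details are assumptions because the text does not specify them: the 10 km limit is treated as inclusive, and the drainage-area ratio is taken as the larger area divided by the smaller.

```python
from dataclasses import dataclass


@dataclass
class Site:
    stream: str
    stream_order: int
    drainage_area_km2: float


def can_pair(gage, bio, distance_km, barrier_between):
    """Check the pairing criteria for one streamgage and one biological site.

    barrier_between: True if a dam, diversion, major tributary, or natural
    riverine lake lies between the two locations.
    """
    small, large = sorted([gage.drainage_area_km2, bio.drainage_area_km2])
    return (gage.stream == bio.stream
            and gage.stream_order == bio.stream_order
            and distance_km <= 10        # assumed inclusive limit
            and large / small <= 4       # ratio must not exceed 4:1
            and not barrier_between)
```

A pair fails as soon as any single criterion fails, which is consistent with the intent that both locations respond similarly to precipitation, climate, and land-use change.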

When appropriate, a single streamgage was paired with multiple biological samples to increase the representation of variability in biological communities over space and time. Multiple biological samples collected from different sites were paired with the same streamgage if both biological sites met the a priori criteria established in the above paragraph. If more than one biological sample was collected from the same sample site within the same year, only one randomly selected sample was used in the analysis. Multiple biological samples collected from the same biological site were included in the dataset if samples were collected at least five water years apart, which represents half of the minimum period of record required for streamgages to be included in presented analyses.

Data ranges for watershed characteristics of streamgages and biological sites from each of the three datasets (Krall et al. 2022) are included here to establish limits on transferring results of this study to other sites in Minnesota that were not included in this study. Among all three hydrologic datasets (POR, LTC, STC), stream orders ranged from 4 to 7, and streams represented the following three MPCA fish stream classes: northern streams, northern rivers, and southern rivers (MPCA 2014a, 2016). In the POR dataset (n = 54), the number of water years of streamflow record ranged from 10 to 71, with 91 and 80 percent of streamgages exceeding 30 and 40 years of streamflow records, respectively (Krall et al. 2022). In the STC dataset (n = 48), the number of water years of streamflow record ranged from 22 to 71, with 92 and 81 percent of streamgages exceeding 30 and 40 years of streamflow records, respectively (U.S. Geological Survey 2019). All five HLUs in Minnesota (Ziegeweid et al. 2015; Lorenz and Ziegeweid 2016) were represented in the POR and STC datasets. In the LTC dataset (n = 39), the total number of water years of record ranged from 39 to 71, with 95 percent of streamgages exceeding 40 total years of streamflow records between the pre- and post-1980 water year periods (U.S. Geological Survey 2019). Only four of the five HLUs in Minnesota were represented in the LTC dataset; the LTC dataset did not contain any paired biological samples/streamgage records from region E in southwestern Minnesota, which represents the portion of the Missouri River Basin contained in Minnesota (Ziegeweid et al. 2015; Lorenz and Ziegeweid 2016).

Statistical analysis

All statistical analyses were completed using the R statistical environment, version 3.6.1 (R Core Team 2019). Information about specific R packages other than EflowStats, R scripts, and original data files are published in Krall et al. (2022). Statistical analyses described in this section were applied to all three datasets of paired hydrologic metrics (explanatory variables) and biological metrics (response variables). A level of significance (α) of 0.05 was selected for all analyses.

A best subset linear regression analysis process was automated in the R statistical environment (R Core Team 2019) to iteratively select the three best one-, two-, and three-variable linear regression candidate models (based on adjusted-R2 values) that describe the relation between each biological metric (response variables) and the one, two, or three best hydrologic metrics (explanatory variables, Krall et al. 2022). Candidate models were limited to three or fewer explanatory variables to reduce overfitting and multicollinearity. A total of nine candidate models were developed for each of the 134 biological metrics (response variables) using one, two, or three of the 173 hydrologic metrics (explanatory variables) computed using each of the three hydrologic datasets (POR, LTC, STC). This resulted in 1206 candidate models per hydrologic dataset, a total of 3618 candidate models, and 402 final selected regression models (Krall et al. 2022).
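The logic of a best subset search can be sketched in Python using only the standard library. This is an illustration of the general technique, not a translation of the automated R process in Krall et al. (2022); all function names are hypothetical. The sketch fits ordinary least squares by solving the normal equations and keeps the top models of each size ranked by adjusted R2.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Perfectly collinear predictors produce a zero pivot and raise
    ZeroDivisionError; such subsets would need to be skipped.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_r2(columns, y):
    """Adjusted R2 of a least-squares fit of y on the columns plus an intercept."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p + 1)] for j in range(p + 1)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def best_subsets(predictors, y, max_size=3, keep=3):
    """Return {size: [(adjusted R2, predictor names), ...]} with the top `keep` per size.

    predictors: dict mapping a metric name to its list of values.
    """
    result = {}
    for size in range(1, max_size + 1):
        scored = [(adjusted_r2([predictors[name] for name in names], y), names)
                  for names in combinations(sorted(predictors), size)]
        result[size] = sorted(scored, reverse=True)[:keep]
    return result
```

With three models retained for each of three model sizes, the search yields 3 × 3 = 9 candidate models per biological metric, 9 × 134 = 1206 per hydrologic dataset, and 1206 × 3 = 3618 overall, which agrees with the counts reported above. An exhaustive search over 173 predictors evaluates several hundred thousand three-variable subsets per response, so a production implementation would normally rely on an optimized routine.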

Diagnostic statistics and plots generated using the automated best subset regression process were used to address the assumptions of multiple linear regression, assess model fits, and ultimately select the best overall linear model that explains the most variability for the specific biological response metric (Helsel et al. 2020; Krall et al. 2022). Variance inflation factors (VIFs) were used to minimize multicollinearity in developed regression models (Marquardt 1970; Helsel et al. 2020). The predicted residual error sum of squares (PRESS) statistic was used as a form of model cross-validation to provide an estimate of prediction error (Helsel et al. 2020). Plots of residuals versus leverage were used to ensure that there were no influential points that biased the regression models. The Breusch–Pagan test (Breusch and Pagan 1979) and scale-location plots (Helsel et al. 2020) were used to evaluate homoscedasticity in regression models. Plots of residuals versus fitted values and plots of observed versus fitted values were used to ensure that relations among response and explanatory variables were approximately linear. Correlations between residuals and quantiles of a normal distribution (QQ plot, Helsel et al. 2020) were used to confirm that residuals were approximately normally distributed. Pearson correlation matrix plots were generated to evaluate multicollinearity between all hydrologic metrics used in the set of candidate models (Helsel et al. 2020).
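Two of the diagnostics named above, the VIF and the PRESS statistic, have simple definitions that can be written out directly. The Python sketch below is illustrative and uses only the standard library; the function names are hypothetical, and PRESS is computed by explicit leave-one-out refitting for clarity rather than by the usual hat-matrix shortcut.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(columns, y):
    """Least-squares coefficients (intercept first) of y on the columns."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[j] * r[m] for r in rows) for m in range(k)] for j in range(k)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(k)]
    return solve(xtx, xty)


def predict(beta, values):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], values))


def r_squared(columns, y):
    beta = fit(columns, y)
    fitted = [predict(beta, [c[i] for c in columns]) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    return 1 - ss_res / sum((a - mean_y) ** 2 for a in y)


def vif(columns, j):
    """VIF of predictor j: 1 / (1 - R2) from regressing it on the other predictors."""
    others = [c for k, c in enumerate(columns) if k != j]
    return 1 / (1 - r_squared(others, columns[j]))


def press(columns, y):
    """Sum of squared prediction errors, each from a model fit without that point."""
    total = 0.0
    for i in range(len(y)):
        keep = [k for k in range(len(y)) if k != i]
        beta = fit([[c[k] for k in keep] for c in columns], [y[k] for k in keep])
        total += (y[i] - predict(beta, [c[i] for c in columns])) ** 2
    return total
```

Uncorrelated predictors give a VIF of 1, and the value grows without bound as a predictor becomes a linear combination of the others. Because each PRESS residual comes from a model that never saw the corresponding observation, PRESS is at least as large as the ordinary residual sum of squares.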

The primary quantitative criteria that were prioritized for selections of the final regression models from the pool of candidate models included (1) high pseudo-R2 values relative to other candidate models; (2) low PRESS statistics relative to other candidate models; (3) VIF values < 5, and (4) Pearson correlation coefficients with absolute values less than 0.70 for all hydrologic metrics (explanatory variables, Dormann et al. 2013; Kakouei et al. 2017; Lynch et al. 2018). If these criteria were met, graphical plots were compared to ensure that each final selected regression model represented data that were approximately homoscedastic, independent, and normal. Best professional judgments of the authors were used to select final regression models from graphical plots of candidate models for all 134 biological metrics used as response variables for all three hydrologic datasets. Final selected regression equations were reasonable and did not violate assumptions of multiple linear regression (Helsel et al. 2020).
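Criteria (3) and (4) are threshold tests that can be applied mechanically before the graphical review. The Python sketch below is an illustration only; the function names are hypothetical, and VIF values are assumed to have been computed beforehand. Criteria (1) and (2) are relative rankings among candidate models and are not represented here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def passes_screen(columns, vifs, max_vif=5.0, max_abs_r=0.70):
    """True if every VIF is below max_vif and every pairwise |r| is below max_abs_r.

    columns: explanatory variables in the candidate model.
    vifs: precomputed VIF for each explanatory variable (assumed input).
    """
    if any(v >= max_vif for v in vifs):
        return False
    return all(abs(pearson(a, b)) < max_abs_r
               for a, b in combinations(columns, 2))
```

A one-variable model has no predictor pairs, so only the VIF test applies to it.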

The final selected models for each of the 134 biological response metrics were compiled for each of the three datasets (Krall et al. 2022) and examined for further analysis. However, some of the biological metrics included zero values that could affect estimates of uncertainty. Therefore, the final regression model for each biological metric was re-computed using left-censored regression (Cohn 1988; Breen 1996; Helsel et al. 2020; Krall et al. 2022). Left-censored regression analyses incorporated the adjusted maximum likelihood estimation method (AMLE, Cohn 1988), a normal distribution, and a censoring value of 0.1. The R script developed for left-censored regression analysis produced the censored regression model coefficients, standard errors, z-scores, and p-values for the intercept and each explanatory hydrologic metric. Other diagnostic outputs included the unbiased estimated residual standard error, the total number of observations in the dataset, the number and percent of censored observations, the Chi-square value of the model, the model degrees of freedom, the overall model p-value, the pseudo-R2 value (Cohn 1988), Akaike’s Information Criterion (AIC) and Bayesian Information Criterion (BIC) values (Konishi and Kitagawa 2008), and VIFs for the explanatory variables (Marquardt 1970; Helsel et al. 2020).

Data synthesis

Additional methods were used to identify metrics and flow–biology relations that the MPCA and MNDNR can use when developing restorations to manage flows and improve habitat quality for fish communities. Frequencies of occurrence of hydrologic metrics (explanatory variables) used in the 134 regression models (one for each biological metric) for each of the three hydrologic datasets (POR, LTC, and STC) were used to identify the three hydrologic metrics most commonly used as explanatory variables in regression equations, which we assumed represented the hydrologic metrics with the broadest influence on biological metrics that describe stream fish communities throughout Minnesota. Tukey boxplots (Helsel et al. 2020) were used to compare pseudo-R2 values of regression models developed for biological metrics in each of the six categories of biological metrics (composition, habitat, life history, reproductive, tolerance, and trophic) to determine if flow–biology relations were stronger in specific categories of biologic metrics or in specific hydrologic datasets. The two best regression models (based on pseudo-R2 values) and associated estimates of uncertainty (percent of censored data, and root mean square error values) were described for each of the six biological metric categories and each of the three hydrologic datasets, and illustrations of the distributions of modeled versus measured flows for the single best regression model in each of the six categories of biological metrics were generated. Finally, the biological metric SensitiveTxPct (the relative abundance of sensitive taxa in a fish-community sample) was examined further because SensitiveTxPct had strong linear relations for all three hydrologic datasets, did not include any censored values, and is used to calculate fish-based index of biotic integrity (FIBI) scores for all MPCA stream classes represented in the dataset (northern streams, northern rivers, southern rivers; MPCA 2014a; Krall et al. 2022). 
For this metric, we illustrated how the linear relation between computed SensitiveTxPct values and the dominant hydrologic metric in each of the three datasets changed using high and low values of the other two hydrologic metrics in the regression models.
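The frequency-of-occurrence step described above can be sketched as a simple tally. The following is a minimal illustration in Python, not the code used in the study; the model contents and the names MetricC and MetricD are hypothetical, and the hydrologic metric codes are used only as labels.

```python
from collections import Counter

# Hypothetical example: each fitted model lists the hydrologic metrics it retained.
# These model contents are invented for illustration and are not study results.
models = {
    "SensitiveTxPct": ["dh10", "dh24", "fh7"],
    "IntolerantTxPct": ["dh24", "ma33", "ta2"],
    "MetricC": ["dh10", "dh24"],
    "MetricD": ["ra5", "dh10", "ta2"],
}


def most_common_metrics(models, n=3):
    """Count how many models use each metric and return the n most frequent."""
    counts = Counter(metric for metrics in models.values() for metric in metrics)
    return counts.most_common(n)


top_three = most_common_metrics(models)
print(top_three)
```

In this sketch each model is assumed to list a metric at most once, so the tally equals the number of models in which the metric appears.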

Results and discussion

Several previous studies use modeled streamflows to simulate unaltered hydrology or develop classification schemes to group streams based on similar characteristics (Richter et al. 1996; Kennen et al. 2007; Poff et al. 2009; Carlisle et al. 2011; May et al. 2015). However, streamflows modeled using regression-based methods can underestimate peak flows and overestimate base flows because of the relatively small number of peak flow data points compared to the number of base flow data points and because of other geomorphic factors that control peak flows (Van Liew et al. 2003; Ziegeweid and Magdalene 2015; Ziegeweid et al. 2015; Lorenz and Ziegeweid 2016). Therefore, this study included only measured streamflow records to ensure that hydrologic metrics based on peak-event thresholds were accurately represented. This limited sample sizes and constrained the development of more elaborate classification schemes. Instead, altered hydrology was examined using ratios of hydrologic metrics calculated for different time periods. Using ratios helped to standardize changes in hydrology across varying stream orders, watershed sizes and characteristics, and hydrologic landscape units throughout Minnesota (Ziegeweid et al. 2015; Lorenz and Ziegeweid 2016).
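The standardizing effect of ratios can be illustrated with a short sketch. The Python example below is hypothetical: the split year of 1980, the use of a median, and the annual values are assumptions made for illustration and do not reproduce the periods or metrics used in the study.

```python
from statistics import median


def period_ratio(annual_values, split_year):
    """Ratio of the post-split median to the pre-split median of an annual metric."""
    pre = [v for yr, v in annual_values.items() if yr < split_year]
    post = [v for yr, v in annual_values.items() if yr >= split_year]
    if not pre or not post:
        raise ValueError("both periods need at least one year of data")
    return median(post) / median(pre)


# Hypothetical annual maximum flows for a small and a large river (arbitrary units).
small_site = {1975: 10.0, 1976: 12.0, 1977: 11.0, 1981: 15.0, 1982: 18.0, 1983: 16.5}
large_site = {yr: 100.0 * q for yr, q in small_site.items()}

# Both sites give the same ratio even though their flows differ by a factor of 100.
print(period_ratio(small_site, 1980), period_ratio(large_site, 1980))
```

Because the ratio is dimensionless, sites with very different drainage areas can be compared on the same scale, which is the property the text attributes to this approach.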

A subset of the 402 regression models representing the same 134 biological metrics for each hydrologic dataset (POR, LTC, and STC) is presented in this section. The biological metrics described in this section are defined in Table 1, and the hydrologic metrics described in this section are defined in Table 2. All 402 regression models, 134 biological metrics, and 173 hydrologic metrics considered in analyses are presented in Krall et al. (2022).

Table 1 Categories and definitions of biological metrics described in the “Results and discussion” section
Table 2 Categories and definitions of hydrologic explanatory metrics described in the “Results and discussion” section

We documented significant relations between hydrologic metrics and biological metrics in each category of biological metrics for rivers in Minnesota. Hydrologic metrics included metrics computed from period of record (POR) streamflow data and from ratios of metrics calculated from varying time periods to represent long-term changes (LTC) and short-term changes (STC) in hydrology of rivers in Minnesota. Significant relations (p-values < 0.0001) between hydrologic metrics and biological metrics were identified for all three hydrologic datasets. Based on Tukey boxplots of pseudo-R2 values (Helsel et al. 2020), overall linear relations between biological and hydrologic metrics were strongest using the LTC hydrologic dataset and weakest using the POR dataset, with the exception of tolerance metrics in the POR dataset (Fig. 2). The strength of the linear relations in the LTC dataset may indicate that widespread changes in streamflows did occur around 1980 as described by Lenhart et al. (2011) and Schottler et al. (2013). Linear relations may be weakest in the POR dataset because long- and short-term changes to flow data would be incorporated into these streamflow records, and the variability associated with these changes may obscure relations between hydrologic and biological metrics. Median pseudo-R2 values were most consistent across biological metric categories in the STC dataset, but the exact reasons for this are unknown. Tolerance metrics had the strongest linear relations to hydrologic metrics among all datasets. The FIBI and FIBI_BCG4 metrics were excluded from boxplot comparisons in Fig. 2 because they are calculated from a combination of other biological metrics and because there was only one value for each metric per hydrologic dataset.

Fig. 2

Tukey boxplots showing the distribution of pseudo-R2 values among final censored regression models in each of the six categories of biological metrics for three hydrologic datasets (POR, LTC, and STC; Krall et al. 2022); hydrologic datasets are identified above each set of boxplots, sample sizes for each biological metric category are represented at the top of the plots, and outliers are represented by open circles
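For readers unfamiliar with how outliers are flagged in a Tukey boxplot, the following Python sketch applies the usual fences at 1.5 times the interquartile range. The pseudo-R2 values are hypothetical, and the quartile method ("inclusive") is an assumption; plotting software may compute quartiles slightly differently.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return values outside the Tukey fences (quartiles -/+ k times the IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical pseudo-R2 values for one category of biological metrics.
pseudo_r2 = [0.2, 0.3, 0.35, 0.4, 0.45, 0.5, 0.95]
print(tukey_outliers(pseudo_r2))
```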

The frequencies of occurrence of hydrologic metrics (explanatory variables) among the 134 final regression models associated with each hydrologic dataset were used to identify the three hydrologic explanatory variables that have the broadest influence over fish communities in each of the three datasets (Table 3). The most common hydrologic metrics were significant explanatory variables in 12–18 out of 134 regression models of biological metrics among each of the three hydrologic datasets (Table 3; Krall et al. 2022). The most common metrics for each hydrologic dataset included hydrologic metrics representing flow variability and some aspect of bankfull streamflow, which is represented by a 1.67-year recurrence interval and is the most geomorphically active flow in streams and rivers (Dunne and Leopold 1978; Poff and Allan 1995; Rosgen 2006; Fitzpatrick and Peppler 2010). Similarly, Lynch et al. (2018) used hydrologic metrics computed using EflowStats and a best subsets approach to multiple linear regression to determine that reach-scale habitat quality and geomorphology were the most important influences on community structure in stream ecosystems of the Ozark Highlands. The most common metrics in the LTC and STC datasets included an explanatory metric representing an aspect of seasonal predictability (Table 3).

Table 3 Frequency of occurrences and definitions for the three most commonly used hydrologic metrics in final 134 regression models associated with each of three hydrologic datasets

The frequency of occurrence results (Table 3) demonstrated that bankfull streamflow, measures of streamflow variability, and measures of seasonal predictability are hydrologic factors that influence fish communities in rivers and streams of Minnesota. Similarly, Fitzpatrick (2018) found that metrics related to streamflow variability and the time between bankfull flows were the best predictors of macroinvertebrate-based indices of biotic integrity (MIBI) normalized to the impairment threshold for streams in Minnesota. Bankfull streamflows have the greatest potential for channel-altering geomorphic processes, and these geomorphic changes can alter the quality and availability of fish habitat (Leopold et al. 1964; Dunne and Leopold 1978; Rosgen 2006; Fitzpatrick and Peppler 2010). Finally, Poff and Allan (1995) found that in Minnesota and Wisconsin, hydrologic factors were significant environmental variables influencing fish-community structure, while zoogeographic constraints did not explain the observed relations between stream hydrology and the functional organization of fish assemblages.

The two best regression models in each biological metric category (based on pseudo-R2 values) and for each hydrologic dataset are presented in Table 4. Relations between hydrologic metrics and biological metrics presented in Table 4 were significant (p-values < 0.0001). Pseudo-R2 values, percent of censored data, and root mean square error values were used to assess fit and uncertainty associated with these regression models and are also presented in Table 4 (Helsel et al. 2020). Sample sizes for each dataset were limited by the availability of streamflow records long enough to complete the described analyses. Pseudo-R2 values for regression equations in Table 4 ranged from 0.388 to 0.788, and the four regression models with the strongest linear relations explained more than 70 percent of the variation in the biological metric using three hydrologic metrics (Table 4). Percentage of censored values for equations shown in Table 4 ranged from 0 to 33.3 percent, and root mean square error values ranged from 3.63 to 19.8 percent. Modeled versus measured biological metric values were plotted with a 1:1 line (Figs. 3, 4, 5) to illustrate residuals in the best linear relations developed for each of the six biological metric categories (composition, habitat, life history, reproductive, tolerance, trophic) using each of the three hydrologic datasets (POR, LTC, STC). Plots of measured (based on field data) versus modeled (based on regression models) biological metric values for the single best regression model in each biological metric category are presented in Figs. 3, 4, and 5 for the POR, LTC, and STC datasets, respectively. Plots in Figs. 3, 4, and 5 also illustrate the regression equations for the best linear relation in each category of biological metrics for each hydrologic dataset, the number of censored values in the regression, and pseudo-R2 values indicating how much of the variation in the biologic metric is explained by the hydrologic metrics.
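As an illustration of the fit statistics discussed above, the Python sketch below computes a root mean square error and the residuals about a 1:1 line for hypothetical measured and modeled values. It uses the plain definition of root mean square error with no degrees-of-freedom adjustment, which is an assumption and may differ from the calculation used for Table 4.

```python
import math


def rmse(measured, modeled):
    """Root mean square error between paired measured and modeled values."""
    if len(measured) != len(modeled) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, modeled)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical biological metric values, in percent.
measured = [10.0, 20.0, 30.0, 40.0]
modeled = [12.0, 18.0, 33.0, 37.0]

# Residuals are the departures of each point from the 1:1 line.
residuals = [m - p for m, p in zip(measured, modeled)]
print(rmse(measured, modeled), residuals)
```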

Table 4 The two best regression models and fit statistics for six categories of fish-based biological metrics across three hydrologic datasets; see Table 1 for definitions of the biological metrics (response variables) and Table 2 for definitions of the hydrologic metrics (explanatory variables)
Fig. 3

Measured versus modeled value plots for the single best regression model in each of six biological metric categories and the period of record (POR) hydrologic dataset as presented in Table 4 (Krall et al. 2022): a composition, b habitat, c life history, d reproductive, e tolerance, and f trophic

Fig. 4

Measured versus modeled value plots for the single best regression model in each of six biological metric categories and the long-term change (LTC) hydrologic dataset as presented in Table 4 (Krall et al. 2022): a composition, b habitat, c life history, d reproductive, e tolerance, and f trophic

Fig. 5

Measured versus modeled value plots for the single best regression model in each of six biological metric categories and the short-term change (STC) hydrologic dataset as presented in Table 4 (Krall et al. 2022): a composition, b habitat, c life history, d reproductive, e tolerance, and f trophic

The highest observed pseudo-R2 value was from the POR regression model describing the biological metric “IntolerantTxPct”, which represents the percentage of intolerant taxa in a fish-community sample (Fig. 3e). Additional descriptions of how intolerant taxa are defined can be found in MPCA (2014a). About 20 percent of the IntolerantTxPct values were censored in regression models for all hydrologic datasets (Table 4). Streams with censored values for IntolerantTxPct were primarily streams that are classified as “impaired” for aquatic life condition (Krall et al. 2022) and likely did not have the water or habitat quality to sustain “intolerant” species that are sensitive to physical requirements. For non-censored values, the IntolerantTxPct metric seems to relate to a gradient of stream conditions in Minnesota that can be represented by hydrologic metrics computed using EflowStats.

Regression models based on FIBI scores and FIBI_BCG4 scores (normalized to the impairment threshold for the stream class of the paired sites) are presented in Table 5. Relations between hydrologic metrics and FIBI scores and FIBI_BCG4 scores presented in Table 5 were significant (p-values < 0.0001). Pseudo-R2 values, percent of censored data, and root mean square error values were used to assess fit and uncertainty associated with regression models (Table 5). Pseudo-R2 values ranged from 0.293 to 0.717 (Table 5), and the regression model with the strongest linear relation explained over 70 percent of the variation in FIBI_BCG4 using three hydrologic metrics from the LTC dataset. Linear relations were stronger for the FIBI_BCG4 metric than the FIBI metric for every hydrologic dataset (Table 5). Data used to develop the equations in Table 5 did not have any censored values, and root mean square error values ranged from 0.224 to 14.0 percent. Root mean square error values were lower for the FIBI_BCG4 metric than the FIBI metric for all three hydrologic datasets. Plots of observed versus expected values for regression models representing FIBI_BCG4 for the three hydrologic datasets are presented in Fig. 6. Definitions for biological metrics (response variables) represented in Tables 4 and 5 are presented in Table 1, and definitions for hydrologic metrics (explanatory variables) represented in Tables 3, 4, and 5 are presented in Table 2.

Table 5 Regression models and fit statistics for fish-based indices of biotic integrity (FIBI) scores and FIBI scores normalized to impairment threshold (FIBI_BCG4) for three hydrologic datasets: period of record (POR), long-term change (LTC), and short-term change (STC); see Table 1 for definitions of the biological metrics and Table 2 for definitions of the hydrologic metrics
Fig. 6

Measured versus modeled value plots for regression models of fish index of biotic integrity scores normalized to impairment thresholds defined by biological condition gradient 4 (FIBI_BCG4) for stream classes associated with the three hydrologic datasets (Table 5, Krall et al. 2022): a period of record (POR), b long-term change (LTC), and c short-term change (STC)

The two best regression models in the tolerance category for each hydrologic dataset had identical biological metrics (IntolerantTxPct and SensitiveTxPct, Table 4), and some overlap in the biological metrics used in the best regression models was observed among other biological metric categories and the three hydrologic datasets (Table 4). The most common hydrologic metrics in regression equations were related to bankfull flows and aspects of flow variability, especially seasonal predictability. The 134 regression models associated with each hydrologic dataset represent paired sites throughout the State of Minnesota and encompassed differences in stream orders, hydrologic landscape units (Ziegeweid et al. 2015), and watershed sizes (Krall et al. 2022).

Most of the left-censored regression models presented in Table 4 and all the FIBI_BCG4 models presented in Table 5 had pseudo-R2 values greater than 0.50, indicating that more than 50 percent of the variability in the biological metric could be explained using only two or three hydrologic metrics. Regression results presented in Tables 4 and 5 support the concept of streamflow as a master variable controlling ecosystem processes (Poff et al. 1997).
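To clarify what left-censoring means for model fitting, the following Python sketch writes out a log-likelihood for a linear model with normally distributed errors in which some responses are known only to be at or below a censoring limit. This is a generic Tobit-type formulation offered as an assumption; the fitting procedure and software actually used are described in Krall et al. (2022) and are not reproduced here. All data values are hypothetical.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def left_censored_loglik(y, x, censored, intercept, slope, sigma):
    """Log-likelihood of a one-variable linear model with left-censored responses.

    For censored observations, y holds the censoring limit.
    """
    total = 0.0
    for yi, xi, is_censored in zip(y, x, censored):
        z = (yi - (intercept + slope * xi)) / sigma
        if is_censored:
            total += math.log(normal_cdf(z))  # probability of falling at or below the limit
        else:
            total += normal_logpdf(z) - math.log(sigma)
    return total


# Hypothetical data: the first response is censored at a limit of 0 percent.
x = [1.0, 2.0, 3.0]
y = [0.0, 5.0, 10.0]
censored = [True, False, False]

good = left_censored_loglik(y, x, censored, intercept=-5.0, slope=5.0, sigma=1.0)
poor = left_censored_loglik(y, x, censored, intercept=5.0, slope=0.0, sigma=1.0)
print(good, poor)
```

Maximizing this quantity over the intercept, slope, and sigma would give the fitted model; the sketch only evaluates it at two candidate parameter sets to show that the better-fitting line has the higher log-likelihood.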

Most regressions not presented in Tables 4 and 5 had pseudo-R2 values less than 0.50, which indicates that factors other than hydrology likely explain more of the variation in observed values for many biological metrics. This study did not consider other factors affecting biological responses, such as land use, climate change, natural resource management activities, or interactions between other components of aquatic food webs. Incorporating these other variables may help explain more variability in biological response metrics. However, the HLUs that comprise Minnesota experience differences in precipitation, land use, and climate (Wolock et al. 2004; Ziegeweid et al. 2015; Lorenz and Ziegeweid 2016). Therefore, incorporating effects of these other factors on a statewide scale would be difficult, and these types of analyses were beyond the scope of this study.

Tolerance metrics had the strongest overall linear relations with hydrologic metrics (Table 4, Fig. 2) for all three hydrologic datasets used in this study (POR, LTC, and STC). Tolerance metrics are based on tolerances to specified flow and water-quality conditions and are more likely to include generalist fish species that are broadly found across streams and rivers of various stream orders and hydrologic landscape units (MPCA 2014a; Ziegeweid et al. 2015). Relations between biological and hydrologic metrics were more variable among datasets for the composition, habitat, life history, reproductive, and trophic categories. These five categories have metrics that are more attuned to specific characteristics of specialist species that may vary more with stream order and may not be as ubiquitously distributed within or among HLUs or throughout the entire State of Minnesota.

The biological response metric SensitiveTxPct (percent of sensitive taxa in a fish-community sample) was evaluated further because this metric had some of the strongest linear relations with hydrologic explanatory metrics (pseudo-R2 ≥ 0.50) in all three datasets (Table 4), and regression models for SensitiveTxPct did not contain any censored values. In addition, SensitiveTxPct is a metric used to calculate fish-based index of biotic integrity (FIBI) scores for all MPCA stream fish classes represented in this study (northern streams, northern rivers, southern rivers; MPCA 2014a; Krall et al. 2022). Additional descriptions of how sensitive taxa are defined can be found in MPCA (2014a). Figure 7 illustrates how linear relations between SensitiveTxPct and the dominant explanatory variable change using combinations of high and low values of the other two explanatory variables for all three hydrologic datasets. All plotted values are within observed ranges of values in the original datasets.

Fig. 7

Plots illustrating the predicted relative abundance of sensitive taxa (SensitiveTxPct) in fish-community samples over a range of the dominant hydrologic explanatory variable and combinations of high and low values of the other two hydrologic explanatory variables for three different hydrologic datasets (Krall et al. 2022): a period of record (POR), b long-term change (LTC), and c short-term change (STC)
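The construction of plots like those in Fig. 7 can be sketched as follows: one explanatory variable is varied over a range while the other two are held at combinations of low and high values. In this Python illustration the coefficients, ranges, and the choice of dh10 as the dominant variable are hypothetical assumptions; only the signs of the coefficients follow the POR relations described in the text.

```python
from itertools import product


def predict(coefs, values):
    """Evaluate a linear regression equation for one set of explanatory values."""
    return coefs["intercept"] + sum(coefs[name] * values[name] for name in values)


# Hypothetical coefficients; not the fitted values from Table 4.
coefs = {"intercept": 40.0, "dh10": -0.5, "dh24": 0.1, "fh7": 1.0}

dominant_range = [20.0, 40.0, 60.0]                       # assumed range of dh10
low_high = {"dh24": (50.0, 150.0), "fh7": (2.0, 8.0)}     # assumed low and high values

# One predicted line per low/high combination of the other two variables.
lines = {}
for dh24, fh7 in product(low_high["dh24"], low_high["fh7"]):
    lines[(dh24, fh7)] = [
        predict(coefs, {"dh10": d, "dh24": dh24, "fh7": fh7}) for d in dominant_range
    ]

for combo, predicted in lines.items():
    print(combo, predicted)
```

Because the model is linear without interaction terms, the four lines are parallel and differ only in their intercepts.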

Using the POR hydrologic dataset (Fig. 7a), SensitiveTxPct decreased with an increase in the variation in maximum 90-day moving average flows (dh10), a decrease in the maximum number of days per year between flows that exceed a 1.67-year recurrence interval (dh24), and a decrease in the number of events greater than seven times the median flow of the period for which the hydrologic metrics are calculated (fh7). These results suggest that the relative abundance of sensitive taxa generally is highest in Minnesota streams with regularly occurring extreme high-flow events, more stable high-flow periods, and longer periods of average-flow conditions between high-flow events. Carlisle et al. (2008) noted that impairments of fish and macroinvertebrate assemblages were strongly associated with agricultural land uses. Increased annual precipitation and subsurface drain tiling throughout most of southern Minnesota can cause intense peak streamflows with fast rises and falls that quickly convey water off the surrounding landscape, reducing the number and duration of stable high-flow periods that help sustain biological communities (Lenhart et al. 2011; Schottler et al. 2013).
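The idea behind an event-count metric such as fh7 can be shown with a simplified Python sketch that counts runs of consecutive days with flows above seven times the median flow. This is not the EflowStats implementation, which has its own definitions (for example, summarizing counts by year); the flow values are hypothetical.

```python
from statistics import median


def count_high_flow_events(daily_flows, multiplier=7.0):
    """Count runs of consecutive days with flow above multiplier times the median."""
    threshold = multiplier * median(daily_flows)
    events = 0
    in_event = False
    for flow in daily_flows:
        above = flow > threshold
        if above and not in_event:
            events += 1  # a new event starts on the first day above the threshold
        in_event = above
    return events


# Hypothetical daily flows: the median is 10, so the threshold is 70.
daily_flows = [10, 10, 10, 80, 90, 10, 10, 75, 10, 10, 10]
print(count_high_flow_events(daily_flows))
```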

Different relations were observed between hydrologic explanatory metrics and biological response metrics using the LTC and STC datasets. Using the LTC hydrologic dataset (Fig. 7b), SensitiveTxPct decreased with increasing changes in variation in October flows (ma33), ratio of base flow to total flow (ml20), and median 90-day minimum flows (dl5). In contrast, using the STC hydrologic dataset (Fig. 7c), SensitiveTxPct decreased with increasing variation in September flows (ma32), increasing maximum July flows (mh7), and an increase in flow predictability (ta2). SensitiveTxPct decreased in both the LTC and STC datasets when variability of typically stable fall flows increased (MNDNR 2020). Decreases in SensitiveTxPct corresponding to increased proportion of baseflow in the LTC dataset could be attributed to increased precipitation (Novotny and Stefan 2007; MNDNR 2019), and the decreases in SensitiveTxPct corresponding to increased flow predictability in the STC dataset could be attributed to subsurface tile drainage bypassing natural infiltration processes and conveying precipitation directly to the stream in a more predictable manner (Lenhart et al. 2011; Schottler et al. 2013; Cowdery et al. 2019; MNDNR 2019). Peak flows due to more intense summer rainfall events are increasing in Minnesota (Novotny and Stefan 2007), and this would help explain observed decreases in SensitiveTxPct with increases in maximum July flows (mh7) in the STC dataset. These results indicate there are both separate and overlapping mechanisms of how long-term and short-term changes in streamflows affect sensitive taxa for streams and rivers in Minnesota. Plots used to compare changes in SensitiveTxPct with changes in values of hydrologic metrics could be generated for other biological metrics of interest (Krall et al. 2022).

Comparing the results for SensitiveTxPct using three different explanatory datasets illustrates the complexity of flow–biology relations through time and the potential effects of temporal variability. However, results from all three datasets demonstrate that increases in the magnitude and variability of typically stable base flow periods contribute to reductions in sensitive taxa in Minnesota. These results indicate that measures designed to slow the transport of water from the surrounding watershed to the stream may help to restore more natural streamflows (Cowdery et al. 2019) and improve conditions for sensitive taxa. Results specific to sensitive taxa are further supported by broad-scale results. Hydrologic metrics related to the timing and number of events above or below bankfull streamflow were some of the most frequently used explanatory variables in final regression models among examined biological metrics (Table 3).

Results presented from this study support results obtained using similar methods to develop flow–biology relations that focus on macroinvertebrate communities in streams throughout Minnesota (Fitzpatrick 2018). For streams with similar stream orders and watershed sizes, macroinvertebrate indices of biotic integrity (MIBI) scores responded most strongly to hydrologic metric dh7 (Fitzpatrick 2018), which is the variability of annual maximum 3-day moving average flows. Among the most significant linear relations presented in Table 4, dh10 (the variability of annual maximum 90-day moving average flows) was tied with ra5 (number of day rises) for the second-most frequently occurring hydrologic metric. The dh7 and dh10 metrics are similar and differ only in the time periods of the moving average maximum flows. The observed difference in significant time periods may represent differences in the timing and duration of life cycle processes between macroinvertebrates and fish. Furthermore, metrics dh6 through dh10 all represent variability in different types of annual maximum flows, and these metrics appear in Table 4 equations eight times, further supporting the relative influence of variability in high-flow conditions on biological metrics. These results also illustrate the importance of having accurate peak flow data, which can often be underestimated when using flows based on regression models to estimate flow time series in Minnesota (Ziegeweid and Magdalene 2015; Ziegeweid et al. 2015; Lorenz and Ziegeweid 2016).

In the POR dataset, FIBI_BCG4 scores responded strongly to dh24, or the maximum number of consecutive days that flows are below bankfull streamflow. Similarly, Fitzpatrick (2018) found that MIBI scores normalized to numeric impairment threshold values of each MIBI class responded strongly to dh22, or the median number of days between flood events greater than bankfull streamflow. Furthermore, hydrologic metrics related to bankfull streamflow were among the most frequently occurring significant explanatory variables in regression models of fish-based biological metrics in the LTC and STC hydrologic datasets (Table 4). These results demonstrate the relative importance of bankfull streamflow in controlling aquatic communities in streams and rivers of Minnesota. Therefore, hydrologic alterations that affect the frequency or magnitude of conditions above bankfull streamflow have the potential to strongly affect macroinvertebrate and fish communities.

Factors affecting study results and future directions

Factors affecting the presented results must be acknowledged to ensure that study results are used properly. First, there is a wide range in dates of available hydrologic and biological data for each site pair. Fish-community samples represent the fish communities of specific stream reaches at a single point in time, and fish-community samples used in analyses were collected over a span of 20 years (1996–2015, MPCA Environmental Data Application, https://www.pca.state.mn.us/environmental-data). In addition, streamgage records did not all start in the same year (U.S. Geological Survey 2019), and records were ended with the year of biological sample collection, which also varied among biological samples (Krall et al. 2022). Thus, the periods of streamflow records were not uniform across sites, and each paired biological sample and streamgage record represent different patterns of climate and land-use changes. These differences could have introduced variability in the results and hindered our ability to identify stronger flow–biology relations, but streamflow records likely were long enough to encompass cyclical patterns of wet/dry cycles (Magdalene et al. 2018) and a wide range of extreme high- and low-flow events.

Streams and rivers in this study are limited to stream orders 4–7, so results presented here are not representative of first, second, or third order streams. Small streams are flashy in nature and may go dry (Cai et al. 2015), resulting in periods of zero flow, which contributes to the lack of streamgages with at least 10 years of continuous streamflow records for low-order streams. This difficulty of operating continuous streamgages in low-order streams is another reason many studies rely on modeled streamflow records to develop relations between streamflow and biology. This study used only measured streamflow data, so study results are not comparable to results of other studies obtained using modeled streamflow data (Poff and Allan 1995; Cai et al. 2015; Herb et al. 2015a,b,c; McKay et al. 2019). However, future studies could focus on developing hydrologic metrics based on modeled flows and comparing them to metrics based on measured flows. Metrics calibrated to modeled flows could be used to predict the effectiveness of planned restoration activities (MNDNR 2010). A similar approach used on the Kootenai River was outlined by McDonald et al. (2016).

This study does not include climate- or landscape-based explanatory variables, but we recognize that climate and landscape likely contributed to values of hydrologic metrics that were used as explanatory variables. Directly incorporating climate- or landscape-based explanatory variables may help explain more of the variability in biological metrics (response variables). However, this study focused only on the relations between altered hydrology and biological responses. We assumed that climate and landscape factors would contribute to alterations in measured streamflows, and we did not make an effort to distinguish the relative contributions of factors that could alter streamflows. Therefore, climate- and landscape-based variables were excluded from presented analyses.

Previous studies of flow–biology relations have incorporated climate- and landscape-based variables into their analyses (Poff and Allan 1995; Carlisle et al. 2011; Cai et al. 2015; Herb et al. 2015c; McKay et al. 2019). In a nationwide study of streams in the conterminous United States, Carlisle et al. (2011) demonstrated that alterations in streamflow were stronger predictors of biological integrity than other physical and chemical factors included in statistical analyses. However, most of the streamflow alterations discussed in Carlisle et al. (2011) were related to diminished streamflows as a result of anthropogenic water withdrawals. Streams in Minnesota typically have ecological issues caused by increases in streamflows because of anthropogenic changes to the surrounding landscape. Previous studies in Minnesota that incorporated climate- and landscape-based variables (Poff and Allan 1995; Cai et al. 2015; Herb et al. 2015c; McKay et al. 2019) focused on specific regions of Minnesota with similar land-use and climate-related issues. Therefore, these studies were not designed to look at flow–biology relations for the entire State of Minnesota. Additional studies could be completed to further link hydrologic, climate, and landscape factors for the entire State of Minnesota.

This study used literature-based evidence to define periods for calculating hydrologic metrics and ratios in LTC and STC datasets (Lenhart et al. 2011; Schottler et al. 2013). However, methods like double-mass curves (Searcy et al. 1960) could be used to define specific years in which hydrologic changes took place for each site in the study. Having specific years associated with hydrologic changes could help improve accuracy of pre- and post-change metric computations. Spatial and temporal patterns of hydrologic changes could be examined to identify causal links for specific hydrologic changes, such as shifts from small-grain crops to corn and soybeans (Lenhart et al. 2011; Schottler et al. 2013).
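A double-mass curve plots the cumulative total of one variable against the cumulative total of another, so that a change in their relation appears as a break in slope. The Python sketch below is a minimal illustration with hypothetical annual precipitation and runoff values; the simple tolerance rule used to flag the break is an assumption and is not a method from Searcy et al. (1960).

```python
from itertools import accumulate


def double_mass_curve(precip, runoff):
    """Pair cumulative precipitation with cumulative runoff, year by year."""
    return list(zip(accumulate(precip), accumulate(runoff)))


def first_slope_break(years, curve, tolerance=0.05):
    """Return the first year whose segment slope departs from the initial slope."""
    prev_x, prev_y = 0.0, 0.0
    initial = None
    for year, (cum_x, cum_y) in zip(years, curve):
        slope = (cum_y - prev_y) / (cum_x - prev_x)  # assumes nonzero precipitation
        if initial is None:
            initial = slope
        elif abs(slope - initial) > tolerance:
            return year
        prev_x, prev_y = cum_x, cum_y
    return None


# Hypothetical annual totals in millimeters; runoff increases from 1980 onward.
years = [1977, 1978, 1979, 1980, 1981, 1982]
precip = [700.0] * 6
runoff = [140.0, 140.0, 140.0, 210.0, 210.0, 210.0]

curve = double_mass_curve(precip, runoff)
print(first_slope_break(years, curve))
```

In practice the break is usually identified by inspecting the plotted curve over many years rather than by a fixed tolerance, because year-to-year scatter in real records would trigger a rule this simple.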

Finally, this study used only linear regression techniques to develop relations between hydrologic explanatory metrics and biological response metrics. Other statistical methods that can identify nonlinear relations between explanatory and response metrics may describe other aspects of relations between hydrologic and biological metrics. Some commonly used methods include machine-learning approaches like boosted regression trees (Aertsen et al. 2010) or multivariate statistical approaches like principal components analysis (Poff et al. 2010; Rahman et al. 2017). Nonlinear statistical methods could further inform resource managers in Minnesota about relations between altered hydrology and biological responses.

Conclusions

In this study, we developed statewide flow–biology relations in Minnesota for 134 different computed biological metrics using three different datasets of hydrologic metrics representing total periods of streamflow record and ratios of hydrologic metrics computed from different periods of streamflow record to estimate long- and short-term changes in hydrology. Developed regression models represented paired streamgage records and fish-community samples from throughout the State of Minnesota and encompassed differences in stream orders, hydrologic landscape units, and watershed sizes. The three hydrologic metrics most frequently used as explanatory variables in regression models for each hydrologic dataset were assumed to represent the hydrologic metrics that most broadly affect stream fish communities throughout Minnesota. The most commonly used hydrologic explanatory metrics in regression equations were related to bankfull flows and aspects of flow variability, especially seasonal predictability. The regression model with the strongest linear relation for each biological metric category and hydrologic dataset explained at least 49.8 percent of the variation in the biological metric.

The biological response metric SensitiveTxPct (percent of sensitive taxa in a fish-community sample) is used to calculate fish-based index of biotic integrity (FIBI) scores for all MPCA stream fish classes represented in this study (northern streams, northern rivers, southern rivers) and had some of the strongest linear relations with hydrologic explanatory metrics (pseudo-R2 ≥ 0.50) in all three datasets. Graphical representations demonstrated how changes in hydrologic metric values affected SensitiveTxPct values. Study results can be used to vary the values of the hydrologic metrics and evaluate changes in biological metrics of interest for aquatic life management goals, and this information could be incorporated into decision-support frameworks designed to improve the health of stream fish communities, such as the tiered-aquatic life use (TALU) framework (Yoder 2012) developed by the MPCA or the ecological limits of hydrologic alteration (ELOHA) framework developed by Poff et al. (2009). Results can also be applied to evaluations of hydrologic simulations associated with stream restoration projects developed by the MNDNR (2010) to ensure that restoration activities could address the hydrologic variables that have the strongest effects on aquatic life. Presented methods can be used by researchers to complete statistical comparisons of hydrologic metrics computed from measured and modeled streamflows and expand these flow–biology studies to headwater streams and modeled flow data. Methods in this study used to develop flow–biology relations could be applied to any stream locations outside of Minnesota with long-term streamgage and fish-community sample data.