Background

Nutrient pollution by nitrogen (N) and phosphorus (P), defined here as nutrient concentrations higher than background or natural levels, is a major stressor of freshwater ecosystems, both across the United States and globally [1,2,3,4,5,6]. Nutrients and resulting stressors (e.g. oxygen depletion) degrade ecosystem services worth more than $2.2 billion annually in the United States alone [7]. Despite recognition by scientists and stakeholders that nutrient pollution and resulting eutrophication (increased ecosystem metabolism) are problems in fresh waters [1, 4, 5, 8, 9], rigorous synthesis of scientific evidence is still needed to inform nutrient-related management decisions and policies, particularly in streams and rivers [10]. Several factors complicate nutrient stressor-response relationships in lotic systems. Several potential nutrient constituents (e.g. nitrate, ammonia) can act as stressors. Causal pathways between nutrient stressors and biological effects are complex and include many indirect effects. These pathways also involve diverse assemblages (e.g. algae, macroinvertebrates, fishes) and food web compartments (e.g. “green” pathways involving primary producers, “brown” pathways involving heterotrophic bacteria and fungi [11, 12]). Many interacting environmental factors, such as land use, flooding, and stream size, also affect stressor-response relationships [13,14,15]. Temporal factors also complicate relationships, with legacy (historic) nutrient sources contributing to stressors [3, 16, 17]. Finally, high spatiotemporal variability of both nutrient concentrations [18] and lotic systems more generally [19] can complicate evaluation of stressor-response relationships in these systems.
The effects of nutrient increases on biota have been documented in streams and rivers with a variety of biological, chemical, and physical conditions; however, to our knowledge, a synthesis of links between nutrient increases and impacts on stream biota that also addresses the influence of differing conditions across a breadth of lotic systems is lacking [20].

Biota integrate impacts over time and so can better represent ecological condition compared to snapshot water quality measurements [21,22,23,24]. Environmental managers often use this biological information to evaluate impacts of chronic pollution (e.g. [25]). However, high spatiotemporal variability and other factors (e.g. those mentioned above) can mask links between nutrients and biota [26]. A synthesis of nutrient stressor-response relationships and how these relationships are modified by other factors could aid the setting of regulatory limits and the identification of impacted systems based on biota (e.g. [27]).

Algae are the main primary producers in lotic systems, and algal biomass is expected to be one of the first ecological endpoints to respond to nutrient pollution [28]. Increases in algal biomass are also associated with many of the negative human health and ecological consequences of eutrophication, such as reduced drinking water quality [29, 30] and altered species composition [4]. Chlorophyll a (chl-a) is a photosynthetic pigment used to measure algal biomass [31]. In streams and rivers, researchers may sample benthic chl-a from hard substrates or sestonic chl-a from the water column [31, 32] to determine chl-a concentrations.

This systematic review will compile and synthesize literature on chl-a responses to nutrients in streams and rivers, to provide a state-of-the-science body of evidence for assessing nutrient impacts. The review focuses on total nitrogen (TN) and total phosphorus (TP) concentrations in the water column. These constituents were selected for both ecological and practical reasons. Although dissolved nutrient forms may be more available for immediate uptake by biota, total nutrient forms are often more highly correlated with chl-a [28]. Dissolved forms may undergo rapid uptake and release by primary producers, such that concentrations of dissolved nutrients in the water column may not represent true availability [33, 34]. In contrast, total nutrient forms may best represent trophic state and nutrient limitation in most lotic ecosystems because TN and TP account for N and P held within algae and sediment particles and thus represent integrated measures of biologically available nutrients [26, 34, 35]. TN and TP are also the most common nutrient measures used by environmental managers in the United States and around the globe to assess eutrophication of lotic ecosystems [36].

This review was motivated by a need for comprehensive information on stressor-response relationships to aid water quality scientists at the U.S. Environmental Protection Agency (USEPA) and state environmental agencies in better understanding the effects of nutrient pollution. In several meetings held during 2016–2017, these potential end users helped refine the scope, specific questions and objectives (including the relevant population, exposure, and outcome) of the systematic review, and the modifying factors of interest.

Objective of the review

The primary question addressed by this review is: What is the response of chl-a to TN and TP concentrations in lotic ecosystems? The nutrient stressor (TN or TP) and biotic response (chl-a) were chosen based on measures commonly used by U.S. state agencies to evaluate and make regulatory decisions about impairment of lotic ecosystems due to eutrophication. This question consists of the following components:

Population: Lotic fresh waters, or mesocosms that mimic these systems, in any geographic location.

Exposure: Concentration of TN or TP. We define TN as the sum of ammonia N, nitrate N, nitrite N, and organic nitrogen forms; we define TP as the sum of dissolved and particulate phosphorus forms.

Comparator: Control group (no added TN or TP, or low exposure to TN or TP) (for experimental studies), or comparison to lower or higher TN or TP concentrations across a gradient (for observational studies).

Outcome: Chl-a concentration (sestonic or benthic).

The secondary question addressed by this review is: How are the relationships identified in the primary question affected by other factors? An initial list of potential modifying factors is provided below (see “Methods” and “Potential effect modifiers and reasons for heterogeneity”); others may be added as studies are examined in more detail.

Methods

Search strategy

Search terms and filters

Bibliographic databases will be searched using a combination of terms representing the nutrient stressors (TN or TP), the biological response (chl-a), and habitat- or study-specific terms (e.g. terms associated with types of lotic fresh waters and experimental stream studies) (Table 1). Databases vary in how they handle search strings, so searches will be adapted as needed for each search. An appendix of search strings used for each database will be provided in the full systematic review (see Additional file 1 for an example based on the Web of Science™ database). Books, book chapters, pamphlets and conference abstracts will be excluded from consideration unless they are submitted through calls for additional information (see “Supplemental searches”), because they generally do not have sufficient relevant primary data and results to extract, and non-electronic library resource limitations prevent a full evaluation of these resources. No language restrictions will be applied to database searches, and any other filters used for specific databases (e.g. excluding full text search to limit irrelevant literature) will be detailed in the full systematic review.
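The way the three term groups combine into a single Boolean search string can be sketched as follows. This is a minimal illustration, not the search string used in the review: the terms below are hypothetical placeholders (the actual terms are those in Table 1), and the helper name `build_query` is an assumption introduced here. Synonyms within a group are joined with OR, and the groups are joined with AND.

```python
def build_query(term_groups):
    """Join synonyms with OR inside each group, then join the groups with AND."""
    clauses = []
    for terms in term_groups:
        # Quote multi-word phrases so a database treats them as exact phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Hypothetical placeholder terms; the protocol's real terms are listed in Table 1.
stressor_terms = ["total nitrogen", "total phosphorus"]
response_terms = ["chlorophyll", "chl-a"]
habitat_terms = ["stream*", "river*"]

query = build_query([stressor_terms, response_terms, habitat_terms])
print(query)
```

Because databases differ in wildcard and phrase syntax, a string built this way would still need to be adapted for each database, as noted above.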

Table 1 Search terms to be used for database searches

Databases

At least 16 bibliographic databases, representing peer-reviewed, non-peer-reviewed, and unpublished material, will be searched to obtain articles for the review (Table 2). When databases limit the search results that can be viewed or downloaded, results will be filtered by year, when possible, to obtain subsets for viewing and download. Due to limitations on batch downloading of citations, three databases (DART, National Technical Reports Library, and OpenGrey) will be treated similarly to website searches and the first 50 items returned (for separate searches for TN and TP) will be examined (see below) (Table 2).

Table 2 Bibliographic databases and relevant information

Specialist websites

Websites of the following organizations will be searched for relevant literature:

  • U.S. Environmental Protection Agency

  • U.S. state- and territory-level environmental agency websites (56 total entities)

  • U.S. Department of Agriculture

  • U.S. Forest Service

  • U.S. Fish and Wildlife Service

  • U.S. Geological Survey

  • U.S. National Oceanic and Atmospheric Administration (NOAA)

  • NOAA Fisheries

  • National Park Service

  • World Wildlife Fund

  • American Rivers

  • International Rivers

  • The Nature Conservancy

  • United Nations Environment Program

  • European Environment Agency

  • European Commission Joint Research Center

  • Environment and Climate Change Canada (http://ec.gc.ca/default.asp?lang=En&n=FD9B0E51-1)

  • Fisheries and Oceans Canada

  • Canadian Council of Ministers of the Environment

The first 50 items returned, sorted by relevance, will be examined for each search. For websites without a search function, relevant “publications” sections will be examined to find documents. Because many websites do not accept Boolean search strings, separate searches will be conducted for TN and TP, and a smaller set of terms will be used for each of these searches. All website searches will be documented in a spreadsheet that will include the search date, the specific web URL and search terms used for each site, any website sub-sections used, the total number of items returned, and the number of items deemed relevant. Although the specialist website list is biased toward western countries, resource constraints limit our ability to search more broadly in non-English speaking countries. The “Supplemental searches” will be used to increase capture of relevant articles from other countries.

Search engines

Searches using Google and Google Scholar will be conducted, and the first 50 search results will be examined for relevance as with website searches. Separate searches will be conducted for TN and TP, and search terms used for each search will be documented.

Supplemental searches

To supplement these searches, additional resources will be requested from colleagues with disciplinary knowledge and through ECOLOG-L, Twitter, and ResearchGate. “Snowball” searches will also be conducted: references that cite or are cited by a small set of highly relevant literature (see below) will be compiled and any novel references not found during database searches will be evaluated.

Reference management

Articles returned by the search strategy will be stored in an EndNote library. Duplicate entries will be removed, and an initial title screen within EndNote will be used to remove entries that are clearly not relevant (e.g. Front Matter, Meeting Programs and Abstracts, Books Reviewed). The number of entries removed will be recorded. The remaining articles will be imported into the Rayyan software [37] (http://rayyan.qcri.org/) for title/abstract screening.
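Duplicate removal will be done within EndNote, but the underlying logic can be sketched in a few lines. The example below is only an illustration under stated assumptions: records are represented as dictionaries with hypothetical `title` and `year` fields, and two records are treated as duplicates when their normalized titles and years match. It also returns the number of entries removed, which the protocol requires to be recorded.

```python
import re


def normalize_title(title):
    """Lowercase and strip punctuation and extra spaces so near-identical titles match."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(cleaned.split())


def remove_duplicates(records):
    """Keep the first record for each (normalized title, year) key and count removals."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize_title(rec["title"]), rec["year"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique, len(records) - len(unique)


# Hypothetical records for illustration only.
records = [
    {"title": "Nutrients and Chlorophyll in Streams.", "year": 2001},
    {"title": "nutrients and chlorophyll in streams", "year": 2001},
    {"title": "Phosphorus limits in rivers", "year": 2010},
]
unique, n_removed = remove_duplicates(records)
print(len(unique), n_removed)
```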

Assessing search comprehensiveness

Comprehensiveness of the search strategy will be assessed by: (1) determining whether all articles in a predetermined “test set” of approximately 15 relevant papers per stressor-response relationship (i.e., TN—chl-a, TP—chl-a; Table 3) are found with the search strategy; and (2) examining bibliographies of these “test set” papers, and papers that cite the “test set” papers, to determine whether relevant citations are captured in our search. If articles are missed, the search strategy will be evaluated and revised accordingly. The “test set” was created by searching the authors’ personal libraries for highly relevant articles until at least 15 papers per stressor-response relationship were obtained, and includes both journal articles and reports (Table 3).

Table 3 “Test set” of sources used to test search strategy comprehensiveness and trial study quality and data extraction approaches

Article screening and study inclusion criteria

Screening process

Before screening all articles, consistency in applying inclusion criteria will be evaluated on a subset of articles using the kappa statistic (where 1 indicates complete agreement and values at or below 0 indicate agreement no better than chance [38]). Two to four reviewers will assess the same randomly-selected set of 10% of studies to be screened (minimum 50, maximum 200) at the title/abstract level. Kappa will be calculated, using modifications for more than two raters if necessary [39]. If kappa is low (<0.50) [40], reviewers will examine inconsistencies and clarify inclusion criteria; if kappa is moderate or high (≥0.50) [40], one to four reviewers will proceed to screen all retrieved articles at the title/abstract level and, subsequently, all relevant articles at the full text level. Consistency during full text screening will be addressed by frequently convening reviewers to discuss the strategy and resolve any questions.
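The consistency check can be illustrated with a short sketch for the two-reviewer case. The function names and the example decisions are assumptions made for illustration; the multi-rater modifications cited above [39] are not implemented here. Kappa is computed as (observed agreement − chance agreement) / (1 − chance agreement), and the subset size follows the 10% rule with its minimum of 50 and maximum of 200.

```python
from collections import Counter


def calibration_sample_size(n_articles):
    """10% of the articles to be screened, bounded to 50-200 and never more than n_articles."""
    return min(n_articles, max(50, min(200, n_articles // 10)))


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who judged the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must judge the same non-empty set of items")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category for every item
    return (observed - expected) / (1 - expected)


# Hypothetical title/abstract decisions from two reviewers on ten articles.
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude"] * 6 + ["include"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
needs_criteria_review = kappa < 0.50  # the "low" threshold used in this protocol
print(round(kappa, 3), needs_criteria_review)
```

In this example the reviewers agree on 8 of 10 articles, but because about half of that agreement would be expected by chance, kappa is well below the raw agreement of 0.8.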

The inclusion criteria (see below) will be applied to systematically exclude articles that are topically irrelevant or do not contain relevant data, based on review of the title and abstract. Any article for which there is uncertainty about whether to include or exclude it based on title/abstract screening will be included for full text screening. Following evaluation of all titles and abstracts, full text screening will occur simultaneously with data extraction and quality assessment: as full text articles are examined for data extraction and quality assessment, any article judged to be irrelevant will be excluded and added to the appendix of excluded references, along with the justification based on inclusion criteria. Articles obtained through website searches will be screened during those searches by examining title/abstract/summary and full text when necessary, and information on the number of returns and relevant articles will be recorded separately.

Inclusion criteria

The following inclusion criteria will be used to determine relevant studies (see also Table 4):

Relevant population: Lotic freshwaters anywhere in the world or mesocosms made to mimic these systems.

Relevant exposure: Exposure to total nitrogen (TN) or total phosphorus (TP) measured as concentration (e.g. mg/L).

Relevant comparator: Comparison to sites or treatments with lower or higher levels of TN or TP across a gradient, or comparison to a control group (no or background TN or TP) or to lower or higher concentrations of TN or TP in experimental studies.

Relevant outcome: Concentration of benthic or sestonic chl-a, measured as mass per area or volume (e.g. µg/cm2, mg/m2, µg/L).

Relevant study type(s): Experimental studies in mesocosms or field sites, or field-based observational studies.

Relevant publication type(s): Study must contain original data and sufficient detail on methodology to assess study quality. Book chapters and conference abstracts will be excluded unless specifically suggested by outside experts.

Language: No language restrictions will be applied.

Date: No date restrictions will be applied.

Table 4 Detailed inclusion and exclusion criteria used to determine study inclusion in the systematic review

Multiple studies using same datasets

For cases in which multiple studies use the same or similar datasets (e.g. a dissertation and one or more published articles from that dissertation), the following criteria (listed in order of priority) will be used to select a single source: the study with the more complete dataset, the version published as a peer-reviewed journal article, or the most recent version. The excluded duplicative study or studies may be used to fill in gaps in methodology or contextual information. These decisions will be documented in an appendix.
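One way to read the priority order above is as a lexicographic comparison: completeness decides first, peer-reviewed status breaks ties in completeness, and publication year breaks any remaining ties. The sketch below illustrates that reading only. The field names are hypothetical, and using the number of observations as a proxy for dataset completeness is an assumption; in practice that judgment may require reviewer discretion.

```python
def select_primary_source(sources):
    """Pick one source per dataset using the priority order as a lexicographic key.

    Python compares tuples element by element, so later criteria only matter
    when all earlier criteria are tied.
    """
    return max(
        sources,
        key=lambda s: (s["n_observations"], s["peer_reviewed"], s["year"]),
    )


# Hypothetical sources that report the same underlying dataset.
sources = [
    {"id": "dissertation", "n_observations": 40, "peer_reviewed": False, "year": 2012},
    {"id": "journal article", "n_observations": 40, "peer_reviewed": True, "year": 2014},
    {"id": "agency report", "n_observations": 25, "peer_reviewed": False, "year": 2016},
]
primary = select_primary_source(sources)
print(primary["id"])
```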

Unobtainable articles

Attempts to obtain full text of all articles not excluded during the screening process will be made using available library resources or by contacting authors. Articles for which full text is not obtainable will be listed in an appendix. Abstracts of non-English language articles will be translated using Google Translate to assess relevance. Every effort will be made to obtain translations of any highly relevant, non-English language papers; however, this will depend on available resources. All non-English articles considered relevant based on title/abstract screening but not fully translated will be listed in an appendix.

Potential effect modifiers and reasons for heterogeneity

One motivation for this review is the apparent variability in nutrient stressor-response relationships in lotic ecosystems. Factors that potentially modify stressor-response relationships will be extracted from relevant studies when these factors were examined in the original study. Based on evaluation of highly relevant studies and consultation with stakeholders and experts, the modifiers considered include:

  • ecoregion;

  • latitude;

  • altitude;

  • land cover/land use;

  • stream size;

  • watershed area;

  • geographic location;

  • date/season/duration of sampling;

  • stream gradient;

  • flood stage/flow regime/flow permanence;

  • nutrient concentration range (lowest and highest TN and/or TP);

  • existing background nutrient concentrations;

  • temperature;

  • canopy cover/light availability;

  • pH;

  • alkalinity;

  • sediment/turbidity;

  • conductivity;

  • dominant algal species/groups; and

  • grazing (primary consumer) pressure.

Other relevant modifying factors will be recorded as they are encountered during screening and data extraction. Existing geographic information system (GIS) layers and tools that summarize important landscape and environmental factors (e.g. StreamCat [41], Google Earth) may be used to obtain relevant modifying factors (e.g. latitude, flow regime, land use/land cover, watershed area) for studies that do not report this information. If any outside data are associated with studies, care will be taken so as not to combine data from disparate sources (e.g. if the National Land Cover Dataset is used to estimate land cover, it will be used for all studies). Methodological modifiers, such as extraction method, measurement method, or sampling location (benthic, sestonic) for chl-a [31, 42], or fraction of water sample used for nutrient measurement (filtered, unfiltered), will also be recorded.

Study quality assessment

Studies from articles included after title/abstract screening that are still categorized as relevant upon full text screening will be assessed for quality and risk of bias. Aspects of quality and risk of bias from published critical appraisal frameworks in environmental science and medicine [43,44,45] were examined to develop a quality assessment approach specific to this review, similar to [46] (Tables 5, 6 and 7). For each study, aspects of study quality contributing to a “low” or “high” risk of bias will be rated, based on specific criteria for three different study designs: (1) observational field studies, which typically sample chl-a along a gradient of nutrient concentrations; (2) mesocosm experiments; and (3) field experiments (e.g. Before-After-Control-Impact designs [47]) (Tables 5, 6 and 7). An overall risk of bias estimate for each study will be generated by dividing the number of “high” scores by the number of questions. Results of the systematic review will be discussed and analyzed in the context of this study quality assessment.
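The overall risk of bias estimate described above is a simple proportion, sketched below. The function name and the example ratings are hypothetical; the actual questions are those in Tables 5, 6 and 7.

```python
def risk_of_bias_score(ratings):
    """Proportion of quality-assessment questions rated "high" risk of bias."""
    if not ratings or any(r not in ("low", "high") for r in ratings):
        raise ValueError('ratings must be a non-empty list of "low" or "high"')
    return sum(r == "high" for r in ratings) / len(ratings)


# Hypothetical ratings for one study assessed against eight questions.
ratings = ["low", "high", "low", "low", "high", "low", "low", "low"]
score = risk_of_bias_score(ratings)
print(score)
```

Scores run from 0 (no question rated “high”) to 1 (every question rated “high”), so studies assessed against different numbers of questions remain comparable.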

Table 5 Study quality assessment framework for observational, field studies
Table 6 Study quality assessment framework for experimental mesocosm studies

All relevant studies will undergo quality assessment. To assess accuracy in quality assessment, a reviewer not involved in the initial quality assessment will independently assess quality for 25% of the studies evaluated by other reviewers, and reviewers will discuss and resolve any differences.

Table 7 Study quality assessment framework for experimental field studies

Data extraction

Data will be extracted from studies found in articles that are considered relevant after full text screening. The majority of studies of nutrient stressor-response relationships examine biotic responses across field sites with varying nutrient concentrations, although some compare “reference” to “impacted” sites or experimentally manipulate nutrient concentrations. Most studies will thus use correlation or regression to assess relationships between nutrients and chl-a. The shape and direction (e.g. linear—increasing, linear—decreasing, logarithmic, exponential, sigmoidal) and strength of these relationships will form the basis for meta-analysis and narrative summary of the review results. In most instances, Pearson’s correlation coefficient or Spearman's rho (r) between TN or TP and chl-a will be used as the effect size. Other effect size measures (e.g. standardized slope coefficients: change in standard deviations of y associated with a change of one standard deviation of x [47,48,50]) will also be extracted and explored; however, the correlation coefficient was the most widely used and easily calculable from the example studies examined. Sample sizes will also be extracted for each effect size to estimate effect size variances using meta-analysis models (see “Data synthesis and presentation”). For experimental studies that manipulate nutrient concentrations and report differences in chl-a concentration between control and treatment groups, we will extract or calculate an appropriate “standardized mean difference” effect statistic such as Cohen’s d [50, 51].
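Two of the effect size calculations mentioned above can be sketched as follows, assuming the required summary statistics are reported. The function names and example values are hypothetical. Cohen's d is computed here with the pooled standard deviation of the two groups, and the standardized slope rescales a raw regression slope by the ratio of the standard deviations of x and y (for a simple linear regression with one predictor, this equals Pearson's r).

```python
import math


def cohens_d(mean_treat, sd_treat, n_treat, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = (
        (n_treat - 1) * sd_treat ** 2 + (n_ctrl - 1) * sd_ctrl ** 2
    ) / (n_treat + n_ctrl - 2)
    return (mean_treat - mean_ctrl) / math.sqrt(pooled_var)


def standardized_slope(slope, sd_x, sd_y):
    """Change in standard deviations of y per one standard deviation change in x."""
    return slope * sd_x / sd_y


# Hypothetical mesocosm experiment: benthic chl-a (mg/m2) in enriched vs. control channels.
d = cohens_d(mean_treat=30.0, sd_treat=10.0, n_treat=5,
             mean_ctrl=20.0, sd_ctrl=10.0, n_ctrl=5)

# Hypothetical field regression of chl-a on TP with a raw slope of 2.0.
beta = standardized_slope(slope=2.0, sd_x=0.5, sd_y=4.0)
print(d, beta)
```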

Authors will be contacted if a study indicates that an effect size was calculated, but not reported (e.g. for negative associations). For studies not reporting effect sizes, raw data will be extracted from figures using image analysis software when possible and effect sizes will be calculated. If no effect size is reported and raw data are not presented (e.g. only site means are provided in a table), these studies will not be used in meta-analysis. The initial “test set” of relevant literature will be used to refine the data extraction fields as needed.

One to six reviewers will participate in data extraction from all relevant studies. To assess accuracy in data extraction, a reviewer not involved in initial data extraction will independently extract data for 25% of studies, and any differences will be discussed and resolved. Extracted data from relevant studies will be provided as an appendix or in a publicly-available USEPA data repository.

Data synthesis and presentation

Meta-analysis and narrative and tabular summaries of stressor-response relationships will be used to synthesize data from the systematic review. For all studies, the direction or shape of the response will be noted (see “Data extraction”) and summarized across studies and subgroups of interest (e.g. subsets based on ecoregion, stream size, chl-a or nutrient measurement method). For studies with sufficient information, effect sizes (see “Data extraction”) and variance within and among studies will be examined across studies using a random effects model. Random effects models assume that the true effect size differs among studies and treat this heterogeneity as random, and are appropriate for making unconditional inferences about a set of studies of which the obtained studies are assumed to be a random sample [51,52,54]. Pearson’s correlation coefficient or Spearman’s rho (r) between TN or TP and chl-a will be used as the effect size in most instances. A Fisher’s z-transformation of r will likely be necessary to normalize the sampling distribution and stabilize the variance [55, 56], although other effect size measures (e.g. standardized slope coefficients) will be explored. Equations in Nakagawa and Cuthill [50], Lajeunesse [51] and meta-analysis packages in the R environment [57] (e.g. ‘MAc’ [58] and citations therein) will be used to convert other effect sizes (e.g. multiple regression coefficients) to Pearson’s r. Results for TN and TP will be analyzed and presented separately.
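The transformation and pooling steps can be sketched as follows. This is a minimal illustration, not the analysis code for the review, which will use R packages. It assumes Pearson correlations, uses the standard approximation 1/(n − 3) for the sampling variance of Fisher's z, and estimates between-study variance with the DerSimonian-Laird method; the protocol does not specify an estimator, so that choice is an assumption made here for simplicity. The function name and input values are hypothetical.

```python
import math


def random_effects_pool(correlations, sample_sizes):
    """Pool correlations on Fisher's z scale with a DerSimonian-Laird random-effects model."""
    if len(correlations) != len(sample_sizes) or len(correlations) < 2:
        raise ValueError("need at least two studies with matching sample sizes")
    if any(n <= 3 for n in sample_sizes):
        raise ValueError("each study needs n > 3 for the variance 1 / (n - 3)")

    z = [math.atanh(r) for r in correlations]   # Fisher's z = atanh(r)
    v = [1.0 / (n - 3) for n in sample_sizes]   # approximate sampling variance of z
    w = [1.0 / vi for vi in v]                  # inverse-variance (fixed-effect) weights

    sum_w = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum_w
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(z) - 1)) / c)     # between-study variance, truncated at 0

    w_re = [1.0 / (vi + tau2) for vi in v]      # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # Back-transform the pooled estimate and its 95% interval to the r scale.
    return {
        "r": math.tanh(z_re),
        "ci_low": math.tanh(z_re - 1.96 * se),
        "ci_high": math.tanh(z_re + 1.96 * se),
        "tau2": tau2,
    }


# Hypothetical TP-chl-a correlations and sample sizes from three studies.
pooled = random_effects_pool([0.1, 0.5, 0.8], [50, 50, 50])
print(pooled)
```

When the studies disagree more than sampling error alone would predict, tau2 is positive and the random-effects weights become more equal across studies, which widens the confidence interval relative to a fixed-effect analysis.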

Effects of modifying factors (e.g. canopy cover) or sub-groupings (e.g. ecoregion) will be assessed using mixed-effects models or meta-regression. Effect size variation and mean effect size will be visualized using forest plots. Analyses will be conducted using several R packages, including ‘metafor’ [53] and ‘MAc’ [58]. Quality assessment scores will be used as factors in sensitivity analysis to explore the impact of study quality on overall effect sizes and response shapes [40]. Publication bias will be assessed using funnel plots comparing study effect sizes with standard error [59, 60].