Introduction

Human wastewater biosolids, hereafter referred to as biosolids, are nutrient-rich organic materials resulting from the treatment of human digestive residuals, often in wastewater treatment facilities. Biosolids are produced in significant quantities on a global scale (10 × 10^7 Mg year−1) [1]. They are often applied across extensive land areas, including agricultural fields, forests, mine lands, and urban areas [2,3,4,5,6]. Applications of biosolids can increase soil organic C (SOC), improve soil physical and chemical characteristics, and reduce fertilizer needs and water usage [7,8,9]. Many of the soils on which biosolids are applied are low in organic matter (OM) and thereby SOC. Applications of biosolids are generally expected to increase OM content and thereby reduce atmospheric greenhouse gas emissions. However, SOC stock changes after land application vary from study to study. Several long-term studies indicate C sequestration potential of land-applied biosolids [10, 11]. Antonelli and Fraser [12] found long-term C storage efficiency was higher for lower biosolids application rates. In contrast, Badzmierowski and Evanylo [13] found that SOC stocks decreased during years with no amendment application. This indicated that the applied carbon from the organic amendments was still undergoing decomposition and not being “stored.” Variability of SOC stock changes from land-applied biosolids is most likely a result of various factors such as climatic conditions, soil properties, land use management, biosolids characteristics, application strategies (i.e., rate, surface-applied or incorporated, one-time or repeated applications), and timing between last application and sample measurement.

At the time of this writing, there has been no peer-reviewed, quantitative synthesis of SOC stock changes after biosolids land-application. Due to the incomplete scientific basis and uncertainty regarding the permanence of SOC with time and management strategy following biosolids application, organizations such as the Intergovernmental Panel on Climate Change and the American Carbon Registry have been unable to include biosolids as a specific carbon dioxide removal mechanism. Biosolids stakeholders (e.g., wastewater treatment plants and biosolids users) seek a quantitative synthesis on the potential of C sequestration from land-applied biosolids to gain access to C credit markets.

We plan to conduct a systematic review and meta-analysis in response to biosolids stakeholders' needs. The systematic review and meta-analysis will identify potential C sequestration rates of land-applied biosolids and assess explanatory factors that may affect those rates. Research questions and the systematic review protocol were developed by the authors (Badzmierowski, Evanylo, Daniels). Funders only had input on the desired goal of the systematic review: to determine the carbon sequestration potential of land-applied biosolids. Funders will have no other input regarding systematic review design, search strategy, analysis, or interpretation of results.

Objective of the review

Our objective is to perform a systematic review of peer-reviewed and grey literature to develop sequestration rates of land-applied biosolids and associated explanatory factors. Our study will address the primary review question (Population, Intervention/Exposure, Comparator, Outcome, and Study design—“PICOS” elements are defined in Table 2) and secondary review questions:

What is the impact of human wastewater biosolids (sewage sludge) application on long-term soil carbon sequestration rates?

  • How do geographical location and climate (i.e., moisture and temperature) affect long-term C sequestration rates?

  • Do biosolids processing methods (e.g., aerobic, anaerobic, lime-stabilized, etc.) and final characteristics (e.g., total solids, iron and aluminum content, etc.) affect long-term C sequestration rates?

  • Does application method of land-applied biosolids (i.e., surface, incorporated, or injected) affect long-term C sequestration rates?

  • How do land use (e.g., cropland, forests, reclamation, etc.), management (e.g., crop rotation and cover crop), and vegetation affect long-term C sequestration rates from land-applied biosolids?

  • What is the relationship between SOC changes from land-applied biosolids and soil properties (e.g., depth, soil textural class, clay content, iron content, aluminum content, carbon to nitrogen (CN) ratio, etc.)?

Methods

Our study will follow the methodologies established by the Collaboration for Environmental Evidence (CEE) Guidelines and Standards for Evidence Synthesis in Environmental Management, version 5.0, and use the “RepOrting standards for Systematic Evidence Syntheses” (ROSES) to document our systematic review [14, 15]. See Additional file 1 for our completed ROSES form for systematic review protocols.

Searching for articles

Our search strategy has employed the assistance of three Virginia Tech librarians (Cozette Comer, Evidence Synthesis Librarian; Inga Haugen, Life Science, Agriculture, and Scholarly Communication Librarian; and Rachel Miles, Research Impact Librarian) to optimize search terms, search strategies, and databases to be used.

Search languages

The search will be conducted using English search terms, with Boolean operators and wildcards to improve the relevancy of search results. For studies that are not published in English, we will attempt to obtain a translation. We will exclude a result if we cannot obtain a translation. This is a shortcoming of our review, but we do not have the resources to work in other languages.

Search strings

Our search string is made of three components: population, intervention/exposure, and outcome terms. The terms within each component are listed below.

Population term: soil.

Intervention terms: biosolid* OR sewage OR sludge OR “sewage sludge” OR biosludge OR milorganite OR “human solid waste” OR “waste amend*”.

Outcome terms: carbon OR “soil OC” OR SOC OR “soil organic C” OR “soil organic carbon” OR “organic matter” OR “soil OM” OR SOM OR “soil organic matter”.

The three components will be linked using the Boolean operator “AND.” The Boolean operator “OR” will be used to separate terms/phrases within a given component. The asterisk (*) is a wildcard that matches any group of characters, including no characters. Quotation marks are used to search exact phrases (e.g., “sewage sludge” will search the exact phrase sewage sludge and the hyphenated sewage-sludge).
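The way the components combine can be sketched in a few lines of Python. This is only an illustration of the AND/OR structure described above; the function names are hypothetical, straight quotation marks stand in for the typographic ones, and database-specific field tags or syntax (see Table 1) are not reproduced.

```python
# Terms copied from the three components listed in the protocol.
population = ["soil"]
intervention = ["biosolid*", "sewage", "sludge", '"sewage sludge"', "biosludge",
                "milorganite", '"human solid waste"', '"waste amend*"']
outcome = ["carbon", '"soil OC"', "SOC", '"soil organic C"',
           '"soil organic carbon"', '"organic matter"', '"soil OM"', "SOM",
           '"soil organic matter"']


def or_group(terms):
    """Join the terms of one component with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_search_string(*components):
    """Link the parenthesized components with AND (hypothetical helper)."""
    return " AND ".join(or_group(component) for component in components)


search_string = build_search_string(population, intervention, outcome)
print(search_string)
```

Keeping each component as a separate list makes it straightforward to adapt the same terms to the syntax of an individual database.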

Estimating the comprehensiveness of the search

To estimate the comprehensiveness of the search, a brief list of 12 benchmark studies that fit the inclusion criteria was established based on previous related reviews and knowledge of the review team (see Additional file 2). The final search strings were tested for each of the three databases that have been selected to use in our review (see below for publication databases). All databases had a 100% comprehensiveness using the final search string. See Additional file 2 for search string results, benchmark list used for testing search comprehensiveness, and previously published related reviews on our topic.

Publication databases to be searched

We will use our Virginia Polytechnic Institute and State University subscription to search the following databases: CAB Abstracts (1910s–present), ProQuest Dissertations & Theses Global (1637–present, full-text dissertations 1997–present), Scopus (1800s–present), and Web of Science Core Collection (1900–present). Our subscription for the Core Collection includes: Science Citation Index Expanded (1900–present), Social Sciences Citation Index (1900–present), Arts & Humanities Citation Index (1975–present), Conference Proceedings Citation Index-Science (1990–present), Conference Proceedings Citation Index-Social Sciences & Humanities (1990–present), Book Citation Index-Science (2005–present), Book Citation Index-Social Sciences & Humanities (2005–present), Emerging Sources Citation Index (2005–present), Current Chemical Reactions (1985–present), and Index Chemicus (1993–present).

Internet searches to be conducted

We will use the Publish or Perish 7 software tool [16] to query the top 1000 “relevant” search results for both Google Scholar and Microsoft Academic. These search engines will be used to target “grey” literature including theses and dissertations, institutional reports, and conference proceedings. We will use the “keywords” search field in the Publish or Perish 7 software. See Table 1 for search specifications.

Table 1 Search specifications and string for each database or search engine

Specialist searches–Searches for grey literature

We will search specialist websites with two simplified search strings using English terms:

  • (carbon AND biosolids)

  • (carbon AND sewage sludge)

Specialist websites will include:

Supplementary searches

Backward and forward snowballing (i.e., backward = identifying articles from reference lists and forward = identifying articles that have cited the articles) will be done on all accepted articles and relevant reviews (see Additional file 2). Our “grey literature” search will be expanded by reaching out to our known biosolids research contacts and stakeholders requesting relevant datasets via email and to alert the community of our systematic review. The Review Team will attempt to contact authors of any articles that are unobtainable through our library subscription or interlibrary loans to gain access to their full articles. There will be no search updates for this review.

Screening process

Results from all searches will be imported to EndNote [17] and exported as Extensible Markup Language (.xml) to the online systematic review management tool, Covidence (access via Virginia Tech license) [18]. All results will be added to Covidence. Covidence will be used to identify and remove duplicates from search results.

The results will be screened in two stages: (1) title and abstract, and (2) full-text. We will select a random 10% subset of results at the title and abstract level, and two reviewers (Badzmierowski and Haering) will screen the articles independently based on the eligibility criteria defined in Table 2. Cohen’s kappa will be used to determine the inter-rater reliability as a consistency measure between the two reviewers. If the kappa score is 0.61 or higher, the consistency will be considered acceptable. A score below 0.61 will require a review of eligibility criteria and the screening process among the systematic review team. The screening process will be repeated until an acceptable kappa score is achieved. After a 10% subset with acceptable agreement has been obtained, the remaining articles will be screened at the title and abstract level by a single reviewer, the lead principal investigator (Badzmierowski). Screening by one reviewer at the title and abstract stage is not best practice in a systematic review, and we will highlight this in our final synthesis.
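As an illustration of the agreement check, Cohen’s kappa compares the observed agreement between two reviewers with the agreement expected by chance from each reviewer’s label frequencies. The sketch below is a minimal standard-library implementation; the function name and the screening decisions are hypothetical and are not data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label counts.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical title/abstract decisions for ten records.
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
acceptable = kappa >= 0.61  # threshold stated in the protocol
print(round(kappa, 3), acceptable)
```

In this made-up example the reviewers agree on nine of ten records, but kappa is lower than 0.9 because part of that agreement is expected by chance.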

Table 2 Systematic review eligibility criteria using the PICOS framework

All results at the full-text stage will be screened by two reviewers. No reviewer will screen their own studies for inclusion or exclusion at this stage. Disagreements about inclusion will first be discussed between the two reviewers to reach a consensus. If a consensus is not reached, a third person will be consulted. We will provide a list of articles excluded at the full-text level, with basic metadata and the rationale for exclusion.

Eligibility criteria

We have adopted the “PICOS” framework to determine eligibility criteria. The inclusion and exclusion criteria are detailed below in Table 2 for each component of the PICOS framework.

Study validity assessment

Critical appraisal will be performed for all studies that pass the full-text screening process following the elements outlined in the CEE guidelines [14]. The critical appraisal will be done on a study-by-study basis. This means that if one article reported more than one experiment (e.g., different experimental setup/multiple sites), these will be regarded as multiple studies and receive independent validity ratings. Where multiple articles have been published for a given experimental system, the data across the collection of articles will be aggregated and appraised as a whole. In cases with multiple articles, we will use the latest appropriate values. If the latest reported value across articles of the same system is not used, we will provide a written rationale for excluding it. We will email authors of studies that are missing data and report which authors were contacted and their responses (or non-responses).

The appraisal (see Table 3) includes standard criteria listed in the CEE guidelines such as statistical design, similar starting point for control and treatment group, randomization of sampling, presence of confounding variables, and time between intervention and sampling. Our appraisal is also tailored to our specific review question. We establish three additional criteria: soil organic carbon/matter measurement method, soil sampling depth, and soil bulk density. Ideally, studies use a high-quality method such as dry combustion or the Walkley–Black procedure for soil organic carbon/matter determination, as these are viewed as the best methods to determine these outcomes, and treat for inorganic carbonates, if necessary [21]. Studies examining soil organic carbon using different land management strategies also need an adequate soil sampling protocol, as sampling to different depths can result in different interpretations [22]. Therefore, a study should sample to at least the lowest depth of treatment incorporation. Additionally, changes in soil organic carbon result in changes in soil bulk density [23]. Soil bulk density is necessary to estimate soil organic carbon stocks, and stock estimates are improved by comparing changes on an equivalent soil mass basis [23].
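To show why bulk density matters, the sketch below computes a fixed-depth SOC stock from concentration, bulk density, and sampling depth using the standard unit conversion (g kg−1 × g cm−3 × cm × 0.1 = Mg ha−1). The function name and all input values are hypothetical, and the calculation is the simple fixed-depth estimate, not the equivalent soil mass correction mentioned above.

```python
def soc_stock_mg_per_ha(soc_g_per_kg, bulk_density_g_per_cm3, depth_cm):
    """Fixed-depth SOC stock in Mg C per hectare.

    (g C / kg soil) x (g soil / cm^3) x cm = 1e-3 g C / cm^2;
    one hectare is 1e8 cm^2 and one Mg is 1e6 g, giving the factor 0.1.
    """
    return soc_g_per_kg * bulk_density_g_per_cm3 * depth_cm * 0.1


# Hypothetical plots sampled to the same 15 cm depth.
control_stock = soc_stock_mg_per_ha(10.0, 1.40, 15.0)
treated_stock = soc_stock_mg_per_ha(14.0, 1.25, 15.0)

concentration_ratio = 14.0 / 10.0
stock_ratio = treated_stock / control_stock
print(control_stock, treated_stock, concentration_ratio, stock_ratio)
```

In this made-up case the treated plot has 40% more SOC by concentration but only 25% more by stock, because its lower bulk density means less soil mass is sampled within the same depth.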

Table 3 Critical appraisal criteria used to assess included studies

Studies will be excluded from quantitative synthesis and given specific written reasoning if any of the following factors apply:

  • No true replication in experimental design or sampling protocol (Pseudoreplication will not be considered as a treatment replication).

  • Intervention and comparator sites with substantial differences prior to intervention.

  • Unaccounted for severe confounding factors (e.g., irrigation at intervention sites but not at the comparator sites).

  • Insufficient methodological description to determine how the study was conducted (e.g., unable to determine/calculate biosolids carbon loading rate) or if data cannot be interpreted or is missing (e.g., study is missing comparator soil organic carbon data).

Studies that pass study validity assessment will be classified as “low” or “high” susceptibility to bias based on variables assessed (see Table 3). “Unclear” will be designated to variables with insufficient details and “Not applicable” will be designated to variables that were not measured in each study. All included studies will be appraised by two reviewers independently. Disagreements in appraisal will first be discussed between the two reviewers to reach a consensus. If a consensus is not reached, a third person will be consulted. We will perform a sensitivity analysis to determine the potential differences between studies of higher and lower validity. Reviewers will not assess the validity of studies that they authored.

Data coding and extraction strategy

Data from included studies will be extracted using a predefined form (Additional file 3). The extracted data will be made available as additional files in the final review. The data coding and extraction form was developed to be fully encompassing, including study metadata, experimental design and location, initial conditions, amendment characteristics, and outcomes post-intervention. Data that are only shown in graphical format will be estimated using the data extraction software DataThief [25]. We will contact authors if data are missing or unclear and provide documentation of our contact. Data will be extracted by one person and reviewed by a second person for accuracy.

We will extract the mean values of the control (no amendment or suitable fertilizer) and treatment groups (suitable biosolids interventions). These means will be standardized (e.g., soil organic carbon stocks standardized to Mg organic carbon ha−1). Measures of variability (i.e., standard deviation, variance, standard error, or confidence intervals) and sample sizes will also be recorded.
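Because studies report variability in different forms, the recorded measures typically need to be brought to a common one before effect sizes are computed. The sketch below converts standard error, variance, or a confidence interval to a standard deviation. The function names are hypothetical, and the confidence interval conversion assumes a 95% interval based on the normal approximation (z = 1.96); for small samples a t-based multiplier would be more appropriate.

```python
import math


def sd_from_se(standard_error, n):
    """Standard deviation from the standard error of a mean of n samples."""
    return standard_error * math.sqrt(n)


def sd_from_variance(variance):
    """Standard deviation from a reported variance."""
    return math.sqrt(variance)


def sd_from_ci(lower, upper, n, z=1.96):
    """Standard deviation from a confidence interval around a mean.

    Assumes a symmetric interval of half-width z * SE (z = 1.96 for 95%).
    """
    standard_error = (upper - lower) / (2 * z)
    return standard_error * math.sqrt(n)


# Hypothetical reported values.
print(sd_from_se(2.0, 4))            # SE of 2.0 with n = 4
print(sd_from_variance(9.0))         # variance of 9.0
print(sd_from_ci(8.04, 11.96, 9))    # 95% CI of 10 +/- 1.96 with n = 9
```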

Potential effect modifiers/reasons for heterogeneity

We will look at the following potential effect modifiers and method of testing:

  • Sampling methodology (sub-group analysis)

  • Time since last intervention (meta-regression)

  • Frequency of management intervention (sub-group analysis/meta-regression)

  • Geographical location/climate (i.e., moisture and temperature) (sub-group analysis)

  • Biosolids processing methods and iron + aluminum content (sub-group analysis)

  • Application method (i.e., surface, incorporated, or injected) (sub-group analysis)

  • Differing land use (e.g., cropland, forests, reclamation, etc.) (sub-group analysis)

  • Disturbance vs. no disturbance post-intervention (i.e., tilling) (sub-group analysis)

  • Soil properties (e.g., soil textural class and clay content) (sub-group analysis)

The potential effect modifier list was compiled by the review team after consultation with stakeholders. This list was compiled to contain known potential effects on carbon dynamics in terrestrial ecosystems. Additional effect modifiers and reasons for heterogeneity may be identified from the studies as the review proceeds.

Data synthesis and presentation

We will conduct a narrative and quantitative synthesis of the results extracted from included studies. The narrative synthesis will detail the validity of the results and findings. Tables and figures will be prepared to summarize results. A meta-analysis using random effects models will be conducted if sufficient data of high enough quality are extracted. Sensitivity analyses will be done by including/excluding studies with a high risk of bias and, when applicable, selected effect modifiers. Meta-regressions and sub-group analyses of potential effect modifiers will be performed where sufficient studies report common heterogeneity sources. We will also produce funnel plots of the effect size plotted against the standard error of the effect size and use the Egger test to assess publication bias [26, 27]. We expect that this review will help identify major research and knowledge gaps related to carbon sequestration potential of land-applied biosolids.
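For readers unfamiliar with random effects pooling, the sketch below shows one common estimator, DerSimonian–Laird, applied to per-study effect sizes and their sampling variances. The protocol does not specify which estimator or effect size metric will be used, so both the choice of estimator and the input values here are assumptions made purely for illustration.

```python
import math


def random_effects_dl(effects, variances):
    """Pooled effect, its standard error, and tau^2 (DerSimonian-Laird)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("Need at least two studies with matching variances.")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures dispersion of study effects around the fixed estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's sampling variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2


# Hypothetical effect sizes and sampling variances for two studies.
pooled, pooled_se, tau2 = random_effects_dl([0.0, 2.0], [0.5, 0.5])
print(pooled, pooled_se, tau2)
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights and the two pooled estimates coincide.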