Background

About 11% of all global greenhouse gas (GHG) emissions come from agriculture [1]. The agricultural sector therefore has a significant role in reaching international and national emission reduction targets, such as the agreement from COP26 in Glasgow in 2021 and the Swedish environmental goal "limited climate impact". However, agriculture encompasses a wide range of alternative land uses and management methods, and there is limited knowledge of the net climate impact of these alternatives or of combinations of them. Practical advice for individual farmers is often general and only sometimes considers local conditions (for example, air permeability, organic matter content and soil type).

Although knowledge of the impact of different land uses and management methods is limited, there is a relatively good understanding of the basic mechanisms for the production and turnover of GHG in soils. Research has shown how the production and consumption of carbon dioxide (CO2), methane (CH4) and nitrous oxide (N2O) in the soil are controlled by microbial processes [2, 3]. Under aerobic conditions, organic material (like peat) is broken down to CO2 by microbial respiration processes using oxygen as the terminal electron acceptor. Therefore, organic soils that are drained and exposed to oxygen will be a significant source of CO2 [4]. Under waterlogged anoxic conditions, microorganisms break down organic material by anaerobic processes, including methanogenesis and denitrification, which produce CH4 and various proportions of N2O, respectively [5]. In ecosystems with oxic topsoil, CH4 produced in deeper anoxic subsoil may be oxidised to CO2 by aerobic methanotrophic bacteria during transport towards the atmosphere [6]. The risk of CH4 emissions from agricultural land is therefore associated with high levels of organic matter and water in the soil since both these factors will restrict oxygen availability [7]. Nitrous oxide can be produced under aerobic soil conditions as a side reaction in the oxic transformation of ammonia (NH3) to nitrate (NO3), but the main risk of N2O emission relates to anaerobic denitrification of NO3 to gaseous N compounds. In summary, microbial processes control GHG emissions with oxygen availability as a main driver, which is itself restricted by soil water content. However, further drivers of GHG emissions relate to soil physical properties, organic matter content, and access to nutrients. For example, it is the combination of nitrogen fertilisation, plant nitrogen uptake and the conditions for anaerobic environments to form that determines to what extent the soil becomes a source of nitrous oxide. These factors will be influenced by the land use and management methods that are applied at farm level [8, 9].

Organic soils cover almost 500 Mha worldwide, of which around 400 Mha is situated in boreal and temperate regions [10]. The area of peatlands drained for management has been estimated at 43–51 Mha globally [11, 12]. In Europe, about 10% of the former peatland area has been lost through drainage for agriculture, forestry, and peat extraction, and about 50% of the current peatland area in the EU is classified as degraded [13]. For example, to increase food security at the beginning of the nineteenth century, the Swedish government started projects to drain peatlands so that they could be cultivated [14]. A side effect was that the drained soils started to release CO2 into the atmosphere. Today, 7%, 9%, 10% and 14% of Danish, Swedish, Norwegian, and Finnish agricultural soils, respectively, are cultivated peatlands [15,16,17,18]. Even though organic soils constitute a relatively small proportion of the arable land, they are considered to be a major source of both CO2 and N2O [19, 20].

Rewetting drained organic soils is a recognised mitigation tool to reduce GHG emissions [21] and is also supported by the European Union (see Proposal for a Nature Restoration Law, European Commission 2022 [22]). However, rewetting arable organic soils would make them largely unsuitable for food production and thus may induce some GHG leakage by shifting the cultivation to other areas. As an alternative, it has been suggested that organic agricultural soils could be used to produce ley or turned into perennial green fallow (Per Bodin, Swedish Board of Agriculture, personal communication, 2022). Some Nordic countries see potential in these interventions, but their effectiveness remains uncertain and there is no consensus on it. Other measures tested in Sweden that allow continued cultivation of peat soil while potentially reducing GHG emissions include different grasses [23], the addition of foundry sand [24], the addition of lime [25], different cultivation systems [26], different cultivation intensities [27], raised groundwater table [28] and abandonment [29]. A stakeholder group (representatives from farmers, advisory board, regional government, farmers' union and the Swedish Board of Agriculture) gave input and ideas throughout the projects. So far, the only measure that has shown reduced emissions is the sand treatment.

Since there is high political pressure to reduce the GHG emissions from peat soil, and the IPCC emission factors [30] encourage Nordic countries to use ley as a solution, there is a need to strengthen the scientific evidence base on these mitigation measures. Scientific publications and compilations of studies in this field often compare the treatments without taking into account that the fields for annual and grass crops may originally have been selected based on, e.g., the peat quality. Thus, the compared data does not originate from a homogeneous set of sites or from the same years. Variables other than the treatment may have influenced the outcome, such as climate, weather conditions, time and differences in soil properties, or an annual grass crop may have been used in the comparison instead of long-term ley [19, 31,32,33]. The Swedish Board of Agriculture has expressed a need for a systematic review of existing research results to find out what evidence there is to justify the suggested interventions. The stakeholders mentioned above were invited to a meeting at an early stage in the planning of the forthcoming systematic review, where they were asked to share their thoughts and ideas about the systematic review.

Objective of the review

The forthcoming systematic review will attempt to answer the question: “What is the effect of ley or perennial green fallow on the flux of greenhouse gases from agricultural organic soils?” The question emerged in a Nordic context, but we will use data from other parts of the world if they meet the eligibility criteria. We believe that the results and conclusions of the review should also be valid for other countries with similar agricultural practices in any boreo-temperate climate zone.

The PICO (Population, Intervention, Comparator and Outcome) elements of this question are:

  • Population: Organic soils on agricultural land in temperate and boreal climate zones. Such organic soils are often drained peatland, but other origins may occur.

  • Intervention: Using land for grazed or ungrazed, permanent or cultivated grassland (ley) or setting land aside from agricultural production (perennial green fallow) without any attempt to raise the groundwater level. Rewetted grasslands are thus not included. Growing woody energy crops is not an eligible intervention or comparator. Growing grass-like energy crops is an eligible intervention.

  • Comparator: Using land for various crop rotations involving annual crops. Land uses may be categorised regarding tillage, fertilisation, and other management practices.

  • Outcome: Flux of CO2, N2O, or CH4.

The PICO elements are defined in more detail in the section on study eligibility criteria. It should be noted that issues related to the concept of GHG leakage are outside the scope of this review, although such considerations may influence the overall assessment of land use changes on organic soils.

Methods

Searching for articles

No time or document type restrictions will be applied, and publications for which full texts are unavailable will be recorded and reported.

Bibliographic databases

The searches in bibliographic databases will be conducted using English search terms, including articles in other languages with English titles and abstracts. The search string comprises three substrings related to the population, intervention, and outcome, respectively (see Table 1). The substrings will be combined with the Boolean operator AND. The format of the search strings will be adapted to each database (see Additional file 1). Searches will be made in the bibliographic databases shown in Table 2. There will be no restrictions regarding publication dates or types.

Table 1 Substrings that will be combined with the Boolean operator AND for searches in bibliographic databases
Table 2 Bibliographic databases that will be used in the literature searches
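
The way the three substrings are combined can be illustrated with a minimal sketch. The search terms below are placeholders chosen for illustration only and are not the terms in Table 1; the function names `or_block` and `build_query` are likewise hypothetical. The sketch assumes that terms within a substring are joined with OR, which is the usual convention but is not stated explicitly above, and it ignores database-specific syntax (see Additional file 1 for the adapted strings).

```python
def or_block(terms):
    """Join the terms of one substring with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*substrings):
    """Combine the substrings with the Boolean operator AND."""
    return " AND ".join(or_block(s) for s in substrings)


# Placeholder terms only; the actual terms are those listed in Table 1.
population = ["peat*", "organic soil*", "histosol*"]
intervention = ["ley", "grassland*", "fallow"]
outcome = ["greenhouse gas*", "CO2", "N2O", "CH4"]

query = build_query(population, intervention, outcome)
print(query)
```

Keeping each substring as a separate list makes it straightforward to test the contribution of individual terms against the benchmark articles before the final search is run.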

Grey literature

Searches for grey literature will be performed using the search engine Google Scholar through Publish or Perish [34]. In these searches, simplified search strings with search terms in English, German, French, Swedish, Finnish, and Danish will be used (see Additional file 1). The first 300 search results for each search string using search terms in English will be screened for relevance, whereas the first 200 search results using the other languages will be screened. Searches for grey literature will also be performed using the archives and databases shown in Table 3.

Table 3 Archives and databases to be searched for grey literature

Supplementary searches

For a number of key search results, we will explore related papers using ResearchRabbit [35]. Further, the protocol development team will contact a list of relevant researchers and other stakeholders, asking for additional literature of interest. For this purpose, a letter template has been written. Also, the bibliography of relevant review articles and meta-analyses will be screened for potentially relevant articles, i.e., “snowballing”. We will also search specialist websites, such as environmental protection agencies or boards of agriculture in countries relevant for the review as defined in the PICO. The websites will be identified in collaboration with stakeholders during the review process and reported in the systematic review.

Estimating the comprehensiveness of the search

The comprehensiveness of the search was tested against a list of benchmark articles that the protocol development team identified as relevant for answering the systematic review question (see Additional file 2). All but one of the articles indexed in at least one of the searched bibliographic databases were captured by the search strings used. The one missing article [36], in Danish, has a short English abstract with little information. Although relevant to the review question, it does not conform with our inclusion criteria on the outcome. Therefore, we have not judged it meaningful to adjust the search string any further to capture this article. The searches using Google Scholar with search strings in English captured all benchmark publications classified as grey literature except one thesis (Drösler [37]). However, when searching for this publication using the title as the search string, we found at least one web page with this publication and all the words in our Google Scholar search strings. It should thus have been picked up by the searches, but for some reason it was not ranked among the top 300 search results. We judge it unfeasible to adjust the search strategy any further, but it is still possible that this publication will be captured by the searches using search terms in German.

Article screening and study eligibility criteria

Screening process

All studies identified by the above search criteria will be screened to determine inclusion based on the eligibility criteria below. The screening will first be carried out at the title and abstract level and subsequently at the full-text level; in case of uncertainty, a record will be retained for the next screening stage. The repeatability of the screening process was tested at the abstract stage with 600 publications, which were divided into two groups and screened by three members of the protocol development team in each group. The test articles were retrieved in preliminary searches on Web of Science. After the test screening, the eligibility criteria were discussed among all members of the review team. Having clarified the eligibility criteria, we could resolve the disagreements. The final screening will be divided between two reviewers at the title and abstract level. After double-screening another subset of 300 articles, the consistency between the two reviewers will be reassessed, and if necessary, the eligibility criteria will be clarified again. This procedure will be repeated until we are convinced that the eligibility criteria are interpreted and applied consistently by the two reviewers. At least 10% of the records will be double screened. After that, the screening will continue in single mode. When assessing the consistency between the two reviewers, Kappa tests will be used. However, we will not define a priori any Kappa value that must be exceeded. The Kappa values will instead be seen as a support to our assessments and will be reported in the systematic review. At the full-text level, all records will be screened by at least two reviewers. An additional file will provide a list of articles excluded at the full-text stage with reasons for exclusion.
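
As an illustration of the consistency check, the following minimal sketch computes Cohen's kappa for two reviewers who screened the same records. The text above refers only to "Kappa tests", so the choice of Cohen's unweighted kappa is an assumption, and the include/exclude decisions are invented for the example.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two reviewers rating the same records."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: share of records with identical decisions.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement from each reviewer's marginal frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if p_expected == 1.0:
        return 1.0  # both reviewers used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical decisions for ten abstracts (not real screening data).
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "include", "include", "exclude",
              "exclude", "exclude", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

In this invented example the reviewers agree on 8 of 10 records, but because most records are excluded by both, the chance-corrected agreement is considerably lower than the raw 80%.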

Eligibility criteria

The studies will be screened with regard to the population, intervention, comparator, outcome, and study design.

Eligible population:

To qualify for this review, the article must include organic soils on agricultural land in temperate and boreal climate zones. As definitions of organic soils vary [38], there will be two categories: “true” peat soils defined as Histosols [39] or having an organic carbon (OC) content > 12% and peat depth > 30 cm, and shallow and/or lower organic carbon peat soils with > 6% OC and > 10 cm depth. The latter may not qualify as peat soils according to many definitions, but with a high bulk density such organic soils nevertheless have the potential for high emissions [38]. The omission of further initial restrictions should prevent the exclusion of relevant data as long as the agricultural system is relevant to the review question. The climate zones considered in this study are Cfb (warm temperate, fully humid, warm summers) and Dfa, Dfb, and Dfc (snow climate, fully humid) according to the Köppen climate classification [40]. As the climate zone is not reported in all studies, and as the classification may have changed over time, the eligibility of all studies will be based on the present classification according to the World Map of the Köppen-Geiger Climate Classification published at https://koeppen-geiger.vu-wien.ac.at/present.htm.
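
The population criteria can be written down as a small decision rule. The sketch below is one possible reading, assuming that a soil counts as “true” peat if it is classified as a Histosol or if it exceeds both the 12% OC and 30 cm thresholds; the function and variable names are hypothetical, and the requirement that the soil is on agricultural land is not encoded.

```python
# Köppen-Geiger classes named in the eligibility criteria.
ELIGIBLE_KOPPEN_ZONES = {"Cfb", "Dfa", "Dfb", "Dfc"}


def soil_category(oc_percent, peat_depth_cm, is_histosol=False):
    """Return the soil category used in the review, or None if neither applies."""
    # Assumed reading: Histosol OR (OC > 12% AND peat depth > 30 cm).
    if is_histosol or (oc_percent > 12 and peat_depth_cm > 30):
        return "true peat"
    if oc_percent > 6 and peat_depth_cm > 10:
        return "shallow/lower-carbon organic soil"
    return None


def population_eligible(oc_percent, peat_depth_cm, koppen_zone, is_histosol=False):
    """Check the soil thresholds and the present-day climate class together."""
    category = soil_category(oc_percent, peat_depth_cm, is_histosol)
    return category is not None and koppen_zone in ELIGIBLE_KOPPEN_ZONES


# Hypothetical sites (illustrative values only).
print(soil_category(35, 80))             # deep, carbon-rich peat
print(soil_category(20, 20))             # carbon-rich but shallow
print(population_eligible(8, 15, "Dfb"))
print(population_eligible(35, 80, "Csa"))
```

Recording the category alongside the eligibility decision keeps the information needed for the later comparison of the two soil categories.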

Eligible intervention:

To be included, articles must include grazed or ungrazed, permanent or cultivated grassland (ley) or land set aside from agricultural production (perennial green fallow). Ley must be continuous, i.e., without tillage for at least three years. The minimum of three years is somewhat arbitrary. Still, it is reasonable to assume that it will take some time after conversion to grassland before a measurable effect can be detected. Also, a minimum of three years of continuous ley is required by, for example, the Swedish Board of Agriculture for receiving environmental payments [41]. Rewetting peatland is not an eligible intervention. Growing woody energy crops is not an eligible intervention. Growing grass-like energy crops is an eligible intervention, as these may have characteristics similar to those of other grassland species.

Eligible comparator:

To be included, studies must use the land for various crop rotations involving annual crops as the comparator. We will record the specific crops or crop rotations as potential effect modifiers. Every study must include a comparison with ley within the same study, where outcomes (i.e., GHG fluxes) were measured with the same method and in similar peat soil conditions, climate, location, etc., to make them as comparable as possible.

Eligible outcome:

For a study to be included, it must report the flux of at least one of CO2, N2O, and CH4. Gas fluxes must have been measured directly using, for example, dark or transparent chambers, eddy covariance measurements, or concentration gradient methods. Estimations of gas fluxes based on indirect measures, such as soil subsidence or changes in soil organic carbon stocks, are not eligible. The flux of CO2 may be reported as net ecosystem exchange (NEE), carbon balance, or soil respiration. As the meaning of these outcome measures differs, we will note which one of them was reported for each study.

Eligible study designs and other study characteristics:

We expect that most studies will have a Control-Impact (CI) study design. Still, we will not, by default, exclude any other study design that involves an eligible control, e.g., a Before-After (BA) or a Before-After Control-Impact (BACI) study design. We will not impose initial limits on study characteristics like study duration, number of replicates or sampling frequency, as we are not expecting many suitable studies. Rather than applying numerical restrictions, we require that the article clearly describes a system that can answer the review question. Suitability and data quality will be rigorously rated in the study validity assessment. Mesocosm studies are eligible, but the mesocosms should be large enough (larger than approximately 0.5 m2) and contain soil sufficiently undisturbed to mimic a full-scale grassland. Modelled data will not be included, but modelling studies may be traced back to check for the input (model validation) data.

Study validity assessment

Critical appraisal of relevant studies will include an assessment of internal and external validity.

Internal validity

The assessment of internal validity will be based on the risk of bias. To assess the risk of bias in individual studies, we will use a modified version of the CEE Critical appraisal tool, version 0.3 [42]. We have chosen to modify the existing tool since we have judged that not all criteria, and not all questions within each criterion, are applicable to the planned systematic review. In the modified critical appraisal tool, we consider five criteria (sources of bias). These are confounding biases, selection biases, performance biases, detection biases, and outcome assessment biases. For each source of bias, there is a set of questions which should be answered with “yes”, “no”, or “unclear”. Depending on how the questions are answered, the risk of bias for each source is judged to be “low”, “medium”, “high”, or “unclear”. Finally, the overall risk of bias is determined based on the risk of bias associated with each source, according to Table 4. The critical appraisal tool is provided as an Excel file (Additional file 3) and is also illustrated in Additional file 4.

Table 4 Criteria for assessing the overall risk of bias for individual studies

External validity

The external validity of the studies, i.e., the degree to which the studies are appropriate or applicable for answering the review question in a particular context, is primarily assessed during study eligibility screening. Since we will compare different cropping systems, it is important that the crops being cultivated in the studies are relevant to the stakeholders, and possible crops to grow are governed mainly by the climate and soil properties. Thus, climate and soil properties will be fundamental when assessing the external validity of the crops being grown. No study will be excluded based on these factors as long as they are judged to comply with the eligibility criteria. Still, we will record them to assess the strength of evidence for different contexts and conditions. Another aspect of external validity that needs to be accounted for when evaluating the strength of evidence is the transferability of study results from small-scale experimental studies (e.g., mesocosms) to actual farming practices. Therefore, we will also record the type and scale of included studies.

Coding for study validity assessment

Critical appraisal and coding for internal study validity will be carried out by four reviewers, and each study will be critically appraised independently by two reviewers. The reviewers will not be allowed to assess the validity of their own work. Disagreements between reviewers will be recorded and reconciled through discussions, seeking to reach a consensus among all reviewers. Metadata needed for the assessment of external validity will be extracted and recorded by two reviewers. To check the consistency between the two reviewers, a subset of studies will be extracted by both reviewers. After the completion of metadata extraction, the two reviewers will check each other’s extractions. If quantitative synthesis is feasible, a sensitivity analysis may be performed, comparing results with and without the studies with low validity.

Data coding and extraction strategy

The articles included for data extraction will be split into two batches, and two reviewers will extract data from one batch each. To check consistency between the reviewers and to detect any mistakes, all articles extracted by one reviewer will be double-checked by the other reviewer. In case of disagreements, consensus will be reached through discussions with the broader review team. Data will be recorded as reported in each study. If necessary and feasible, data will be standardised (e.g., conversion of units) at the analysis stage to allow for direct comparison among studies.
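
As an example of the kind of unit standardisation that may be needed, the sketch below converts an annual CO2 flux reported as carbon per square metre into tonnes of CO2 per hectare. The reporting units are an assumption made for illustration (included studies may use other units), and the function name is hypothetical.

```python
MOLAR_MASS_C = 12.011    # g/mol
MOLAR_MASS_CO2 = 44.009  # g/mol


def g_c_per_m2_to_t_co2_per_ha(flux_g_c_m2_yr):
    """Convert g CO2-C m-2 yr-1 to t CO2 ha-1 yr-1."""
    # Carbon mass to CO2 mass via the ratio of molar masses.
    flux_g_co2_m2_yr = flux_g_c_m2_yr * MOLAR_MASS_CO2 / MOLAR_MASS_C
    # 10,000 m2 per hectare; 1,000,000 g per tonne.
    return flux_g_co2_m2_yr * 10_000 / 1_000_000


# Hypothetical reported value, not taken from any study.
converted = g_c_per_m2_to_t_co2_per_ha(500)
print(round(converted, 2))
```

Because the synthesis relies on relative differences within each study, such conversions matter mainly for presenting the extracted data on a common scale.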

Quantitative data and metadata about the experimental setup and the greenhouse gas emissions will be extracted into a spreadsheet structured as in Additional file 5, and the completed spreadsheet will be made fully available as an additional file in the final systematic review. Outcome data will be recorded in separate Excel files for each article. If repeated measurements have been carried out, the data for all reported time points will be recorded. In cases where outcome data were reported in figures, we will use WebPlotDigitizer [43] to extract data. All outcome data used in the meta-analysis will be available in an Additional file.

Potential effect modifiers/reasons for heterogeneity

The metadata to be extracted from studies includes data regarding key sources of heterogeneity. These variables were agreed on in consultation with the protocol development team and can be found in Additional file 5.

For both the intervention and the comparator, the main sources of heterogeneity are likely to be soil parameters such as OC content, moisture, pH, bulk density, degree of decomposition and peat depth, as these influence each other as well as microbial activity and thus GHG emissions [44]. Further, drainage or groundwater table depth, time since drainage, time since conversion to annual cropland or to ley/perennial fallow, tillage practices, and applied fertilisers and crop residues may affect emissions and will be recorded. Finally, measurement methodologies will be reported to account for differences between studies, although data synthesis will rely on relative differences between intervention and comparator per study.

Data synthesis and presentation

All included studies will be presented in narrative synthesis tables, including the extracted metadata and risk of bias assessments. The quantitative synthesis will be carried out through meta-analysis using a random-effects model. Measurement methods of GHG emissions are diverse and may not be comparable in absolute numbers between studies. As the review question asks for a relative comparison between land uses, the collected data will be analysed by calculating relative differences between intervention and comparator per study. We believe the most suitable effect size for this purpose is the log response ratio (ln R). However, we expect that the included studies will generally have a small number of replicates and that the number of studies in each meta-analysis will be relatively small. Therefore, once the data is extracted, we will test the suitability of ln R using the diagnostic test suggested by Hedges et al. [45] and Lajeunesse [46]. If ln R proves unsuitable, the standardised mean difference will be used as the effect size instead.
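
For readers unfamiliar with the effect size, the following minimal sketch shows how ln R and its sampling variance are commonly computed from group means, standard deviations and replicate numbers, together with a small-sample form of Geary's diagnostic. The formulas are given as we understand them from the literature cited above [45, 46] and should be checked against those sources; the threshold of 3, the function names and the input values are assumptions for illustration. Note that ln R is undefined when either mean is zero or negative, which can occur for fluxes that represent net uptake.

```python
import math


def log_response_ratio(mean_i, sd_i, n_i, mean_c, sd_c, n_c):
    """Return (ln R, sampling variance) for intervention vs comparator."""
    if mean_i <= 0 or mean_c <= 0:
        raise ValueError("ln R is undefined unless both means are positive")
    ln_r = math.log(mean_i / mean_c)
    # Delta-method approximation of the sampling variance.
    variance = sd_i**2 / (n_i * mean_i**2) + sd_c**2 / (n_c * mean_c**2)
    return ln_r, variance


def geary_statistic(mean, sd, n):
    """Standardised mean with a small-sample adjustment (assumed form)."""
    return (mean / sd) * (4 * n**1.5) / (1 + 4 * n)


def passes_geary(mean, sd, n, threshold=3.0):
    """True if the group mean is far enough from zero for ln R to behave well."""
    return geary_statistic(mean, sd, n) >= threshold


# Hypothetical annual fluxes with four replicates per land use.
ln_r, var_ln_r = log_response_ratio(20, 4, 4, 25, 5, 4)
print(round(ln_r, 4), round(var_ln_r, 4))
print(passes_geary(20, 4, 4), passes_geary(2, 4, 4))
```

A negative ln R in this sketch means a lower flux under the intervention than under the comparator; both groups of a study would need to pass the diagnostic for ln R to be considered suitable.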

The degree of heterogeneity between study results will be assessed using the I2 statistic. Possible reasons for heterogeneity will be explored through subgroup analyses in which, for example, “true” peat soils are compared with lower-carbon organic soils, as defined in the Eligible population paragraph, and mesocosms, incubation experiments, and large lysimeters are compared with field sites. However, we leave the option open to include mesocosm experiments in the analysis of field sites if there are not enough eligible studies. Provided that sufficient studies are included in the meta-analysis, we will construct funnel plots [47] to assess the risk of publication bias. Meta-analyses will be conducted in R using the metafor package [48]. Results will be visualised through forest plots and presented in tables.
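
To make the heterogeneity assessment concrete, the sketch below computes Cochran's Q, I2 and a random-effects pooled estimate from per-study effect sizes and variances. It is an illustration only: it is written in Python rather than R, it uses the DerSimonian-Laird estimator of the between-study variance for simplicity (metafor's default estimator is REML, so results from the planned analysis may differ), and the effect sizes are invented.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, I2 in percent, and DerSimonian-Laird tau^2."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    pooled_fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum_w - sum(w**2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


def random_effects_mean(effects, variances, tau2):
    """Pooled estimate and standard error with weights 1 / (v_i + tau^2)."""
    weights = [1 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    return pooled, (1 / sum_w) ** 0.5


# Hypothetical ln R values and variances for three studies.
effects = [-0.2, 0.1, -0.5]
variances = [0.02, 0.02, 0.02]
q, i2, tau2 = heterogeneity(effects, variances)
pooled, se = random_effects_mean(effects, variances, tau2)
print(round(q, 2), round(i2, 1), round(tau2, 3))
print(round(pooled, 3), round(se, 3))
```

With so few studies, I2 is itself very uncertain, which is one reason to interpret it alongside the subgroup analyses rather than on its own.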

When summary treatment effects (point estimates and confidence intervals) have been estimated, we will explore the possibility of grading the evidence and expressing our confidence in the estimated treatment effects. When grading the evidence, we will consider the internal and external validity of included studies, the number of included studies, context dependency, and the risk of publication bias.