Background

Soils contain the largest terrestrial carbon (C) pool globally, some 2500 Pg of C to a depth of 2 m, representing approximately twice the atmospheric C stock [1–3]. Indeed, soils could provide a vital ecosystem service by acting as a C sink, potentially mitigating climate change [4–6]. Consequently, changes in soil C could affect atmospheric CO2 concentration. Approximately 12 % of soil C is held in cultivated soils [3], which cover around 35 % of the terrestrial land area of the planet [7].

Arable soils are under considerable threat due to unsustainable cultivation practices. It has been estimated that US soils may have lost between 30 and 50 % of the soil organic carbon (SOC) that they contained prior to the establishment of agriculture there [8]. This loss has been attributed to the advent of the plough (e.g. [9]), indicating that agricultural soils may have the potential to mitigate climate change through C sequestration [10, 11]. Besides climate change mitigation, increased SOC has a number of potential associated benefits, including: increased soil fertility [12, 13]; improved biological and physical soil characteristics [14] via a reduction in bulk density, improved water-holding capacity and enhanced activity of soil microbes [15] (although this may increase CO2 emission); and increased soil biodiversity [16]. Promoting SOC also often increases soil biodiversity and ecosystem functions that can enhance agricultural productivity by mediating nutrient cycling, soil structure formation, and crop resistance to pests and diseases [17].

Historically, tillage has been performed because of a number of benefits associated with the practice. These benefits include: loosening and aeration of topsoil, facilitating planting; mixing of crop residues into the soil; mechanical destruction of weeds; drying wetter soils prior to seeding; and allowing frost-induced disturbance of the soil when undertaken prior to winter, facilitating seedbed preparation in the spring.

However, conventional tillage may increase compaction of soil below the depth of tillage (i.e. formation of a tillage pan), the susceptibility to water and wind erosion and the energy costs for the mechanical operations. In recent years, the promotion of less intensive tillage practices (also referred to as conservation tillage or reduced tillage) and no tillage agricultural management has sought to mitigate some of these negative impacts on soil quality and to preserve SOC. These practices aim at maintaining organic matter on the surface or in the upper soil layer thereby increasing SOC concentration especially in the topsoil [18, 19]. A reduction in the need for mechanical tillage practices reduces energy consumption and C emissions through the use of fossil fuels [16], whilst also reducing labour requirements [20], but this benefit may be outweighed to a certain extent by the increased requirements for pesticides. Furthermore, reduction of tillage activities has been associated with a loss of yield (8.5 % lower yield for no tillage relative to conventional tillage [21]). Higher N2O emissions can occur with reduced or no tillage, due to moister and denser soil conditions, which may eventually offset positive effects on SOC balances [22, 23].

Alvarez [24] recognized the need for a broad synthetic approach to assess the impact of agricultural management. As such, a number of authors have reviewed the impact of tillage on soil C (e.g. [10, 19, 24–28]). These reviews and meta-analyses have shown both beneficial [10, 19] and null [29, 30] effects on SOC due to no tillage relative to conventional tillage. Furthermore, the efficacy of reduced tillage relative to no tillage is also unclear [24, 26]. Discrepancies may depend on whether total SOC stocks are measured or only presented as the SOC concentration without accounting for equal soil masses. Whilst some advantages of conservation tillage are clear (e.g. reduced erosion and reduced fuel consumption), other impacts (e.g. N2O emission, crop yield, SOC sequestration) can be variable [31]. What seems to be decisive for the direction of SOC changes is the effect of tillage on net primary production (NPP). If NPP increases due to certain tillage practices, SOC stocks are more likely to increase and vice versa [32]. The purpose of this systematic review is to identify the state-of-the-art results regarding the so far inconclusive effects of tillage on SOC in a comprehensive, transparent and objective manner.

Review questions

We hypothesise that reduced or no tillage will mitigate losses of soil carbon as compared to more intensive ploughing [18, 19]. However, reduced tillage is assumed to have effects on SOC in the surface of the soil but not always through deeper soil layers [31]. Hence, we also test effects of reduced tillage from experiments with measurements in the upper 15 cm and deeper in the soil profile.

Identification of the topic

The subject of tillage was originally identified and included in the previously published systematic map [33] following in-depth discussion with Swedish stakeholders, including the Swedish Board of Agriculture. Following completion of the systematic map, tillage was identified as a candidate topic for full systematic review based on a number of key criteria: the presence of sufficient reliable evidence, the relevance of the topic for stakeholders, the applicability of the topic for the Swedish environment, the benefit of a systematic approach to a topic that has received some attention via traditional reviews, and the added value of investigating effect modifiers and sources of heterogeneity across studies via a large meta-analysis. The topic was proposed and accepted during a meeting of the authors in May 2015.

Objective of the review

The effects of tillage on SOC have previously been reviewed (e.g. [10, 19, 24–28]) but as yet none of these reviews has been systematic in nature. The objective of this review is to systematically review and synthesise existing research pertinent to tillage practices in warm temperate and snow climate zones (see Population below for details) using, as a basis, the evidence identified within a recently completed systematic map [33]. This systematic map aimed to collate evidence relating to the impacts of all agricultural management on soil organic carbon in boreo-temperate regions.

Primary question

What is the effect of tillage intensity on soil organic carbon (SOC)?

Secondary question

How do other agricultural management interventions interact with tillage to affect SOC?

Population:

Arable soils in agricultural regions from the warm temperate climate zone (fully humid and summer dry, i.e., Köppen–Geiger climate classification; Cfa, Cfb, Cfc, Csa, Csb, Csc) and the snow climate zone (fully humid, i.e., Köppen–Geiger climate classification; Dfa, Dfb, Dfc).

Intervention:

Any described tillage practice (including no tillage, reduced tillage, rotational tillage, conventional tillage and subsoiling).

Comparator:

More intensive tillage practice. Also before/after comparisons for single tillage treatments.

Outcome:

SOC (measured as either concentration or stock).

Methods

Searches

Original systematic map search

Searches of 17 academic databases were undertaken as part of the published systematic map between the 16th and 19th September 2013. This search was broader than just tillage, including also interventions relating to amendments, fertilisers and crop rotations (some 750 studies in total). These academic database searches were supplemented by searches for grey literature via web search engines and organisational websites, and by searches of the bibliographies of 127 relevant reviews and meta-analyses identified during the course of the systematic map. Full details for all searches can be found in supplementary information accompanying the systematic map described in Haddaway et al. [33].

Search update

A search update will be undertaken to capture research published since the original search in September 2013. The update will be restricted to four academic databases: Academic Search Premier, PubMed, Scopus and Web of Science (Web of Science Core Collection, BIOSIS Citation Index, Chinese Science Citation Database, Data Citation Index, SciELO Citation Index); and one academic search engine, Google Scholar, which has been shown to be effective at identifying both academic and grey literature [34]. The choice to reduce the number of citation databases was driven by observations made during the undertaking of the systematic map, where a large number of duplicates was identified in many of the databases used. Only English language search terms will be used, but all articles identified in Danish, English, French, German, Italian, and Swedish will be included.

In the academic databases the following search string will be used to search on ‘topic words’. This search string has been adapted from the original string used in the published systematic map [35] to identify specifically tillage research and restricted to the period since the original search was undertaken (September 2013):

soil* AND (arable OR agricult* OR farm* OR crop* OR cultivat*) AND (till* OR “no till*” OR “reduced till*” OR “direct drill*” OR “conservation till*” OR “minimum till*”) AND (“soil organic carbon” OR “soil carbon” OR “soil C” OR “soil organic C” OR SOC OR “carbon pool” OR “carbon stock” OR “carbon storage” OR “soil organic matter” OR SOM OR “carbon sequestrat*” OR “C sequestrat*”)

[the underlined text indicates modifications to the original systematic map search string]

In Google Scholar the following search string will be used and the first (up to) 1000 records downloaded for both title and full text searches:

soil AND carbon AND (till OR tillage OR “reduced tillage” OR “conservation tillage” OR “no tillage” OR “direct drill” OR “minimum till*”)

Up to 1000 search results (ordered by an undisclosed algorithm) for full text searches and title searches restricted to 2013–2015 will be downloaded using webcrawling software [34, 36].

Screening

A total of 311 studies have already been identified as part of the recent systematic map [33]. These studies were originally assessed according to predefined inclusion criteria (see [35]) as part of the systematic map. These original inclusion criteria were modified for the purposes of this systematic review by the inclusion of a requirement for studies to have investigated tillage interventions. The inclusion criteria used to screen all studies (including the original 311 studies and the updated search results) are as follows:

Relevant populations:

Arable soils in agricultural regions from the warm temperate climate zone (fully humid and summer dry, i.e., Köppen–Geiger climate classification; Cfa, Cfb, Cfc, Csa, Csb, Csc) and the snow climate zone (fully humid, i.e., Köppen–Geiger climate classification; Dfa, Dfb, Dfc). Figure 1 displays the geographical regions covered by these zones. These zones were selected due to their relative homogeneity and relevance to the Swedish environment.

Relevant interventions:

All tillage practices identified iteratively within the evidence base. Such practices include: no tillage (also described as direct drill); reduced, minimum or conservation tillage (i.e. chisel plough, disc plough, harrow, mulch plough, ridge till); rotational tillage (i.e. non-annual, regular tillage); conventional tillage (i.e. mouldboard plough); subsoiling. We recognise that some tillage practices classified above as reduced tillage may be intensive, and all described tillage practices will be assessed on an individual basis before classifying them broadly as no tillage, moderate intensity tillage, and high intensity tillage.

Relevant comparators:

Any comparison between different intensities of tillage from no tillage to intensive tillage. Additionally, studies will be included that make comparisons of single interventions from before relative to after the intervention.

Relevant outcomes:

Soil C measures, including: soil organic carbon (SOC), total organic carbon (TOC), total carbon (TC), and soil organic matter (SOM). This may be expressed either as a concentration (e.g. g/kg or %) or as a stock (e.g. Mg/ha).

Relevant study types:

Field studies examining interventions that have lasted at least 10 years to ensure that changes in soil C are detectable [37].

Fig. 1 World map of Köppen–Geiger climate classification. From [40]

Every study identified via the update will be screened through three stages: title, abstract and full text. At each level, records containing or likely to contain relevant information will be retained and taken to the next stage. Where information is lacking (for example where abstracts are missing), the record will be retained in order to be conservative. Following abstract screening, full texts will be sought, and those that cannot be obtained will be documented as such in the full systematic review. Screening will be performed by one reviewer, with a subset of 10 % of records at abstract level being screened by a second reviewer. A Kappa test [38] will be performed on the dual screening to assess the level of agreement. Where agreement is lower than moderate (kappa = 0.6), discrepancies will be discussed in detail and a further subset screened and tested to ensure improvement in consistency before continuing with screening.
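As an illustration of the agreement check, the following minimal Python sketch computes Cohen's kappa for two reviewers' include/exclude decisions and compares it with the 0.6 threshold stated above. The function name, the example decisions and the use of the unweighted two-rater form of kappa are assumptions made for illustration; the protocol does not specify the software or variant to be used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of records")
    n = len(rater_a)
    # Observed proportion of records on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical abstract-level decisions for a 10-record subset.
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "include", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
# Below the threshold, discrepancies are discussed and a further subset screened.
needs_discussion = kappa < 0.6
```

In this made-up example the reviewers agree on 8 of 10 records, yet kappa falls just below 0.6 because half of that agreement is expected by chance.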

Additional bibliographic checking

Reviews and meta-analyses identified through screening of search results from the search update described above will be assessed separately, examining the bibliographies of each article for potentially relevant articles. As with the screening described above, bibliographic checking will be performed by one reviewer with a subset of 10 % of review bibliographies being checked by a second reviewer to ensure consistency.

Critical appraisal of study validity

Critical appraisal undertaken in the completed systematic map

The completed systematic map undertook critical appraisal of the included studies for the purposes of excluding unreliable studies that were highly susceptible to bias (such as those lacking details on methods, or those with no replication) or non-generalisable and to assess the reliability of the evidence base. Reasons for exclusion were transparently recorded for all studies (see supplementary information in [33]). In addition to excluding studies that were highly susceptible to bias, five domains were assessed for study reliability for those studies passing the initial assessment: spatial replication (number of spatial replicates); temporal replication (number of time samples); treatment allocation (e.g. randomized, blocked, purposeful); study duration (length of the experimental period); soil sampling depth (the number and extent of soil depth samples taken). For each of these domains, studies were awarded a 0, 1, or 2 for the degree of reliability as described in Table 1. Where insufficient information was reported a ‘?’ was awarded. See Haddaway et al. [33] for full details of the methods used and results from the systematic map.

Table 1 Critical appraisal criteria

On-going critical appraisal for this systematic review

The initial critical appraisal schema described above will be used to assess studies identified through the search update. Every study that has passed this first stage of critical appraisal will then be given a ‘low’ or ‘high’ reliability rating based on an individual assessment of reliability for each study (using the coding described in Table 1), and a short justification will be given for each study in text form. These rating activities will be performed by two reviewers. The ratings will be used as a basis for sensitivity analysis in the meta-analyses described below.

Effect modifiers/sources of heterogeneity

All studies included in this review after critical appraisal will be subject to extraction of meta-data (see Data Extraction, below), which will include the extraction of data regarding key sources of heterogeneity. These include: climate zone, latitude, longitude, and soil type (classification or texture). These potential modifiers will be used in meta-analyses to account for significant differences between studies, as described below in Synthesis. All studies used in this review will be long-term agricultural sites, and so the impacts of interventions will all be investigated in relation to implementation of alternative agricultural practices on similar land-use types. Where possible, baseline data will be used to account for variability within studies.

Data extraction

Meta-data will be extracted for all studies. This will include the following information: citation; study location (country, site, climate zone, latitude and longitude); soil type (classification or percent clay/silt/sand); study description (start year, duration, treatments investigated, cropping system, experimental design); sampling strategy (spatial and temporal replication, subsampling, soil sampling depth, C measurement method). In addition, quantitative data (i.e. study findings) will be described (outcome type, units, data location, measure of variability, presence of bulk density) and extracted. For further synthesis, each tillage intervention will be assigned to one of the following three categories: no tillage, moderate intensity tillage and high intensity tillage. This assessment will be undertaken by extracting all interventions in the evidence base (machinery, tillage depth and timing) and building a coding tool through which each intervention will be coded into one of the above three categories. This coding tool will be produced through discussions between at least two members of the team, with the tillage description from all articles and the coded tillage category included in a database of all studies.
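To make the idea of a coding tool concrete, the sketch below shows one possible shape for it in Python. It is purely illustrative: the lookup table is hypothetical, borrows only the example implements listed under Relevant interventions, and assumes that the "reduced tillage" examples map to moderate intensity, whereas the actual tool will be built from the evidence base and may classify individual practices differently. Anything the lookup cannot resolve unambiguously is flagged for manual assessment.

```python
# Hypothetical implement -> category lookup (an assumption for illustration only).
IMPLEMENT_CATEGORY = {
    "direct drill": "no tillage",
    "no tillage": "no tillage",
    "chisel plough": "moderate intensity tillage",
    "disc plough": "moderate intensity tillage",
    "harrow": "moderate intensity tillage",
    "mulch plough": "moderate intensity tillage",
    "ridge till": "moderate intensity tillage",
    "mouldboard plough": "high intensity tillage",
}

MANUAL = "needs manual assessment"


def code_tillage(description):
    """Return a tillage category for a free-text description, or flag it for review."""
    text = description.lower()
    # Collect every category whose implement name appears in the description.
    matches = {category
               for implement, category in IMPLEMENT_CATEGORY.items()
               if implement in text}
    # Only accept an unambiguous match; mixed or unknown practices go to a reviewer.
    if len(matches) == 1:
        return matches.pop()
    return MANUAL


coded = {
    description: code_tillage(description)
    for description in [
        "Mouldboard plough to 25 cm in autumn",
        "Direct drill",
        "Chisel plough followed by mouldboard plough",
        "Subsoiling",
    ]
}
```

Keeping the original description alongside the coded category, as in the `coded` dictionary, mirrors the intention to store both in the database of all studies.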

Meta-data will be extracted into one database describing all studies, whilst quantitative data (i.e. study findings) will be extracted into separate spreadsheets for each study for transparency. Effect sizes for use in meta-analyses will then be calculated within each of these files before being combined for analysis. Effect sizes used in analyses will be raw mean difference expressed in g/kg for concentrations or kg/ha for stocks (study findings standardised according to study duration). In order to account for the potentially non-linear nature of changes to soil C, a categorical coding variable [coded as ‘short-term’ (10–19 years), ‘medium-term’ (20–29) or ‘long-term’ (>29)] for study duration will be included in meta-analyses as a moderator to investigate the influence of study duration. Data from studies quoting stocks rather than concentration will be converted to concentration to enable equivalent effect sizes to be incorporated in one meta-analysis. Studies that do not provide bulk density along with stocks will be analysed separately as stocks (where universal soil depth limits can be ascertained across the evidence base).
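The calculations described in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: function names are hypothetical, "standardised according to study duration" is interpreted as dividing by the number of years, durations are taken in whole years, and the stock conversion uses the standard relationship between stock, concentration, bulk density and layer thickness.

```python
def stock_to_concentration(stock_mg_ha, bulk_density_g_cm3, depth_cm):
    """Convert a SOC stock (Mg/ha) for a soil layer into a concentration (g/kg)."""
    if bulk_density_g_cm3 <= 0 or depth_cm <= 0:
        raise ValueError("bulk density and layer thickness must be positive")
    # stock [Mg/ha] = concentration [g/kg] * bulk density [g/cm3] * depth [cm] * 0.1
    return stock_mg_ha / (bulk_density_g_cm3 * depth_cm * 0.1)


def duration_category(years):
    """Code study duration (whole years) into the three moderator categories."""
    if years < 10:
        raise ValueError("studies shorter than 10 years are not eligible")
    if years <= 19:
        return "short-term"
    if years <= 29:
        return "medium-term"
    return "long-term"


def raw_mean_difference_per_year(mean_t, sd_t, n_t, mean_c, sd_c, n_c, years):
    """Raw mean difference (treatment minus comparator) and its variance, per year.

    Dividing by study duration is an assumed reading of the standardisation step.
    """
    difference = mean_t - mean_c
    variance = sd_t ** 2 / n_t + sd_c ** 2 / n_c
    return difference / years, variance / years ** 2


# Hypothetical study: no tillage vs. conventional tillage after 20 years (g/kg).
effect, effect_var = raw_mean_difference_per_year(14.0, 1.2, 4, 12.0, 1.0, 4, 20)
category = duration_category(20)
concentration = stock_to_concentration(45.0, 1.5, 20.0)
```

Here a stock of 45 Mg/ha in a 20 cm layer with a bulk density of 1.5 g/cm3 corresponds to a concentration of 15 g/kg, which shows why bulk density must be reported before stocks and concentrations can be pooled.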

Synthesis

A narrative synthesis of the evidence base will be undertaken using tables and figures that both describe the evidence base itself and the findings of individual studies. In addition, meta-analysis will be performed where possible, as described below.

Key comparisons for meta-analysis

Tillage depth cannot be included as a continuous variable in a meta-regression since the relationship between tillage depth and soil C is non-linear. Instead, meta-analysis will be separated into 3 sub-group comparisons for different pairs of interventions as follows: (1) no tillage versus high intensity tillage; (2) moderate intensity tillage versus high intensity tillage; (3) no tillage versus moderate intensity tillage.

Investigation of impacts on SOC across soil depths

In order to maximise the use of information across the evidence base, three sub-group analyses will be performed on different soil depths. Since meta-analyses condense study results into single effect sizes, multidimensional results cannot be incorporated into single meta-analyses. Instead, sub-groups will be used to investigate the influence of tillage at different soil depths. The three depths investigated will be 0–15 cm, 15–30 cm, and 30 cm and below. Since studies understandably do not consistently conform to these cutoffs, the following weighting scheme will be used. Firstly, studies will be weighted according to the proportion of the depth bracket covered by the study. For example, a study providing data from 0 to 10 cm will be weighted using a factor of 0.67. Secondly, where studies provide data that overlap the boundary between two depth brackets, the data will be included in only one sub-group analysis, namely the shallower (upper) depth bracket, for conservatism (since shallower depths see more pronounced differences in SOC). For example, a study presenting data for 0–20 cm will be included in the 0–15 cm depth bracket but given full weight. Thirdly, studies spanning more than two depth brackets (e.g. 0–45 cm) will be excluded from the three main sub-group analyses and included in a fourth meta-analysis across all depths if sufficient studies are identified.
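The allocation rules above can be expressed as a short Python sketch. It reproduces the three worked examples in the text (0–10 cm, 0–20 cm and 0–45 cm), but several details are assumptions: the function name and labels are hypothetical, a sample straddling two brackets is weighted by the proportion of the shallower bracket it covers, and samples in the open-ended deepest bracket receive full weight because no proportion can be defined for it.

```python
# Depth brackets in cm; None marks the open-ended "30 cm and below" bracket.
BRACKETS = [(0, 15, "0-15 cm"), (15, 30, "15-30 cm"), (30, None, "30 cm and below")]


def assign_depth_bracket(top_cm, bottom_cm):
    """Return (sub-group label, weight) for a sample taken from top_cm to bottom_cm."""
    if not 0 <= top_cm < bottom_cm:
        raise ValueError("expected 0 <= top_cm < bottom_cm")
    touched = []
    for lower, upper, label in BRACKETS:
        overlap_top = max(top_cm, lower)
        overlap_bottom = bottom_cm if upper is None else min(bottom_cm, upper)
        if overlap_bottom > overlap_top:
            touched.append((lower, upper, label, overlap_bottom - overlap_top))
    # Samples spanning more than two brackets go to the separate all-depths analysis;
    # the text gives no weighting rule for that analysis, so no weight is returned.
    if len(touched) > 2:
        return "all depths", None
    # Otherwise use the shallowest bracket touched (the conservative choice).
    lower, upper, label, overlap = touched[0]
    if upper is None:
        return label, 1.0  # assumption: full weight in the open-ended bracket
    return label, overlap / (upper - lower)


examples = {
    (0, 10): assign_depth_bracket(0, 10),
    (0, 20): assign_depth_bracket(0, 20),
    (0, 45): assign_depth_bracket(0, 45),
    (30, 60): assign_depth_bracket(30, 60),
}
```

Under these assumptions a 10–25 cm sample would be placed in the 0–15 cm bracket with a weight of one third; whether such a case should instead be handled individually is a decision the sketch does not settle.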

Meta-analysis: sensitivity analyses

Studies cannot be included in meta-analysis where they fail to report any one of three key variables for each treatment: mean, variability measure (e.g. standard deviation), and sample size (true spatial replication) [39]. Many studies identified in the systematic map by Haddaway et al. [33] failed to report suitable measures of variability across all treatments that would facilitate meta-analysis. However, in some instances an overall variability measure across intervention groups is provided that may be used as an estimate of variability. Furthermore, some studies report other summary results that may be sufficient to calculate variability either between or across interventions. For these cases where variability is estimated, sensitivity analysis will be performed both with and without the estimated variability studies to investigate the influence of less reliable measures on the review findings. As described above in Critical Appraisal, sensitivity analysis will also be performed to investigate the influence of ‘low’ reliability studies on the review findings.
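As a hedged illustration of how variability might be recovered from other summary results, the Python sketch below converts a standard error or a 95 % confidence interval into a standard deviation and checks whether a treatment record carries the three key variables. Which summary statistics studies actually report is not stated in the text, so these two conversions are assumptions; the confidence-interval version uses a normal approximation (1.96), which is rough for the small numbers of replicates typical of field trials.

```python
import math


def sd_from_se(standard_error, n):
    """Standard deviation from a standard error of the mean and sample size."""
    return standard_error * math.sqrt(n)


def sd_from_ci95(lower, upper, n):
    """Standard deviation from a 95 % confidence interval (normal approximation).

    For small n a t-distribution quantile would be more appropriate than 1.96.
    """
    standard_error = (upper - lower) / (2 * 1.96)
    return sd_from_se(standard_error, n)


def is_includible(record):
    """True if a treatment record has a mean, a variability measure and a sample size."""
    return all(record.get(key) is not None for key in ("mean", "sd", "n"))


# Hypothetical treatment record reporting a standard error instead of an SD.
record = {"mean": 12.0, "sd": None, "n": 4, "se": 0.5}
includible_before = is_includible(record)
record["sd"] = sd_from_se(record["se"], record["n"])
record["sd_estimated"] = True  # flag so the study can be dropped in sensitivity analysis
includible_after = is_includible(record)
```

Flagging estimated values, as with the hypothetical `sd_estimated` field, is what allows the analysis to be run both with and without those studies.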

Accounting for multiplicity of p values

Since several subgroup analyses and sensitivity analyses will be undertaken, the threshold for p value significance should be adjusted conservatively depending on the number of a priori tests performed (three depth profiles, each with two sets of sensitivity analyses; a total of 12 tests). Emphasis will also be put on the magnitude of the significant findings (i.e. biological significance) rather than statistical significance itself.

Meta-analyses will be summarised in tables (for sensitivity analysis and subgroup analysis results) and in forest plots (for meta-analysis outputs). Forest plots will be summarised across groups (i.e. by effect modifiers) where the number of included studies is substantial.