Background

Depression is a common disease affecting up to 17% of the population during their lifetime [1]. Based on data from the WHO, depression is thought to be the second largest health-care problem globally in terms of years lived with disability (YLD) [2]. Depression is also observed as a co-morbidity in a number of somatic diseases, contributing significantly to poorer outcomes in diseases such as cancer, ischaemic heart disease, and diabetes. Depending on its severity, depression is often treated with psychotherapy, antidepressants, or a combination of both. However, the clinical efficacy of antidepressants [3,4] and psychotherapy [5-7] has been challenged. Both treatments are costly in terms of time and money and may also have adverse effects. Compliance with antidepressant treatment is poor; the dropout rate in clinical trials is reported to be between 12% and 40% within the initial 6 to 8 weeks of treatment [3,8].

The weak evidence for a beneficial effect of these treatments, together with problems related to cost, harm, and low compliance, has led to interest in alternative or complementary therapies. Exercise as an intervention has attracted considerable attention, and various forms of exercise of varying intensity have been assessed in a number of randomized clinical trials as a treatment for patients with depression.

In 2011, the authors of this paper published a meta-analysis of randomized clinical trials examining the effect of exercise on depressive symptoms in patients with clinical depression [9]. The results suggested that referring patients with clinical depression to exercise programs was associated with a small to moderate effect on depressive symptoms. However, when the analysis was restricted to the three trials with a low risk of bias, the effect estimate was not statistically significant. Since 2011, other reviews have been published on the effect of exercise on depressive symptoms in general [10], in older people [11], and in patients with chronic illnesses [12]. However, none of these reviews specifically addressed adults diagnosed with major depression according to valid diagnostic criteria, such as the International Classification of Diseases [13] or the Diagnostic and Statistical Manual of Mental Disorders [14]. The reviews included a number of trials of volunteers who were defined as depressed on the basis of psychometric testing (for example, the Beck Depression Inventory [15]) rather than on a clinical diagnosis of major depression. Furthermore, several randomized clinical trials investigating the effect of exercise in clinically depressed individuals have been published since our 2011 review.

Objectives

The objective of this systematic review is to investigate the beneficial and harmful effects of exercise, with or without co-interventions, in adults with a clinical diagnosis of major depression, in terms of severity of depression, lack of remission, quality of life, suicide, and other outcomes.

Apart from including new trials, the current systematic review differs from our previous review [9] in three respects. It only considers trials including participants with a diagnosis of major depression and excludes trials of patients referred with depressive symptoms only. It also addresses the harmful effects of exercise interventions, and the bibliographical searches have been extended to include a Chinese and a South American database.

Methods/design

  • This systematic review will only include randomized clinical trials. This protocol is not registered with PROSPERO.

Inclusion criteria

  • Participants should be diagnosed as having major depression according to a valid and recognized diagnostic system (that is, Research Diagnostic Criteria (RDC) [16], International Classification of Diseases (ICD) [13], or Diagnostic and Statistical Manual of Mental Disorders (DSM) [14]).

  • Participants of both sexes aged >17 years.

  • Randomized clinical trials. A trial is defined as a randomized clinical trial if the allocation of participants to intervention and comparison groups is described as randomized (including terms such as ‘randomly’, ‘random’, and ‘randomization’).

  • No restriction on type of publication (that is, both abstracts and full-text reports will be included).

Exclusion criteria

  • Trials measuring depression immediately after a single bout of exercise.

  • Trials comparing one form of exercise versus another.

  • Trials comparing different exercise intensities without including a control group.

Interventions

  • Trials must allocate participants either to an exercise intervention versus a control group (that is, a control group receiving no intervention, treatment as usual, or an attention control using light exercise) or to exercise as an add-on treatment (that is, exercise plus medication in the experimental group versus medication alone in the control group).

  • An exercise intervention is defined as a systematic physical intervention intended to increase muscle strength and/or cardiovascular fitness. A control group may receive no treatment or only an attention control using light exercise, such as stretching or light aerobic exercise. In the latter case, the trial authors must state explicitly that the light exercise is intended as a control intervention.

Outcomes

The primary outcomes are 1) depressive symptoms measured on a continuous scale at the end of the intervention; 2) lack of remission, that is, a binary outcome giving the proportion of participants in each intervention group who did not achieve remission at the end of the intervention, according to the trial authors’ own definition; and 3) serious adverse events, defined according to ICH-GCP as any untoward medical occurrence that was life threatening or resulted in death or persistent or significant disability (ICH-GCP 1997). Serious adverse events will accordingly include suicide attempts as well as suicides. The secondary outcomes are non-serious adverse events, and depressive symptoms and lack of remission assessed at follow-up beyond the end of the intervention.

Search strategy

The search will include CENTRAL, MEDLINE, EMBASE, and Science Citation Index (Web of Science), using medical subject headings (MeSH or similar) where possible and the text word terms: depression, depressive disorder, and exercise, aerobic, non-aerobic, physical activity, physical fitness, walking, jogging, running, bicycling, swimming, strength, and resistance. The search will also include LILACS (Latin American and Caribbean Health Sciences Literature) and the Chinese Wanfang database, using the text word terms depression, depressive disorder, and exercise or physical training. The flow of trial reports and the reasons for exclusion will be presented in a PRISMA flow chart, with exclusions categorized as: non-clinical populations (that is, not diagnosed according to a diagnostic system), review or commentary, not a randomized trial, acute exercise (that is, trials investigating the effect of a single bout of exercise), and trials including patients with other psychiatric diagnoses (for example, bipolar disorder). In addition, the reference lists of relevant reviews will be searched for additional trials.
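As a rough illustration of how the text word terms above combine into a Boolean query (synonyms joined with OR within a concept, concepts joined with AND), the following Python sketch builds a generic query string. The helper names are hypothetical, and each database uses its own field tags and controlled vocabulary, so this is not the review's actual search strategy.

```python
# Hypothetical helpers: the concept blocks mirror the text word terms listed
# in the search strategy. Database-specific field tags and MeSH/Emtree terms
# are deliberately omitted.
DEPRESSION_TERMS = ["depression", "depressive disorder"]
EXERCISE_TERMS = [
    "exercise", "aerobic", "non-aerobic", "physical activity",
    "physical fitness", "walking", "jogging", "running", "bicycling",
    "swimming", "strength", "resistance",
]


def or_block(terms: list[str]) -> str:
    """Join synonyms of one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*concepts: list[str]) -> str:
    """Combine concept blocks with AND, so every concept must match."""
    return " AND ".join(or_block(c) for c in concepts)


if __name__ == "__main__":
    print(build_query(DEPRESSION_TERMS, EXERCISE_TERMS))
```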

Study selection

One investigator (JK) will screen titles and abstracts to remove obviously irrelevant reports. Two investigators (JK and HS) will then examine the full-text reports of the remaining records to determine whether they meet the inclusion criteria.

Data extraction

Two authors (JK, HS) will independently extract data using a pre-piloted structured form. Any discrepancies in data extraction or in the inclusion or exclusion of trials will be resolved by referring to the original papers. CG or MN will act as adjudicator in cases of disagreement. The authors will not be blinded to article results, authors, or institutions. In addition to outcomes, data extraction will include country of origin, number of randomized participants, number of participants included in the efficacy analysis, mean age of participants, diagnostic system, baseline depression severity, type of intervention, frequency of intervention, duration of intervention, and recruitment setting (clinical vs. non-clinical).
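A minimal sketch of how the structured extraction form, and the check for discrepancies between the two independent extractors, might be represented in Python. The class and field names are hypothetical, chosen only to mirror the items listed above; any field flagged would then be resolved by referring to the original paper or by adjudication.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TrialExtraction:
    """One extractor's record for one trial (hypothetical field names)."""
    trial_id: str
    country: str
    n_randomized: int
    n_analysed: int                      # participants in the efficacy analysis
    mean_age: float
    diagnostic_system: str               # e.g. "DSM-IV" or "ICD-10"
    baseline_severity: Optional[float]
    intervention_type: str
    sessions_per_week: Optional[float]   # frequency of intervention
    duration_weeks: Optional[float]
    clinical_setting: bool               # True = clinical recruitment setting


def discrepancies(a: TrialExtraction, b: TrialExtraction) -> list[str]:
    """Return the names of fields on which two independent extractions differ."""
    if a.trial_id != b.trial_id:
        raise ValueError("records refer to different trials")
    return [f.name for f in fields(TrialExtraction)
            if getattr(a, f.name) != getattr(b, f.name)]
```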

The authors JK, CG, and MN have previously published trial reports assessing the effect of exercise in patients with depression [17,18]. To avoid academic bias, a third assessor (CH) will assist HS in bias assessment for these two trials.

Risk of bias assessment

Methodological studies show that trials with unclear or inadequate methodology in key bias domains may be associated with bias (systematic error leading to overestimation of benefits and underestimation of harms) compared with trials using adequate methodology [19-24]. The risk of bias of each trial will be assessed according to the Cochrane Handbook for Systematic Reviews of Interventions [19] in the following domains: allocation sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessors, incomplete outcome data, selective outcome reporting, for-profit bias, and other bias. Please see the Appendix for specifications on bias assessment.

Trials assessed as having ‘low risk of bias’ in all of the above specified domains will be considered ‘trials with low risk of bias’. Trials assessed as having ‘uncertain risk of bias’ or ‘high risk of bias’ in one or more of these domains will be considered trials with ‘high risk of bias’. In line with our previous systematic review [9] and the latest Cochrane review [10], trials with low risk of bias in the allocation concealment, blinded outcome assessment, and intention-to-treat analysis domains will also be characterized as trials with ‘lower risk of bias’. However, if no or only a few trials with low risk of bias are included, it should be kept in mind that trials with ‘lower risk of bias’ offer little or no chance of revealing the ‘true’ intervention effect.
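The classification rules above can be expressed as two small functions. This is a sketch only: the domain keys are hypothetical labels, judgements are assumed to be the strings "low", "uncertain", or "high", and a domain with no recorded judgement is treated as uncertain.

```python
# Hypothetical keys for the eight bias domains listed in the protocol.
ALL_DOMAINS = (
    "sequence_generation", "allocation_concealment",
    "blinding_participants_personnel", "blinding_outcome_assessors",
    "incomplete_outcome_data", "selective_outcome_reporting",
    "for_profit_bias", "other_bias",
)
# Domains used for the secondary 'lower risk of bias' label.
KEY_DOMAINS = ("allocation_concealment", "blinding_outcome_assessors",
               "intention_to_treat")


def overall_risk(judgements: dict[str, str]) -> str:
    """'low' only if every domain is judged 'low'; otherwise 'high'.
    A missing domain counts as 'uncertain' and therefore as 'high'."""
    if all(judgements.get(d, "uncertain") == "low" for d in ALL_DOMAINS):
        return "low"
    return "high"


def lower_risk(judgements: dict[str, str]) -> bool:
    """True if the three key domains are all judged 'low'."""
    return all(judgements.get(d, "uncertain") == "low" for d in KEY_DOMAINS)
```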

Data synthesis and analysis

To allow all trials to be included in the meta-analysis [25], the standardized mean difference (SMD) will be estimated for each trial. The SMD is the mean difference in depression score between the exercise and control groups divided by the pooled standard deviation. The result is a unitless effect size that allows trials using different but similar outcome measures to be compared. By convention, SMDs of 0.2, 0.5, and 0.8 are considered small, medium, and large, respectively. For dichotomous outcomes, relative risks with 95% confidence intervals will be calculated. Some trials are expected to have several intervention groups; in such trials, data from the experimental groups will be pooled and compared with the data from the control group. In case of discrepancies between the random-effects and fixed-effect model analyses, both results will be reported [26]; otherwise, only the results of the random-effects analysis will be reported.
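The following Python sketch illustrates the three calculations described above: the SMD as defined here (mean difference over the pooled SD, that is, Cohen's d, with a commonly used large-sample variance approximation), the merging of two experimental arms into one group using the standard combining-groups formula from the Cochrane Handbook, and a risk ratio with a Wald-type 95% confidence interval on the log scale. Function names are hypothetical, and review software may apply further corrections (for example, Hedges' small-sample adjustment) that are not reproduced here.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled within-group standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def smd(m_exp, sd_exp, n_exp, m_ctl, sd_ctl, n_ctl):
    """Mean difference divided by the pooled SD, plus an approximate variance.
    Negative values favor exercise when lower scores mean fewer symptoms."""
    d = (m_exp - m_ctl) / pooled_sd(sd_exp, n_exp, sd_ctl, n_ctl)
    n = n_exp + n_ctl
    var_d = n / (n_exp * n_ctl) + d**2 / (2 * n)
    return d, var_d


def combine_arms(n1, m1, sd1, n2, m2, sd2):
    """Merge two experimental arms into one group: (sample size, mean, SD)."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
           + n1 * n2 / n * (m1 - m2) ** 2) / (n - 1)
    return n, mean, math.sqrt(var)


def risk_ratio(events_exp, n_exp, events_ctl, n_ctl, z=1.959964):
    """Risk ratio with a 95% CI computed on the log scale.
    Requires at least one event in each arm."""
    rr = (events_exp / n_exp) / (events_ctl / n_ctl)
    se = math.sqrt(1 / events_exp - 1 / n_exp + 1 / events_ctl - 1 / n_ctl)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)
```

For instance, combining arms of sizes 3 and 3 with means 2 and 5 (SD 1 each) gives a single group of 6 with mean 3.5 and SD of about 1.87, the same as computing these statistics directly from the pooled raw data.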

The degree of heterogeneity will be quantified using the I-squared statistic [27], which can be interpreted as the percentage of the total variation across trials that is due to between-trial differences rather than sampling error (chance). Heterogeneity will be explored through subgroup analyses (see below).
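The protocol does not name the estimator of between-trial variance. The sketch below assumes inverse-variance fixed-effect pooling and the DerSimonian-Laird random-effects estimator, and computes Cochran's Q and I-squared as (Q - df)/Q, truncated at zero and expressed as a percentage. It is an illustration under those assumptions, not the review's analysis code.

```python
import math


def meta_analysis(effects: list[float], variances: list[float]) -> dict:
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects
    pooling, with Cochran's Q, tau-squared, and I-squared (percent)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird moment estimate of the between-trial variance.
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    random_ = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed, "se_fixed": math.sqrt(1 / sw),
        "random": random_, "se_random": math.sqrt(1 / sum(w_re)),
        "tau2": tau2, "Q": q, "I2": i2,
    }
```

With two trials estimating 0 and 1, each with variance 0.1, this gives Q = 5 on 1 degree of freedom and therefore I-squared = 80%.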

For the primary outcomes, trial sequential analysis will be attempted, based on mean differences or proportions [28,29]. To calculate the required information size and to assess whether the cumulative Z-curve breaches the relevant trial sequential monitoring boundaries, the required information size for the primary continuous outcome will be based on a type I error (alpha) of 5%, a type II error (beta) of 10% (that is, 90% power), the standard error of the meta-analysis, and a minimal relevant difference of three points on the 17-item Hamilton Depression Rating Scale (HAM-D17). For the primary dichotomous outcomes, the required information size will be based on an alpha of 5%, a beta of 10%, the proportion of participants in the control group with the outcome, and a relative risk reduction of 15% or 30%. Most systematic reviews lack sufficient power [30]; if the intervention shows no significant effect, it is therefore also of interest to know whether this represents an absence of evidence (the cumulative Z-curve has not reached the futility area) or evidence of an absence of effect (the cumulative Z-curve has reached the futility area). If an absence of evidence persists, the number of participants still needed to answer the question can also be estimated. A further question is whether the trial sequential monitoring boundaries for benefit (or potentially for harm) are crossed, which indicates whether the accumulated evidence would have justified stopping further trials. Bayes factors will be calculated for all primary outcomes (the probability of the meta-analysis result given the null hypothesis of no intervention effect, divided by the probability of the meta-analysis result given that the anticipated intervention effect is the true effect) [26].
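The information-size and Bayes-factor calculations can be sketched as follows, assuming 1:1 allocation and normal approximations. For the continuous outcome, the sketch uses the conventional two-group sample-size formula with an assumed outcome standard deviation; how the variance input is derived from the meta-analysis is left open here, and heterogeneity adjustment and the monitoring boundaries themselves (handled by dedicated TSA software) are not reproduced. For normal likelihoods, the Bayes factor defined above reduces to exp((μA² - 2μAμ)/(2SE²)), where μ is the pooled estimate, μA the anticipated effect, and SE the standard error of the pooled estimate.

```python
import math
from statistics import NormalDist

_Z = NormalDist().inv_cdf  # standard normal quantile function


def ris_continuous(sd: float, mean_diff: float, alpha=0.05, beta=0.10) -> float:
    """Total participants (both arms, 1:1) needed to detect mean_diff,
    given an assumed outcome standard deviation sd."""
    z = _Z(1 - alpha / 2) + _Z(1 - beta)
    return 4 * z**2 * sd**2 / mean_diff**2


def ris_dichotomous(p_control: float, rrr: float, alpha=0.05, beta=0.10) -> float:
    """Total participants needed to detect a relative risk reduction rrr."""
    p_exp = p_control * (1 - rrr)
    p_bar = (p_control + p_exp) / 2  # average event proportion
    z = _Z(1 - alpha / 2) + _Z(1 - beta)
    return 4 * z**2 * p_bar * (1 - p_bar) / (p_control - p_exp) ** 2


def bayes_factor(observed: float, anticipated: float, se: float) -> float:
    """Likelihood of the pooled estimate under 'no effect' divided by its
    likelihood under the anticipated effect (normal approximation)."""
    return math.exp((anticipated**2 - 2 * anticipated * observed) / (2 * se**2))
```

For example, `ris_continuous(8, 3)` gives roughly 299 participants in total; the SD of 8 HAM-D17 points is purely illustrative, not an estimate from the included trials. A Bayes factor well below 1 favors the anticipated effect over no effect.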

To assess the potential impact of missing data (incomplete outcome data bias), a ‘best-worst’ case scenario will be analyzed, assuming that all participants lost to follow-up in the intervention group had a beneficial outcome (the group mean minus 1 standard deviation (SD)) and that all those with missing outcomes in the control group had a harmful outcome (the group mean plus 1 SD, and plus 2 SD). The reverse ‘worst-best’ case scenario will also be analyzed [26].
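A sketch of the continuous ‘best-worst’ and ‘worst-best’ imputations, assuming lower depression scores are better: every missing participant in an arm is assigned the value mean + k × SD, and the arm's mean and SD are then recomputed exactly from the observed summary statistics plus the imputed values. Function and parameter names are hypothetical.

```python
import math


def impute_arm(n_obs: int, mean: float, sd: float, n_missing: int, k_sd: float):
    """Return (n, mean, SD) after assigning each missing participant the value
    mean + k_sd * sd (k_sd = -1 is beneficial when lower scores are better)."""
    imputed = mean + k_sd * sd
    n = n_obs + n_missing
    new_mean = (n_obs * mean + n_missing * imputed) / n
    # Observed within-arm sum of squares, plus the spread of the observed and
    # imputed sub-groups around the new mean (imputed values have no spread).
    ss = ((n_obs - 1) * sd**2 + n_obs * (mean - new_mean) ** 2
          + n_missing * (imputed - new_mean) ** 2)
    return n, new_mean, math.sqrt(ss / (n - 1))


def best_worst(exp_arm, ctl_arm, harmful_k=1.0):
    """Arms are (n_obs, mean, sd, n_missing). Experimental drop-outs get a
    beneficial value, control drop-outs a harmful one (+1 or +2 SD)."""
    return impute_arm(*exp_arm, k_sd=-1.0), impute_arm(*ctl_arm, k_sd=harmful_k)


def worst_best(exp_arm, ctl_arm, harmful_k=1.0):
    """The reverse scenario: harmful values in the experimental arm."""
    return impute_arm(*exp_arm, k_sd=harmful_k), impute_arm(*ctl_arm, k_sd=-1.0)
```

The resulting arm summaries can then be fed into the same SMD calculation as the complete-case data.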

Trials with incomplete or missing data for the ‘lack of remission’ outcome will be included. Missing values will be imputed in sensitivity analyses according to the following scenarios [31], each including all randomized participants in the denominator: 1) poor outcome analysis: all drop-outs/participants lost from both the experimental and the control arms are assumed to have experienced the outcome (lack of remission); 2) good outcome analysis: none of the drop-outs/participants lost from either arm are assumed to have experienced the outcome; 3) extreme case analysis favoring the experimental intervention (‘best-worst’ case scenario): none of the drop-outs/participants lost from the experimental arm, but all of those lost from the control arm, experienced the outcome; and 4) extreme case analysis favoring the control (‘worst-best’ case scenario): all of the drop-outs/participants lost from the experimental arm, but none of those lost from the control arm, experienced the outcome.
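The four scenarios differ only in which arm's drop-outs are counted as events, where an event is lack of remission. A minimal sketch, with hypothetical function names and the scenarios keyed by hypothetical names describing which drop-outs are counted as events:

```python
def scenario_counts(events, observed, missing, scenario):
    """Return ((events, randomized) experimental, (events, randomized) control).
    events, observed, and missing are (experimental, control) tuples; an event
    is lack of remission, so counting drop-outs as events is pessimistic."""
    rules = {
        "all_missing_events": (True, True),   # every drop-out is an event
        "no_missing_events": (False, False),  # no drop-out is an event
        "best_worst": (False, True),          # favors the experimental arm
        "worst_best": (True, False),          # favors the control arm
    }
    add_missing = rules[scenario]
    return tuple(
        (e + (m if add else 0), o + m)  # denominator = all randomized
        for e, o, m, add in zip(events, observed, missing, add_missing)
    )


def risk_ratio_counts(arms):
    """Risk ratio of the experimental arm relative to the control arm."""
    (e1, n1), (e2, n2) = arms
    return (e1 / n1) / (e2 / n2)
```

For example, with 10 events among 40 observed plus 10 missing in the experimental arm, and 20 events among 45 observed plus 5 missing in the control arm, the ‘best_worst’ key gives ((10, 50), (25, 50)) and a risk ratio of 0.4.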

Subgroup analyses

In subgroup analyses, the possible effects of a number of variables on outcomes and heterogeneity will be compared:

  • Risk of bias. As no, or very few, trials with low risk of bias are expected, trials with lower risk of bias (adequate allocation concealment, blinded outcome assessment, and intention-to-treat analysis) will be compared with trials at high risk of bias in these domains.

  • Age. Trials including older participants (mean age >60 years) will be compared with trials including younger participants (mean age ≤60 years).

  • Group versus individual exercise. Trials using group exercise will be compared with trials using individual exercise.

  • Duration of intervention. Trials with a short duration of intervention will be compared with trials with a long duration, with the two groups split at the median duration of intervention across the included trials.

  • Type of control group. Trials using attention control will be compared with trials using other forms of control.

  • Exercise as add-on therapy. Trials using placebo, attention control, or treatment as usual (TAU) as the control group will be compared with trials using antidepressants as the control group.

  • Exercise dose. Within trials using different exercise intensities, low-dose exercise will be compared with high-dose exercise.

  • Co-morbid somatic disease. Effect estimates from trials including patients with depression only will be compared with those from trials including patients with depression and a co-morbid somatic disease.
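Two of the computations implied by these subgroup analyses are sketched below. The first is the median split of trials by duration, assuming that trials exactly at the median go to the ‘short’ group, which the protocol does not specify. The second is a test for a difference between two subgroup estimates. For two subgroups, this Z test is equivalent to the chi-squared test for subgroup differences with one degree of freedom. Function names are hypothetical.

```python
import math
from statistics import NormalDist, median


def median_split(durations: dict[str, float]) -> tuple[list[str], list[str]]:
    """Split trial IDs into 'short' (<= median) and 'long' (> median) duration."""
    cut = median(durations.values())
    short = [t for t, d in durations.items() if d <= cut]
    long_ = [t for t, d in durations.items() if d > cut]
    return short, long_


def subgroup_difference(est1: float, se1: float, est2: float, se2: float):
    """Two-sided Z test for the difference between two pooled subgroup estimates."""
    z = (est1 - est2) / math.sqrt(se1**2 + se2**2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```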

Publication bias will be assessed by visual inspection of a funnel plot and by Egger’s test. The meta-analyzed results will be presented in a summary of findings table according to the GRADE system [32].
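Egger's test regresses each trial's standardized effect (effect divided by its standard error) on its precision (1 divided by the standard error); an intercept far from zero suggests funnel-plot asymmetry. The standard-library-only sketch below returns the intercept, its standard error, the t statistic, and the degrees of freedom (k - 2). The P value would come from a t distribution (for example, via SciPy) and is not computed here. The function name is hypothetical.

```python
import math


def egger_test(effects: list[float], ses: list[float]):
    """Egger's regression of effect/se on 1/se by ordinary least squares.
    Returns (intercept, SE of intercept, t statistic, degrees of freedom)."""
    x = [1 / s for s in ses]                      # precision
    y = [e / s for e, s in zip(effects, ses)]     # standardized effect
    k = len(x)
    if k < 3:
        raise ValueError("need at least three trials")
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r**2 for r in resid) / (k - 2)       # residual variance
    se_int = math.sqrt(s2 * (1 / k + mx**2 / sxx))
    t = intercept / se_int if se_int > 0 else math.nan
    return intercept, se_int, t, k - 2
```

When every trial reports the same effect, the points lie on a line through the origin and the intercept is zero, as expected in the absence of small-study asymmetry.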

Discussion

This systematic review will assess the benefits and harms of exercise interventions for adults with a clinical diagnosis of major depression. To minimize selection bias, the search will include bibliographical databases from South America (LILACS) and China (Wanfang) in addition to the Western bibliographical databases usually searched (for example, CENTRAL, MEDLINE, and EMBASE). In addition to meta-analysis, trial sequential analysis is planned to assess the risk of random error. The final discussion will include an analysis of the strengths and limitations of the evidence and of the review itself.

Based on the authors’ previous review and familiarity with the field, we expect to include randomized clinical trials with a total of more than 1,000 participants diagnosed with depression. The review is intended to support health-care providers and decision makers within the health-care system in deciding whether to include exercise as a standard treatment for patients with depression.