Background

Selective serotonin reuptake inhibitors (SSRIs) are often first-line treatment for depression and prescriptions for SSRIs are increasing [1, 2]. A number of reviews with meta-analysis have assessed the effects of SSRIs in adults with major depressive disorder [3,4,5,6,7,8], generally concluding that SSRIs have a statistically significant effect on depressive symptoms [3,4,5,6,7,8]. However, the results of the reviews have been limited by not using predefined Cochrane methodology [3,4,5,6,7,8], only including subgroups of depressed patients [9, 10], not searching all relevant databases [3,4,5,6,7,8, 10], not systematically assessing harms [3,4,5,6,7,8, 10], and not systematically assessing risks of bias [3,4,5,6,7,8, 10]. We have summarised the characteristics and the results of previous systematic reviews in Table 1. Accordingly, the evidence on the effects of SSRIs is unclear. Using, e.g., a composite outcome of all serious adverse events (according to ICH-GCP [11]) might show how SSRIs work. Furthermore, assessments of quality of life might demonstrate if SSRIs have clinically meaningful effects. It is of utmost importance to assess the clinical significance of review results if statistically significant results are shown [12, 13].

Table 1 Overview of previous reviews

Our objective was to conduct a comprehensive systematic review assessing the beneficial and harmful effects of SSRIs versus placebo, ‘active’ placebo, or no intervention in adult participants with major depressive disorder using our eight-step procedure for assessing evidence in systematic reviews [13].

Methods

Details regarding the methodology are described in our protocol, which was registered prior to the systematic literature searches [14]. The methodology was not changed after the analysis of the review results began [14].

We included all randomised clinical trials comparing the effects of SSRIs (citalopram; escitalopram; sertraline; fluoxetine; paroxetine; or fluvoxamine) versus placebo, ‘active placebo’ (any active substance employed to mimic the adverse effects of taking an SSRI) [15], or no intervention. We also planned to perform subgroup analyses comparing the effects of the different doses (see Subgroup analyses). If a trial had three arms (e.g., participants randomised to two different SSRIs and placebo), we divided the total number of control participants by two but kept the means and SDs in each group unchanged [16, 17].
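The handling of three-arm trials can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the `Arm` type, the function name, and the example numbers are hypothetical, and fractional participant counts are kept as they are because the text does not say how odd numbers were rounded.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Arm:
    n: float      # number of participants in the arm
    mean: float   # mean outcome score
    sd: float     # standard deviation of the outcome score


def split_shared_control(control: Arm, comparisons: int = 2) -> List[Arm]:
    """Split one control arm across several pairwise comparisons.

    Only the participant count is divided; mean and SD stay unchanged.
    """
    if comparisons < 1:
        raise ValueError("comparisons must be at least 1")
    share = control.n / comparisons
    return [replace(control, n=share) for _ in range(comparisons)]


# Hypothetical placebo arm shared by two SSRI arms.
halves = split_shared_control(Arm(n=100, mean=12.0, sd=8.0))
print(halves)
```

Dividing the control group in this way avoids counting the same placebo participants twice when both SSRI arms enter the same meta-analysis.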

Independent investigators searched for eligible trials published before January 2016 in The Cochrane Library’s CENTRAL, PubMed, EMBASE, PsychLIT, PsycINFO, ClinicalTrials.gov, and Science Citation Index Expanded [14] (see Additional file 1: Search strategies). Trials were included irrespective of language, publication status, publication year, and publication type. To identify unpublished trials, we searched clinical trial registers of Europe and the USA, websites of pharmaceutical companies, and websites of the U.S. Food and Drug Administration (FDA) and the European Medicines Agency, and we requested that the FDA provide all publicly releasable information about relevant clinical trials of SSRIs that were submitted for marketing approval.

Participants had to be 18 years or older and have a primary diagnosis of major depressive disorder based on standardised criteria, such as DSM III, DSM III-R, DSM IV, DSM V, or ICD 10 [14].

We excluded trials specifically randomising depressed participants with: a specific somatic disease, schizophrenia, or depression during or after pregnancy.

Outcomes

Primary outcomes

  • Depressive symptoms measured on the 17-item or 21-item Hamilton Depression Rating Scale (HDRS) [18], the Montgomery-Asberg Depression Rating Scale (MADRS) [19], or the Beck’s Depression Inventory (BDI) [20].

  • Remission (Hamilton <8 points; BDI <10 points; MADRS <10 points).

  • Adverse events during the intervention period which were classified as serious and non-serious adverse events [11]. Serious adverse events were defined as medical events that were life threatening, resulted in death, disability, or significant loss of function, or caused hospital admission or prolonged hospitalisation [11]. The remaining events were classified as non-serious adverse events [11].

Secondary outcomes

  • Suicides, suicide attempts, and suicide ideation during the intervention period.

  • Quality of life (scale used by the trialists).

The time point of primary interest was end of treatment (defined by trialist) [14]. We also planned to report results assessed at maximum follow-up [14].

Handling of missing data

If a participant was not included in the analysis of ‘no remission’, we assumed that they had ‘no remission’ [17]. For example, if 23 of 50 analysed participants had ‘no remission’ but a total of 53 were randomised, we assumed that 26 had ‘no remission’. For all remaining outcomes we used observed data when these were reported [17].
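The imputation rule for ‘no remission’ amounts to simple arithmetic, sketched below in Python. The function name and its argument names are hypothetical; the numbers are those of the example in the text.

```python
def assumed_no_remission(reported_no_remission: int, analysed: int, randomised: int) -> int:
    """Count participants assumed to have 'no remission'.

    Every randomised participant missing from the analysis is counted
    as having 'no remission'.
    """
    if not 0 <= reported_no_remission <= analysed <= randomised:
        raise ValueError("expected 0 <= reported <= analysed <= randomised")
    missing = randomised - analysed
    return reported_no_remission + missing


# Example from the text: 23/50 analysed, 53 randomised.
print(assumed_no_remission(23, 50, 53))  # 26
```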

Subgroup analyses

We planned the following subgroup analyses: participants with HDRS baseline ≤23 compared to >23 points; different SSRIs; trials using a placebo washout period before randomisation compared to trials using no washout period; participants with alcohol or drug dependence compared to no dependence; elderly participants compared to younger participants; duration of treatment below 8 weeks compared to equal to or above 8 weeks; and SSRI below or equal to median dose compared to above median dose.

We planned to assess if SSRIs differ according to psychiatric comorbidities [17]. Four trials included depressed participants with comorbid anxiety but none of these trials reported HDRS so this could not be performed. No other comorbidities (including borderline personality disorder, chronic depression, and treatment resistant depression) were identified in the included trials. We planned to assess if the effects of SSRIs differed: (1) when the SSRI was delivered as add-on therapy to another antidepressant drug; (2) per use of different forms of control interventions (‘active placebo’, traditional placebo, and no intervention); and (3) if electroconvulsive therapy was used as co-intervention. However, these analyses were not possible.

We used the test for subgroup differences to assess whether the effects of SSRIs seemed to differ between types of participants when a trial either specifically randomised a certain type of participants or reported results separately for each type of participants.

Selection of trials and data extraction

Review authors (KKK, AS, SGH, SES, KLM, MI, MBB, IJP, JK, SLK, AT, SE, JCJ) worked in pairs and independently selected relevant trials and extracted data. A standardised data extraction sheet was used (see Data extraction form). If a trial was identified by only one review author, it was discussed whether the trial should be included. In case of discrepancy, a third review author (JCJ) was consulted. We contacted trial authors if relevant information was missing.

Assessment of the statistical and clinical significance

Our methodology was based on The Cochrane Handbook and GRADE [14, 16, 21]. We assessed statistical and clinical significance according to our eight-step procedure [13]:

  1. We obtained 95% confidence intervals (95% CI) and P-values from all planned random-effects [22] and fixed-effect meta-analyses [16] and reported the most conservative result as the main result [13]. Review Manager version 5.3 was used for all meta-analyses [23].

  2. Sensitivity analyses and subgroup analyses were conducted to explore the reasons for substantial statistical heterogeneity [13, 16]. Statistical heterogeneity was assessed by visual inspection of forest plots and by the heterogeneity (I² or D²) [13, 16, 24, 25].

  3. We defined three primary outcomes in our protocol [14]. Our threshold for significance was therefore adjusted for multiplicity [13] by dividing 0.05 by the value halfway between 1 (no adjustment) and 3 (Bonferroni adjustment) [13, 17], resulting in 0.05/2 = 0.025.

  4. Cumulative meta-analyses are at risk of producing random errors due to sparse data and multiple testing of accumulating data [25, 26]. Therefore, Trial Sequential Analysis version 0.9.5.5 beta was applied to control this risk (http://www.ctu.dk/tsa/) [27]. The required information size (that is, the number of participants needed in a meta-analysis to detect or reject a certain intervention effect) was calculated [25, 28]. The required information size is based upon the event proportion in the control group; the assumption of a plausible relative risk (RR) reduction; and the assumed heterogeneity or diversity of the meta-analysis [25, 29]. Trial Sequential Analysis enables testing to be conducted each time a new trial is included in the meta-analysis. Based on the required information size, trial sequential monitoring boundaries are constructed. This enables one to determine the statistical inference concerning a cumulative meta-analysis that has not yet reached the required information size [25]. Firm evidence may be established if one of the trial sequential monitoring boundaries (for benefit (upper red dotted line), for harm (lower red dotted line), or for futility (vertical red dotted line)) is crossed by the cumulative Z-score before reaching the required information size, in which case further trials may turn out to be superfluous [28]. Trial Sequential Analysis-adjusted confidence intervals are also presented [28]. For dichotomous outcomes, we estimated the diversity-adjusted required information size based on the proportion of patients with an outcome in the control group, a relative risk reduction of 30%, an alpha of 2.5%, a beta of 20%, and diversity in the trials [13, 17]. For continuous outcomes, we estimated the required information size based on a HDRS mean difference of three points, the standard deviation observed in the control group, an alpha of 2.5%, a beta of 20%, and diversity in the trials [13, 17]. All outcomes were assessed with Trial Sequential Analysis [13, 24, 25].

  5. We calculated Bayes factors for all primary outcomes. A low P value indicates that an observed result is unlikely given that there is no difference in effect between the compared intervention groups (i.e., the null hypothesis is true) [12, 13]. Even very low P values may, therefore, be misleading because the probability that the actual measured difference in effect of the compared interventions resulted from an a priori anticipated ‘true’ difference needs to be considered [13]. For this purpose, it is helpful to calculate Bayes factor for the primary outcomes. Bayes factor is the ratio between the probability of the meta-analysis result given that the null hypothesis is true and the probability of the meta-analysis result given that an anticipated intervention effect is the true effect [12, 13]. In other words, the lower the Bayes factor, the more confident one should be that an actual intervention effect (the anticipated intervention effect) has produced the meta-analysis result and that a given difference between the compared groups is not caused by random error [12, 13]. Calculation of Bayes factor is not part of standard Cochrane methodology.

  6. We assessed the potential impact of bias on the review results [16]. To assess the potential impact of missing data (incomplete outcome data bias) we assessed a ‘best-worst’ case scenario assuming that all participants lost to follow-up in the SSRI group had a beneficial outcome (the group mean plus 1 standard deviation (SD) or plus 2 SDs) and that all those with missing outcomes in the placebo group had a harmful outcome (the group mean minus 1 SD or minus 2 SDs) [13, 17]. We also performed the reverse ‘worst-best’ case scenario analysis [13, 17].

  7. We assessed the risk of publication bias by visual inspection of funnel plots and by tests for funnel plot asymmetry [13, 16, 30].

  8. We assessed clinical significance of our results. As previously suggested [4, 8, 31], we chose a drug-placebo difference of 3 points on the 17-item HDRS or an effect size of 0.50 SMD as the threshold for clinical significance (see Discussion) [14].
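Two of the steps above, the multiplicity-adjusted threshold (step 3) and the Bayes factor (step 5), can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the Bayes factor is written as a likelihood ratio between two point hypotheses under a normal approximation (an assumption about the form of the calculation), and the numerical inputs to `bayes_factor` are invented for illustration.

```python
import math


def adjusted_alpha(n_primary_outcomes: int, alpha: float = 0.05) -> float:
    """Divide alpha by the value halfway between 1 and the number of primary outcomes."""
    if n_primary_outcomes < 1:
        raise ValueError("need at least one primary outcome")
    divisor = (1 + n_primary_outcomes) / 2
    return alpha / divisor


def bayes_factor(estimate: float, se: float, anticipated: float) -> float:
    """Likelihood of the estimate under H0 (effect 0) divided by its
    likelihood under HA (effect = anticipated), assuming a normally
    distributed estimate with standard error `se`."""
    if se <= 0:
        raise ValueError("se must be positive")
    log_bf = (anticipated ** 2 - 2 * anticipated * estimate) / (2 * se ** 2)
    return math.exp(log_bf)


print(adjusted_alpha(3))  # 0.025, as in step 3

# Hypothetical meta-analysis results with an anticipated effect of -3 points.
print(bayes_factor(estimate=-2.0, se=0.5, anticipated=-3.0))  # below 0.1
print(bayes_factor(estimate=-1.0, se=0.5, anticipated=-3.0))  # above 1
```

The second hypothetical result differs from zero by two standard errors, yet its Bayes factor exceeds 1 because the estimate lies closer to the null effect than to the anticipated effect. This is why a result can be statistically significant and still favour the null hypothesis over the anticipated intervention effect.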

Results

We have summarised the selection of trials in Fig. 1 and excluded trials in Additional file 2: List of excluded trials.

Fig. 1 PRISMA flowchart

Using our strict inclusion and exclusion criteria, a total of 195 publications/unpublished trials were identified and included. Due to multiple publications of single trials and lack of useful data, only 131 randomised clinical trials [32–164] enrolling a total of 27,422 participants were included in our analyses. Of these, 116 were published trials [32–118, 120–131, 133–136, 141, 144, 146–148, 152, 153, 156–164] and 15 were unpublished trials [102, 119, 132, 137–140, 142, 143, 145, 149–151, 155, 165]. Of the 15 unpublished trials, eight were identified via company websites, two via ClinicalTrials.gov, and four via the FDA (see Additional file 3: Trial characteristics).

We did not identify any trials using ‘active placebo’ or ‘no intervention’ as control interventions. Most trials used broad inclusion criteria, randomised adult depressed women and men below 65 years, and used an SSRI for 6 to 12 weeks. Twelve of the included trials specifically randomised elderly (most often participants >65 years) depressed participants [45, 48, 54, 63, 82, 110, 111, 113, 122, 144, 162, 166], five trials randomised depressed alcohol-dependent participants [47, 61, 116, 120, 156], and three trials randomised depressed opioid- and cocaine-dependent participants [44, 115, 164]. Types of the randomised participants and other trial characteristics are summarised in Additional file 3: Trial characteristics.

Primary outcomes

Hamilton depression rating scale (HDRS)

Twenty-two trials reported mean HDRS scores and standard deviation (SD) [32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50, 103, 144, 162] and 27 trials reported mean HDRS change scores and SD at end of treatment [51, 52, 54–66, 121, 123, 137, 138, 141, 142, 145, 146, 167, 168]. Random-effects meta-analysis of these 49 trials showed that SSRIs versus placebo significantly reduced the HDRS score (mean difference −1.94 points; 95% CI −2.50 to −1.37; P < 0.00001) (Fig. 2). Twenty-four trials reported only mean HDRS scores or presented a graph showing the mean HDRS scores, but did not report the SD at end of treatment [67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88, 166]. We planned to impute missing SDs based on observed standard deviations from trials with similar characteristics [17]. Trial characteristics, sample sizes, and statistical weight of the included trials were similar across trials and we therefore chose to impute the missing SDs with a value of 8 points (the mean of the observed standard deviations rounded up to the nearest integer). Nineteen trials reported only mean HDRS change scores or presented a graph showing the mean HDRS change scores, but did not report the SD [89,90,91,92,93,94,95,96,97,98,99,100,101,102, 108, 143, 157, 158, 169]. We imputed the missing SDs with a value of 7 points (the mean of the observed standard deviations rounded up to the nearest integer) [17]. Random-effects meta-analysis of the results of all 92 trials showed that SSRIs versus placebo significantly reduced the HDRS score (mean difference −2.25 points; 95% CI −2.69 to −1.83; P < 0.00001).

Fig. 2 Random-effects meta-analysis of the results on the Hamilton depression rating scale

Heterogeneity

The primary meta-analysis showed statistically significant heterogeneity (I² = 75%; P < 0.00001) (Fig. 2). Visual inspection of the forest plot indicated that one trial seemed to have a more extreme effect (larger intervention effect estimate and smaller confidence interval) [58]. For exploratory purposes, we excluded this trial from the analysis. This reduced the I² to 29% but did not substantially alter the meta-analysis result (mean difference after removing the trial from the analysis −1.77 HDRS points; 95% CI −2.12 to −1.42).
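For readers unfamiliar with how a random-effects estimate and I² are obtained, the following Python sketch implements the DerSimonian-Laird approach. It is a sketch under stated assumptions: the review used Review Manager, whose random-effects model is assumed here to be of this type, the function name is hypothetical, and the trial data in the example are invented.

```python
import math
from typing import List, Tuple


def dersimonian_laird(effects: List[float], ses: List[float]) -> Tuple[float, float, float, float]:
    """Return (pooled effect, its SE, tau squared, I squared in percent)."""
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two trials with matching inputs")
    weights = [1.0 / se ** 2 for se in ses]            # inverse-variance weights
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)                      # between-trial variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled_se, tau2, i2


# Hypothetical mean differences (HDRS points) and standard errors for four trials.
pooled, pooled_se, tau2, i2 = dersimonian_laird(
    effects=[-1.2, -2.5, -0.8, -4.0],
    ses=[0.6, 0.7, 0.5, 0.9],
)
print(f"pooled = {pooled:.2f}, SE = {pooled_se:.2f}, tau2 = {tau2:.2f}, I2 = {i2:.0f}%")
```

Removing a single outlying trial from `effects` and `ses` and rerunning the function mimics the exploratory analysis described above.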

Subgroup analysis per risk of bias

All the included trials had high risk of bias. Hence, it was not possible to perform a subgroup analysis of trials at low risk of bias (Fig. 3) [13, 17]. However, four trials [33, 60, 103, 121] were classified as potentially lower risk of bias (based on bias risk assessment of ‘generating allocation sequence’, ‘allocation concealment’, ‘blinding of participants and treatment providers’, and ‘blinding of outcome assessment’) [13, 17]. Random-effects meta-analysis of these four trials showed that SSRIs versus placebo reduced the HDRS score (mean difference −2.07 points; 95% CI −3.06 to −1.08). Test for subgroup differences between trials with lower risk of bias (n = 4) compared to trials with high risk of bias (n = 45) was not significant (P = 0.82).

Fig. 3 Risk of bias in the included randomized clinical trials

Incomplete outcome data

Meta-analysis of the best-worst case scenario analyses adding 1 SD (mean difference −3.38 HDRS points in favour of SSRI; 95% CI −4.10 to −2.66) and adding 2 SDs (mean difference −4.50 HDRS points in favour of SSRI; 95% CI −5.37 to −3.63) for missing values showed large significant intervention effect estimates [13, 17]. Meta-analysis of the worst-best case scenario analyses showed a significant intervention effect estimate when adding 1 SD for missing values (mean difference −0.77 points; 95% CI −1.45 to −0.09) and no significant intervention effects when adding 2 SDs for missing values (mean difference 0.46 points; 95% CI −0.38 to 1.30).
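The mechanics of these scenario analyses can be sketched for a single hypothetical trial. The Python sketch below rests on several assumptions: the function name and all numbers are invented, and the sign convention treats a lower HDRS score as the beneficial direction, so a favourable imputation uses a negative shift.

```python
def mean_with_imputed_missing(observed_mean: float, observed_sd: float,
                              n_observed: int, n_missing: int,
                              sd_shift: float) -> float:
    """Group mean after imputing each missing participant with
    observed_mean + sd_shift * observed_sd."""
    total = n_observed + n_missing
    if total <= 0:
        raise ValueError("group must contain participants")
    imputed_value = observed_mean + sd_shift * observed_sd
    return (observed_mean * n_observed + imputed_value * n_missing) / total


# Hypothetical trial: 80 observed and 20 missing participants per arm.
ssri = dict(observed_mean=12.0, observed_sd=8.0, n_observed=80, n_missing=20)
placebo = dict(observed_mean=14.0, observed_sd=8.0, n_observed=80, n_missing=20)

# Best-worst case: missing SSRI participants do well, missing placebo participants do badly.
best_worst = (mean_with_imputed_missing(**ssri, sd_shift=-1)
              - mean_with_imputed_missing(**placebo, sd_shift=+1))
# Worst-best case: the reverse assumption.
worst_best = (mean_with_imputed_missing(**ssri, sd_shift=+1)
              - mean_with_imputed_missing(**placebo, sd_shift=-1))
print(best_worst, worst_best)
```

With 20% missing data in each arm, the observed difference of −2 points moves to about −5.2 points in the one scenario and to about +1.2 points in the other, which shows how strongly such assumptions can move an estimate.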

Other subgroup analyses

Meta-analysis of the results of the 26 trials with a mean baseline HDRS score >23 points showed a mean difference of −2.69 HDRS points; 95% CI −3.59 to −1.78; P < 0.00001 [32, 35–38, 41, 43, 45–47, 49, 52, 57–59, 63–65, 121, 137, 142, 145, 167, 170, 171]. Meta-analysis of the results of the 20 trials with a mean baseline HDRS score ≤23 points showed a mean difference of −1.29 HDRS points; 95% CI −1.76 to −0.82; P < 0.00001 [33, 34, 39, 40, 44, 48, 50, 51, 54, 55, 60–63, 103, 141, 144, 146, 162, 168]. Test for subgroup difference was significant (I² = 86.2%; P = 0.007). We performed post-hoc meta-regression (STATA 14) with baseline HDRS as a covariate in the meta-analysis. This analysis showed that the effects of SSRIs seem to increase with larger baseline HDRS scores (coefficient −0.33 points; 95% CI −0.44 to −0.22; P < 0.0001).
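The test for subgroup differences between the two baseline-severity subgroups can be approximately reproduced from the reported estimates. The Python sketch below is an illustration under assumptions: standard errors are back-calculated from the rounded 95% confidence intervals, the test is the usual heterogeneity test between two subgroup estimates (assumed to correspond to what Review Manager reports), and the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def se_from_ci(lower: float, upper: float) -> float:
    """Back-calculate a standard error from a 95% confidence interval."""
    return (upper - lower) / (2 * Z95)


def two_subgroup_test(est_a: float, se_a: float, est_b: float, se_b: float):
    """Return (Q, P value, I squared in percent) for two subgroup estimates."""
    q = (est_a - est_b) ** 2 / (se_a ** 2 + se_b ** 2)
    p = math.erfc(math.sqrt(q / 2))          # chi-square tail probability, 1 df
    i2 = max(0.0, (q - 1) / q) * 100 if q > 0 else 0.0
    return q, p, i2


q, p, i2 = two_subgroup_test(
    -2.69, se_from_ci(-3.59, -1.78),   # baseline HDRS > 23
    -1.29, se_from_ci(-1.76, -0.82),   # baseline HDRS <= 23
)
print(f"Q = {q:.2f}, P = {p:.3f}, I2 = {i2:.1f}%")
# prints approximately: Q = 7.24, P = 0.007, I2 = 86.2%
```

The output agrees with the reported I² of 86.2% and P value of 0.007 to the precision allowed by the rounded confidence limits.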

The following tests for subgroup differences did not show any significant differences: trials assessing the effects of the different SSRIs (49 trials; I² = 2.2%; P = 0.40) (Fig. 4); published trials (45 trials) compared to unpublished trials (4 trials) (I² = 25.2%; P = 0.25); trials randomising elderly participants (6 trials) compared to younger participants (43 trials) (I² = 0%; P = 0.94) (Fig. 5); trials with a washout period (40 trials) compared to trials without a washout period (9 trials) (I² = 63.6%; P = 0.10) (Fig. 6); trials randomising drug- or alcohol-dependent participants (3 trials) compared to the remaining trials (46 trials) (I² = 0%; P = 0.58); trials with an intervention period below 8 weeks (19 trials) compared to the remaining trials (30 trials) (I² = 36.6%; P = 0.21); and dose of the chosen SSRI, i.e., dose below the median (6 trials) compared to equal to or above the median (9 trials) (I² = 0%; P = 0.65). The latter two subgroup analyses were post-hoc analyses.

Fig. 4 Subgroup analysis comparing trials assessing the effects of different selective serotonin reuptake inhibitors

Fig. 5 Subgroup analysis comparing trials randomising elderly participants to trials randomising non-elderly participants

Fig. 6 Subgroup analysis comparing trials using a placebo-washout period to trials not using a placebo-washout period

We performed an additional post-hoc subgroup analysis comparing trials with low risk of financial bias to trials with high risk of financial bias (Additional file 4: Figure S1). Test for subgroup differences showed no significant difference (P = 0.18). When the four trials with low risk of financial bias were analysed separately, there was no significant difference between the SSRI group and the placebo group (−0.92 points; 95% CI −2.42 to 0.58; I² = 26%) (Additional file 4: Figure S1).

Trial Sequential Analysis

The required information size was calculated based on an anticipated intervention effect of 3 HDRS points, the empirical variance, a risk of type I error of 0.025, and a power of 80% [12, 13, 17]. The Trial Sequential Analysis showed that the trial monitoring boundary for benefit was crossed after the 9th trial indicating a statistically significant result (Trial Sequential Analysis-adjusted CI −2.62 to −1.26) (Fig. 7).
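The ingredients of the required information size for a continuous outcome can be made concrete with a conventional sample-size formula. The Python sketch below is an assumption-laden illustration, not the Trial Sequential Analysis software: it uses the standard two-group formula with equal allocation, inflates it by 1/(1 − D²) for diversity, and takes a hypothetical SD of 8 HDRS points and hypothetical diversity values, since the empirical variance and diversity used by the authors are not stated here.

```python
from statistics import NormalDist


def required_information_size(delta: float, sd: float,
                              alpha: float = 0.025, beta: float = 0.20,
                              diversity: float = 0.0) -> float:
    """Total number of participants (both groups) needed to detect a mean
    difference `delta`, adjusted for diversity D squared (0 <= diversity < 1)."""
    if delta == 0 or sd <= 0 or not 0 <= diversity < 1:
        raise ValueError("invalid input")
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)   # two-sided type I error
    z_beta = normal.inv_cdf(1 - beta)         # power = 1 - beta
    unadjusted = 4 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return unadjusted / (1 - diversity)


# Anticipated effect of 3 HDRS points; SD and diversity are hypothetical.
print(round(required_information_size(delta=3, sd=8)))                 # no diversity
print(round(required_information_size(delta=3, sd=8, diversity=0.75)))  # high diversity
```

Under these assumptions a few hundred participants suffice without diversity, and the requirement grows fourfold when diversity is 75%. Either way, the information accrued across the 49 trials is far larger, which is consistent with the monitoring boundary being crossed early.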

Fig. 7 Trial Sequential Analysis of the results of selective serotonin reuptake inhibitors on Hamilton depression rating scale

Bayes factor

Bayes factor was calculated based on our anticipated intervention effect of 3 HDRS points and the primary meta-analysis result (mean difference −1.94 points; 95% CI −2.50 to −1.37) [12, 13, 17]. Bayes factor (2.01 × 10⁻²³) was below the threshold for significance of 0.1, supporting the statistically significant result.

Risk of publication bias

Visual inspection of the funnel plot did not show clear signs of asymmetry (Fig. 8) [13, 17].

Fig. 8 Funnel plot of the random-effects meta-analysis of the effect of selective serotonin reuptake inhibitors on Hamilton depression rating scale

Hamilton Depression Rating Scale, Montgomery-Asberg Depression Rating Scale, and Beck’s Depression Inventory

Sixty-one trials [32–52, 54–66, 103–106, 120, 121, 123, 137–142, 144–146, 149–151, 154, 155, 168, 172, 173] reported means and SDs on the HDRS [18], MADRS [19], or BDI [174]. Using standardised mean difference, random-effects meta-analysis showed that SSRIs versus placebo significantly decreased the standardised mean difference score (trials reporting mean scores: −0.23; 95% CI −0.31 to −0.14; P < 0.00001; trials reporting mean change scores: −0.26; 95% CI −0.35 to −0.17; P < 0.00001). The standardised mean difference was below our predefined threshold for clinical significance.

Long-term follow-up

One trial [33] reported mean HDRS scores and SD and one trial [146] reported mean HDRS change scores and SD at end of long-term follow-up. Random-effects meta-analysis of these trials showed a mean difference −0.18 points (95% CI −2.78 to 2.43; P = 0.89). Four trials reported mean HDRS scores or change scores at end of long-term follow-up but without reporting SDs [70, 97, 107, 143]. SDs were imputed. Random-effects meta-analysis of all the six trials showed a mean difference of −1.30 points (95% CI −2.72 to 0.13; P = 0.07).

No remission

Thirty-four trials [33, 34, 38, 45, 49–53, 55, 56, 60, 78, 81, 101, 104, 107–112, 120, 128, 140, 141, 146, 153, 157, 158, 162, 171, 175, 176] reported the proportion of participants with no remission at end of treatment. A total of 1430/2211 (64.7%) SSRI participants experienced no remission compared with 1493/2003 (74.5%) control participants. Random-effects meta-analysis showed that SSRIs versus placebo significantly decreased the risk of no remission (RR 0.88; 95% CI 0.84 to 0.91; P < 0.00001) (Fig. 9). This corresponds to 657 (95% CI 642 to 679) SSRI participants out of 1000 experiencing no remission compared with 746 control participants out of 1000 (see Additional file 5: Summary of findings table). Visual inspection of the forest plots showed no clear signs of heterogeneity [13, 16].

Fig. 9 Random-effects meta-analysis of the results of selective serotonin reuptake inhibitors on remission of depression

The required information size was calculated based on the observed proportion of control participants without remission, a relative risk reduction of 30%, a risk of type I error of 0.025, and a power of 80% [13, 17]. The Trial Sequential Analysis showed that the trial monitoring boundary for benefit was crossed (Fig. 10) and the Trial Sequential Analysis-adjusted CI was 0.83 to 0.92 [13, 17].

Fig. 10 Trial Sequential Analysis of the results of selective serotonin reuptake inhibitors on remission of depression

Bayes factor was 1426.8 based on the random-effects meta-analysis result, which is above the threshold for significance of 0.1 [13, 17]. This relatively high Bayes factor indicates that a null effect (null hypothesis) is more likely than the anticipated intervention effect to have produced this meta-analysis result [12, 13].

Best-worst case scenario showed a highly significant meta-analysis result (RR 0.78; 95% CI 0.73 to 0.83; P < 0.00001) [13, 17]. Worst-best case scenario showed no significant difference on risk of no remission (RR 0.95; 95% CI 0.89 to 1.02; P = 0.14) [13, 17].

Visual inspection of the funnel plot showed no clear signs of publication bias [13, 16, 17].

Serious adverse events

Because of the low proportion in the control group (around 2%) we used the Mantel–Haenszel (MH) odds ratio method with reciprocal zero cell correction (zero is replaced by the reciprocal of the size of the opposite treatment arm) [177]. Forty-four trials reported the proportion of participants with serious adverse events [48, 49, 54–56, 60, 63, 75, 78, 93, 94, 102, 105, 108, 112–118, 120, 121, 137–139, 141, 144–146, 149–151, 155, 167, 171, 173, 176, 178–180]. A total of 239/8242 (2.7%) SSRI participants experienced a serious adverse event compared with 106/4956 (2.1%) control participants. Random-effects meta-analysis showed that SSRIs versus placebo significantly increased the risk of a serious adverse event (OR 1.37; 95% CI 1.08 to 1.75; P = 0.009) (Fig. 11). This corresponds to 31 (95% CI 25 to 40) of 1000 SSRI participants experiencing a serious adverse event compared with 22 of 1000 control participants (see Additional file 5: Summary of findings table). Visual inspection of the forest plot did not indicate significant heterogeneity [13, 17]. Even when using the multiplicity-adjusted risk of type I error (0.05/2 = 0.025), the trial sequential boundary for harm was crossed (Fig. 12) and the Trial Sequential Analysis-adjusted CI was 1.03 to 1.89 [13, 17]. Table 2 summarises the types of adverse events.

Fig. 11 Random-effects meta-analysis of the results of selective serotonin reuptake inhibitors on serious adverse events

Fig. 12 Trial Sequential Analysis of the results of selective serotonin reuptake inhibitors on serious adverse events

Table 2 Summary of serious adverse events in the included trials

Bayes factor was 4.8 × 10⁵, which is above the threshold for significance of 0.1 [13, 17]. This clearly shows that a beneficial effect of SSRIs on serious adverse events is very unlikely [13, 17]. Visual inspection of the funnel plot showed no signs of publication bias [13, 16, 17]. Based on the random-effects meta-analysis result, we calculated the number-needed-to-seriously harm one patient to be 138 patients.

Adverse events

Meta-analyses showed that the participants randomised to SSRIs versus placebo had a significantly increased risk of several adverse events. We have summarised the risks of the most frequently reported adverse events in Additional file 6, including numbers-needed-to-harm. We have also included forest plots for the 25 most statistically significant adverse event results in the Additional files (see Additional files 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31: Figures S3–S27). The full list of the 84 reported adverse events is summarised in Table 3.

Table 3 Summary of all reported adverse events in the included trials

Clinical significance

All primary HDRS meta-analyses showed intervention effect estimates below our predefined threshold for clinical significance (a mean difference of 3 HDRS points or 0.5 standardised mean difference) [13, 17]. Our results show statistically significant effects, but the possible effects all seem to have questionable clinical significance [13].

Secondary outcomes

Suicides, suicide attempts, and suicide ideation

There were no significant differences between participants randomised to SSRIs versus placebo on number of suicides (RR 0.68; 95% CI 0.16 to 2.81; P = 0.59; Trial Sequential Analysis-adjusted CI 0.01 to 226.85; 6 trials [60, 71, 108, 113, 151, 155]); suicide attempts (RR 1.76; 95% CI 0.59 to 5.22; P = 0.31; Trial Sequential Analysis-adjusted CI 0.02 to 149.95; 8 trials [49, 56, 75, 94, 102, 139, 167, 181]); or suicide ideation (RR 0.80; 95% CI 0.36 to 1.77; P = 0.58; Trial Sequential Analysis-adjusted CI 0.03 to 23.20; 11 trials [49, 51, 120, 138, 139, 145, 151, 162, 167, 171, 180]). The required information size was not reached in any of the three Trial Sequential Analyses.

Quality of life

Only six trials assessed quality of life [48, 51, 63, 100, 101, 112], of which four trials reported results on the quality of life enjoyment and satisfaction questionnaire (Q-LES-Q) [48, 51, 100, 101]. Two trials [48, 101] reported mean scores and SDs. Random-effects meta-analysis showed a significant effect of SSRIs on Q-LES-Q scores (mean difference 2.98; 95% CI 1.34 to 4.61; P = 0.0004).

Two trials reported results on the short form of the quality of life enjoyment and satisfaction questionnaire [63, 112], but only one trial reported mean scores and SDs [63]. The results from this trial showed that SSRIs (paroxetine) versus placebo significantly increased the mean score of the questionnaire (paroxetine 12.5 mg group mean 11.4, SD 16.7; paroxetine 25 mg group mean 11.5, SD 17.2; placebo group mean 5.3, SD 17.1) [63].

Post hoc analysis of no response

We identified 70 trials assessing the effects of SSRIs on no response defined as less than 50% reduction (from baseline) on either HDRS or MADRS. The meta-analysis showed that SSRIs seem to significantly decrease the risk of no response compared with placebo (RR 0.83; 95% CI 0.80 to 0.87; P = 0.00001) (Additional file 32: Figure S2).

GRADE assessments

GRADE assessments show that due to the high risks of bias the quality of the evidence must be regarded as very low (Additional file 5: Summary of Findings Table) [13].

Discussion

SSRIs may affect the concentration of essential neurotransmitter substances in the brain and are therefore considered to exert effects on depressive symptoms. However, the question is whether these effects are beneficial and clinically meaningful. Estimating a meaningful threshold for clinical significance is difficult and an assessment of clinical significance should ideally not only include a threshold on an assessment scale [182]. Major depressive disorder affects daily functioning, increases the risk of suicidal behaviour, and decreases quality of life [183]. Some adverse events might therefore be acceptable if SSRIs have clinically significant beneficial effects [13, 183, 184]. We therefore both predefined a threshold for clinical significance and assessed the balance between beneficial and harmful effects [13, 17, 184].

As a threshold for clinical significance [14], we chose a drug-placebo difference of 3 points on the 17-item HDRS (ranging from 0 to 52 points) or an effect size of 0.50 standardised mean difference. This has been recommended by the National Institute for Clinical Excellence (NICE) in England and has been chosen in other reviews [4, 8, 31]. Nevertheless, these recommendations are not universally accepted and have been questioned [3]. Others have suggested the following ‘rules of thumb’ regarding the standardised mean difference: 0.2 a small effect, 0.5 a moderate effect, and 0.8 a large effect [16, 185]. One study has shown that an SSRI-placebo mean difference of up to three points on the HDRS corresponds to ‘no clinical change’ [186]. Another valid study has shown that an SSRI-placebo difference of 3 points is undetectable by clinicians, and that a mean difference of 7 HDRS points, or a standardised mean difference of 0.875, is required to correspond to a rating of ‘minimal improvement’ [187]. It has been speculated that the ‘placebo’ response in antidepressant trials has been increasing during recent years [188]. If there is a ‘response’ to placebo, this of course has to be considered when interpreting a mean difference between drug and placebo. However, it is unlikely that depressed patients have a significant placebo effect [189], and it has recently been shown that the placebo response has been stable for 25 years [188]. Even based on our predefined minimal thresholds for clinical significance, SSRIs did not have a clinically meaningful effect on depressive symptoms. Furthermore, per our meta-analyses SSRIs significantly increase the risk of both serious and non-serious adverse events.
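The relation between HDRS points and standardised mean differences used in this paragraph can be made explicit with a short calculation. The sketch below assumes that the standardised mean difference is simply the mean difference divided by a standard deviation. The standard deviation of 8 HDRS points is not reported here; it is only implied by the pair of figures quoted from [187] (7 points corresponding to 0.875). The function names `smd` and `rule_of_thumb` are illustrative.

```python
def smd(mean_difference, sd):
    """Standardised mean difference: mean difference divided by the standard deviation."""
    return mean_difference / sd

def rule_of_thumb(effect_size):
    """Label an effect size using the 0.2 / 0.5 / 0.8 rules of thumb cited in the text."""
    size = abs(effect_size)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"

# SD implied by the figures quoted from [187]: 7 HDRS points <-> SMD 0.875 (assumption)
implied_sd = 7 / 0.875

for points in (3, 7):
    es = smd(points, implied_sd)
    print(f"{points} HDRS points -> SMD {es:.3f} ({rule_of_thumb(es)})")
```

Under this assumed standard deviation, a 3-point difference corresponds to a standardised mean difference below the 0.50 threshold, which shows that the conversion between the two scales depends on the standard deviation used.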

The best-worst and worst-best case scenarios showed that incomplete outcome data bias alone theoretically could have caused the apparent statistically significant beneficial effect of SSRIs. Furthermore, seen in the light of the total number of trials, only a relatively limited number of trials reported on each of our pre-defined outcomes. This increases the risk of selective outcome reporting bias. Apart from the high risk of incomplete outcome data bias and selective outcome reporting bias, all the included trials were assessed at high risk of bias. All trials used placebo as control intervention and, due to the large number of adverse events, some patients might have figured out whether they received an ‘active’ intervention or not, which calls the blinding of the trials into question. Nevertheless, it may be argued that our bias risk assessment often will lead to no trials with low risk of bias. However, similar bias risk assessments have been used in several previous systematic reviews (see, e.g., most Cochrane Hepato-Biliary Group systematic reviews) and our bias risk assessment is based on valid evidence clearly showing that if each of the used bias risk domains is ‘high risk of bias’ or ‘unclear risk of bias’ then there is a risk of overestimation of benefits and underestimation of harms [184, 190,191,192,193,194,195,196,197]. Furthermore, the risks of bias observed here just mirror our experience in 786 randomised trials on depression [198].
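The logic of a best-worst and worst-best case sensitivity analysis can be sketched for a dichotomous, unfavourable outcome such as 'no remission'. This is a simplified illustration with hypothetical counts, not the procedure or the data used in this review. The function name `scenario_rr` and its parameters are assumptions made for the example.

```python
def scenario_rr(events_trt, completers_trt, missing_trt,
                events_ctl, completers_ctl, missing_ctl, scenario):
    """Risk ratio for an unfavourable event after imputing every missing participant."""
    if scenario == "best-worst":
        # Best case for the experimental group: no missing participant had the event.
        # Worst case for the control group: every missing participant had the event.
        e_trt, e_ctl = events_trt, events_ctl + missing_ctl
    elif scenario == "worst-best":
        # The reverse imputation
        e_trt, e_ctl = events_trt + missing_trt, events_ctl
    else:
        raise ValueError("scenario must be 'best-worst' or 'worst-best'")
    n_trt = completers_trt + missing_trt
    n_ctl = completers_ctl + missing_ctl
    return (e_trt / n_trt) / (e_ctl / n_ctl)

# Hypothetical trial (assumption): 80 completers and 20 dropouts per group
counts = dict(events_trt=60, completers_trt=80, missing_trt=20,
              events_ctl=68, completers_ctl=80, missing_ctl=20)

best_worst = scenario_rr(**counts, scenario="best-worst")
worst_best = scenario_rr(**counts, scenario="worst-best")
print(f"best-worst RR {best_worst:.2f}, worst-best RR {worst_best:.2f}")
```

When the two scenarios fall on opposite sides of 1, as in this hypothetical example, the missing data alone could account for the direction of the observed effect.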

We chose ‘remission’ as a primary outcome because we expected trialists to use this outcome frequently. To present a complete overview of the evidence on SSRIs for depression, we also included ‘no response’ (less than 50% reduction on HDRS or MADRS during the intervention period) in a post hoc analysis, because this outcome was frequently used in the included trials and because peer reviewers requested it. However, our results on no remission and no response should be interpreted with great caution for a number of reasons: 1) the assessments of remission and response were primarily based on single HDRS scores and it is questionable whether single HDRS scores are indications of full remission or adequate response to the intervention; 2) information is lost when continuous data are transformed to dichotomous data and the analysis results can be greatly influenced by the distribution of data and the choice of an arbitrary cut-point [16, 199,200,201]; 3) even though a larger proportion of participants cross the arbitrary cut-point in the SSRI group compared with the control group (often HDRS below 8 for remission and 50% HDRS reduction for response), the effect measured on HDRS might still be limited to a few HDRS points (e.g., 3 HDRS points) or less; 4) by only focusing on how many patients cross a certain line for benefit, investigators ignore how many patients are deteriorating at the same time. If results, e.g., show relatively large beneficial effects of SSRIs when remission and response are assessed but very small averaged effects (as our results show), then it must be because similar proportions of the participants are harmed (increase on the HDRS compared to placebo) by SSRIs. Otherwise the averaged effect would not show small or no difference in effect. The clinical significance of our results on ‘no remission’ and ‘no response’ should therefore be questioned. The methodological limitations of using ‘response’ as an outcome have been investigated in a valid study by Kirsch et al., who conclude that: “response rates based on continuous data do not add information, and they can create an illusion of clinical effectiveness” [202]. In retrospect, due to these methodological limitations we should not have assessed ‘no remission’ or ‘no response’ as outcomes. This is a clear limitation of our review [16, 199,200,201].
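Reasons 2) and 3) above can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that end-of-treatment HDRS scores are normally distributed with the same standard deviation in both groups and differ only by a 2-point shift in the mean. Real HDRS scores are bounded and discrete, and the means and standard deviation used here are hypothetical. The helper `proportion_below` is an illustrative name.

```python
from statistics import NormalDist

def proportion_below(cut_point, mean, sd):
    """Proportion of a normally distributed population scoring below the cut-point."""
    return NormalDist(mu=mean, sigma=sd).cdf(cut_point)

# Hypothetical end-of-treatment HDRS distributions (assumptions, not review data)
SD = 8.0
PLACEBO_MEAN = 14.0
SSRI_MEAN = 12.0  # a mean difference of only 2 HDRS points

for cut_point in (4, 8, 12):
    p_ssri = proportion_below(cut_point, SSRI_MEAN, SD)
    p_placebo = proportion_below(cut_point, PLACEBO_MEAN, SD)
    print(f"cut-point {cut_point}: SSRI {p_ssri:.1%}, placebo {p_placebo:.1%}, "
          f"difference {p_ssri - p_placebo:.1%}")
```

In this simplified model the same 2-point mean difference yields different between-group differences in the proportion below the cut-point, depending only on where the cut-point is placed.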

Our tests for subgroup difference comparing trials with a baseline HDRS score below and above 23 points and meta-regression showed that the effects of SSRIs seem to increase with increased baseline HDRS score. Others have also shown that trials randomising participants with a higher mean baseline HDRS score seem to show larger effects of antidepressants [7, 8]. However, it is difficult to interpret why trials with higher average baseline HDRS score seem to have a larger effect of SSRIs. This might just be due to random error. Regardless, it cannot be concluded based on these results that SSRIs work better on more severely depressed patients. To draw such a conclusion, individual patient data would be necessary, i.e., it would be necessary to show that it is actually the patients with higher baseline HDRS scores who have the larger effects. Gibbons et al. used longitudinal person-level data from a large set of published and unpublished studies and showed that baseline severity was not significantly related to the degree of SSRI treatment advantage over placebo [3]. It must be noted that the intervention effects in the group with HDRS scores above 23 points were still below our threshold for clinical significance, supporting Gibbons and co-workers’ results.

Leucht et al. have suggested that effect sizes of SSRIs in randomised clinical trials have declined over time [203]. Post-hoc meta-regression of the HDRS results confirmed their results (effect sizes going down from around 0.8 in the early 1980s to 0.25 in 2012). The reasons for the decreasing effect are not entirely understood but might be better methodology nowadays or recruitment of different types of participants [203]. Leucht et al. also suggested that a lack of difference between antidepressants and placebo is caused by an increasing ‘placebo’ effect (spontaneous recovery) [203]. This seems less important from a patient perspective, i.e., whether a certain drug should be used should be based on the benefits and harms of this drug compared with placebo. Furthermore, the increasing placebo effect has recently been severely questioned [188].

Our present systematic review has several strengths. Our protocol was registered prior to the systematic literature search in all relevant databases, data extraction, and data analyses [14]. Data were double-extracted by independent authors, minimising the risk of inaccurate data extraction, and we assessed the risk of bias in all trials according to Cochrane [16]. We used Trial Sequential Analysis to control the risks of random errors [25, 29, 204], and the analyses of the primary outcomes showed that the accrued information sizes were sufficient. Both visual assessments of forest plots and statistical tests showed limited signs of statistical heterogeneity, e.g., I² was 0% when assessing risk of serious adverse events. Hence, these findings increase the validity of our review results and indicate that the effects shown are consistent across the different trials. Multiple previous reviews and meta-analyses have, as mentioned in our Background, assessed the effects of SSRIs and have generally concluded that SSRIs have significant effects on depressive symptoms [3,4,5,6,7,8]. However, the estimated results (and not the conclusions the review authors made) of these reviews and meta-analyses actually are in agreement with our present results and show that SSRIs do not seem to benefit patients more than a few HDRS points. This increases the validity of our present results. Furthermore, we assessed in detail the risks of serious adverse events and of non-serious adverse events and found that both were significantly increased by SSRIs.
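As a reminder of what the I2 statistic (I²) mentioned above expresses, the following minimal sketch computes it from Cochran's Q using the standard definition, I² = 100% × (Q − df) / Q, truncated at 0. The trial effects and variances are hypothetical and serve only to show why an I² of 0% indicates that the observed variation between trials is no larger than expected by chance. The function name `i_squared` is illustrative.

```python
def i_squared(effects, variances):
    """I-squared (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:
        return 0.0  # no more variation than expected by chance
    return 100 * (q - df) / q

# Hypothetical log risk ratios and their variances (assumptions, not review data)
consistent = i_squared([0.30, 0.31, 0.29], [0.04, 0.05, 0.04])
inconsistent = i_squared([0.0, 1.0], [0.01, 0.01])
print(f"consistent trials: I2 = {consistent:.0f}%")
print(f"inconsistent trials: I2 = {inconsistent:.0f}%")
```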

Our systematic review has several limitations. Our HDRS mean differences were averaged effects. Hence, it cannot be concluded that SSRIs do not have clinically significant effects on all depressed participants. E.g., certain severely depressed patients compared with lightly depressed patients (e.g., so-called professional patients or symptomatic volunteers [203]) might benefit from SSRIs even though there is no evidence backing this hypothesis. However, any clinical research result will have this 'limitation'. Specific patients might benefit from any given intervention even though valid research results have shown that this intervention 'on average' is ineffective or even harmful. All trials were at high risk of bias per several bias risk domains, and especially the risks of incomplete outcome data, selective outcome reporting, and insufficient blinding may bias our review results. Our GRADE assessments show that due to the high risks of bias the quality of the evidence must be regarded as very low. The high risks of bias question the validity of our meta-analysis results, as high risk of bias trials tend to overestimate benefits and underestimate harms [194, 205]. The ‘true’ effect of SSRIs might not even be statistically significant.

We chose to include all SSRIs in our primary analysis. We did this to increase the statistical power and precision and to be able to compare the effects of the different SSRIs in subgroup analysis. Comparing the different SSRIs in tests for subgroup differences did not show significant differences, indicating that the effects (or lack of effects) of the different SSRIs are similar. Nevertheless, we cannot rule out that certain SSRIs may have beneficial or harmful effects that we have not identified in this review due to lack of relevant data. We identified very limited data on the effects of SSRIs on long-term outcomes, suicidal behaviour, and quality of life, so the effects of SSRIs on these outcomes are unclear. E.g., we only identified six trials assessing quality of life, which substantially increases the risk of selective outcome reporting bias and thereby limits the validity of the meta-analysis result. Furthermore, the trialists did not use the same questionnaire. Quality of life is without question an outcome with great relevance to the patient and we urge future trialists to assess quality of life. However, any given quality of life questionnaire must be validated (shown to be correlated to, e.g., suicidal behaviour or other clinical events) before valid conclusions may be drawn based on this outcome. It must be shown that scores on a given questionnaire do reflect the actual ‘quality of life’. Valid consensus on choosing the optimal quality of life assessment method does not exist and this is a limitation of assessing quality of life in depressed patients. Our eight-step procedure used to assess if the thresholds for statistical and clinical significance are crossed is based on generally accepted and validated methodology, but the use of the eight-step procedure has not yet been validated in simulation studies or empirical studies [12, 13]. Even though the eight-step procedure has been used in several systematic reviews it is not universally accepted. This may be a limitation of our methodology.

The Committee for Medicinal Products for Human Use (CHMP) concluded “… that, as no public health concerns have been identified, no regulatory action is necessary on the basis of Kirsch et al.'s findings” when the latter team questioned the benefits of antidepressants [182]. Per our results, we now believe that there is valid evidence for a public concern regarding the effects of SSRIs. We agree with Andrews et al. that antidepressants seem to do more harm than good [206]. We have clearly shown that SSRIs significantly increase the risks of both serious and several non-serious adverse events. The observed harmful effects seem to outweigh the potential small beneficial clinical effects of SSRIs, if they exist. Our results confirm the findings from other studies questioning the effects of SSRIs [8, 207], but are in contrast to the results of other reviews concluding that SSRIs are effective interventions for depression [3, 6, 10, 208]. However, our present analyses represent the most comprehensive systematic review on the topic and we hope it may guide clinical practice.

Conclusions

SSRIs versus placebo seem to have statistically significant effects on depressive symptoms, but the clinical significance of these effects seems questionable and all trials were at high risk of bias. Furthermore, SSRIs versus placebo significantly increase the risk of both serious and non-serious adverse events. Our results show that the harmful effects of SSRIs versus placebo for major depressive disorder seem to outweigh any potentially small beneficial effects.