Background

Depression

According to the World Health Organization (WHO), 264 million people suffer from depression around the globe [1] and major depressive disorder has been estimated to be the third leading cause of years lived with disability in both sexes [2]. Major depressive disorder is, depending on the diagnostic system, characterised by the occurrence of depressed mood, loss of interest or pleasure, reduced energy or fatigue accompanied by other symptoms such as suicidal thoughts, sleep disturbances, psychomotor agitation or retardation and difficulty concentrating [3, 4]. With a 12-month prevalence of around 5.5% in high-income countries [5], major depressive disorder is a large economic burden due to decreased work productivity [6] and it significantly impairs quality of life [7, 8].

Antidepressants

Different classes of antidepressants are available for treatment of patients with major depressive disorder ranging from older antidepressants like mono-amine oxidase inhibitors (MAOI) and tri-cyclic antidepressants (TCA) to newer groups of drugs like selective serotonin reuptake inhibitors (SSRI) and serotonin-norepinephrine reuptake inhibitors (SNRI) as summarised in Table S1 (Additional file 1).

A report from the National Health and Nutrition Examination survey in the USA found an increase in the use of antidepressants from 7.7% in 1999–2002 to 12.7% in 2011–2014, wherein a quarter of those who took antidepressants had been using them for more than 10 years [9]. Whilst SSRIs remain the most commonly prescribed antidepressants, there has been a consistent increase in the prescription of other antidepressants such as duloxetine [10, 11].

Duloxetine

Duloxetine, an SNRI, is approved for the treatment of major depressive disorder in the USA and Europe [12, 13]. It is additionally approved for a number of other conditions such as generalized anxiety, diabetic neuropathic pain, and fibromyalgia and is among the top 50 most prescribed drugs in the USA with the number of yearly prescriptions exceeding 16,000,000 [14]. In vivo and in vitro studies indicate that duloxetine inhibits the presynaptic neuronal reuptake of the neurotransmitters serotonin and norepinephrine, leading to their greater availability at the neuronal junctions and potentiating their action in the central nervous system [15, 16]. Serotonin and norepinephrine have been suggested to be involved in the pathogenesis of major depressive disorder [17], and the antidepressant effects of duloxetine have been hypothesised to be mediated by counteracting the depletion of these two neurotransmitters in the brain [18]. However, the potential role of these and other neurotransmitters in the pathophysiology and treatment of major depressive disorder is unclear [19, 20].

Duloxetine has an average half-life of around 12 h and is metabolised mainly in the liver [21]. Duloxetine is administered orally at a starting dose for the treatment of major depression of 60 mg/day, potentially increased to a maximum dose of 120 mg/day [13]. The most commonly reported adverse effects are nausea, dry mouth, decreased appetite, excessive sweating and drowsiness, whereas the most serious adverse effects include hepatic failure, orthostatic hypotension leading to syncope and falls, suicidal ideation, serotonin syndrome and increased risk of bleeding [22].

Beneficial effects of duloxetine

Several previous reviews have shown that antidepressants seem to decrease depressive symptoms with a statistically significant effect [23, 24]. However, the effect is small and of uncertain clinical importance to patients [25]. A recent network meta-analysis including 23 trials on duloxetine reported a standardised mean difference (SMD) of −0.37 on depression scales for duloxetine versus placebo. In absolute terms, this was much lower than the empirically derived threshold of 0.875 SMD suggested by Moncrieff and Kirsch, corresponding to ‘minimal improvement’ on the Clinical Global Impressions-Improvement scale, and also lower than the less stringent criterion of 0.5 SMD suggested by the National Institute for Clinical Excellence (NICE) in England [25,26,27]. More recently, Hengartner and Ploderl, reviewing both within-patient and between-patient anchor-based approaches, suggested that the minimal important difference on the 17-item Hamilton Depression Rating Scale (HDRS-17) is likely to be in the range of 3–5 points [28]. Whilst the ‘minimum clinically important difference’ on depression scales remains an area of debate with no consensus so far, the small statistically significant improvement in depressive symptoms with duloxetine must be weighed against the questionable clinical significance of the intervention whilst also taking harmful effects and costs into consideration.

Moreover, in many systematic reviews and meta-analyses including randomised placebo-controlled trials of duloxetine, remission and response defined as dichotomous outcomes were used as primary outcome measures and duloxetine was found to be superior compared with placebo (Table 1). However, dichotomisation of continuous scales to calculate response and remission has been criticised and might over-estimate the beneficial effects [25]. A decrease of only one point on the depression symptom severity scales can change categorisation of a trial participant from ‘non-remitter/non-responder’ to ‘remitter/responder’. It is therefore important to synthesise the evidence using the depression symptom severity scales without dichotomising the scores to assess the benefits associated with the use of antidepressants.

Table 1 Overview of previous reviews summarising benefits and/or harms of duloxetine versus placebo in participants with MDD

Harmful effects of duloxetine

In most reviews on duloxetine, adverse effects have not been sufficiently assessed. Instead, proxy measures like tolerability, acceptability and drop-outs due to adverse events have been used to assess the safety profile of antidepressants compared with placebo [24, 35]. In other reviews, non-serious adverse events such as anticholinergic adverse effects, dizziness, nausea, sedation and hyperhidrosis have been frequently reported and discussed [29]. However, there is little to no information on more serious adverse events such as suicides or suicide attempts [29, 33, 34].

With regards to serious adverse events, some systematic reviews and meta-analyses report no increased risk of suicide or suicidal tendency with the use of antidepressants including duloxetine versus placebo in adult populations [37, 42], whilst others have observed an age-dependent increase in risk of suicidality [32, 44]. It is important to consider that these analyses suffered from incomplete reporting of adverse events in the included trials and from limitations such as the lack of a pre-registered protocol [32, 44], no access to case-report forms [32], which are more likely to record adverse events, in particular suicidal events, as highlighted in other reviews [45], and low statistical power [42]. Moreover, some of the reviews on duloxetine were at risk of for-profit bias as the authors were employed at or the research was funded by the pharmaceutical industry [37, 42]. In one systematic analysis, Khan et al. examined safety data submitted to the U.S. Food and Drug Administration (FDA) during the period 1991–2013 for the approval of fourteen investigational antidepressants including duloxetine and reported a decline in suicide rates in the antidepressant groups of the clinical trials [46]. However, their analytical approach relied on patient-exposure years (PEY), which was deemed inappropriate by Hengartner and Ploderl [37]. They stressed that the risk of suicidal events is highest during the first few weeks of antidepressant use and that this violates the constant-hazard requirement for a PEY analysis. Hengartner and Ploderl argued that this analytical approach can obscure the increased suicide risk associated with initiation of antidepressant use. They therefore reanalysed the data used by Khan et al. and found three times higher odds of suicide with the use of antidepressants [47]. The potential increase in suicide risk presented by Hengartner and Ploderl highlights the importance of evaluating adverse effects using appropriate methods.

A retrospective analysis of clinical study reports from 268 trials of drugs assessed by The German Institute for Quality and Efficiency in Health Care (IQWiG) between 2006 and Feb 2011 found that registry reports and publications were inferior to clinical study reports with regards to the outcome reporting, particularly of adverse events [48]. Another cross-sectional study of clinical trial registration summaries and their associated publications observed ambiguities and discrepancies in reporting of serious adverse events in journal articles and trial registration summaries [49]. A study comparing clinical study reports from nine randomised placebo-controlled trials of duloxetine with publicly available documents such as journal articles and results posted on trial registries found not only publication bias in favour of significant findings on efficacy analysis but also that information on serious adverse events was missing from journal articles and registry reports [50]. In addition, treatment emergent adverse effects were only reported in journal articles if the incidence was higher than a certain percentage, whereas information on discontinuation related adverse events was unavailable or vaguely reported if at all [50]. Another issue observed was that the coding of suicidality events from investigator reported adverse events resulted in inaccurate reporting of this information in clinical study reports as compared to the patient data [51]. Taken together, the evidence points to a need to extend the assessment of adverse events beyond published literature to get a more accurate assessment of benefits and harms associated with the use of antidepressants.

Evidence assessments of duloxetine for major depressive disorder

We searched PubMed and Google Scholar for existing evidence on duloxetine using the search terms ‘duloxetine’, ‘major depression’ and ‘systematic reviews’. We identified a total of 16 meta-analyses, overviews or systematic reviews including randomised clinical trials on duloxetine versus placebo as summarised in Table 1 [36, 39,40,41, 43]. We identified three reviews and meta-analyses that summarised the evidence on both benefits and harms of duloxetine and which also assessed the risk of bias in the included trials. Only one of these reviews met all the criteria outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist [52]. However, this review was of low generalisability as it focussed only on elderly participants and only included trials published in English [31]. Furthermore, the review included only three duloxetine trials and no duloxetine-related serious adverse events were reported or discussed in this review. Of the other two reviews, one focussed on older adults only [29], and one included only four duloxetine trials, did not publish a protocol and limited electronic searches to English language [29, 38]. None of the reviews evaluated duloxetine versus ‘active placebo’, i.e. an active substance with no anti-depressant effect, e.g. an antihistamine that mimics the adverse effects of duloxetine such as dizziness, dry mouth and nausea [20, 22].

We also searched for ongoing systematic reviews comparing duloxetine versus ‘active’ placebo, placebo or no intervention for the treatment of major depression in the international prospective register of systematic reviews PROSPERO. We only found one protocol for a systematic review comparing duloxetine versus placebo; it plans to include trials where duloxetine was used for a wide range of indications apart from major depressive disorder such as generalized anxiety disorder, fibromyalgia and diabetic peripheral neuropathic pain [53]. Other identified ongoing systematic reviews on duloxetine will only include head-to-head comparisons with other antidepressants [54,55,56] or involve indications other than major depression [57]. We identified no systematic reviews assessing the benefits and harms of duloxetine compared with ‘active’ placebo.

Thus, no former or presently planned review has systematically reviewed the beneficial and harmful effects of duloxetine taking into account both the risk of random errors and the risk of systematic errors in all randomised clinical trials on major depressive disorder [58]. Hence, we planned this systematic review to assess the beneficial and harmful effects of duloxetine versus ‘active’ placebo, placebo or no intervention in the treatment of major depressive disorder. This review will also contribute data to a larger project assessing the beneficial and harmful effects of all anti-depressants in patients with major depressive disorder [59].

Objectives

The objectives of this systematic review will be to assess the beneficial and harmful effects of duloxetine versus ‘active’ placebo, placebo or no intervention in adult participants with major depressive disorder.

Methods

The protocol meets the reporting standards outlined in the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) checklist (Additional file 2). The protocol was originally registered on PROSPERO in 2016, https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=53931; however, because of non-availability of funds, the review process never started. The current protocol represents an updated version of the original protocol following the project revival in January 2020. We guarantee that data extraction has not started at the time of protocol submission to Systematic Reviews.

Eligibility criteria

Trials

All randomised clinical trials comparing duloxetine with ‘active’ placebo, placebo or no intervention irrespective of publication type, publication status, publication year and language will be included. Quasi-randomised trials, e.g. trials using date of admission for allocating the participants, cluster randomised trials and observational studies will be excluded.

Participants

Adults as defined by the trialists with a primary diagnosis of major depressive disorder. The diagnosis of major depressive disorder must be based on one of the standardised criteria, from either International Classification of Diseases (ICD) 9, ICD 10 [4], ICD 11 [60], Diagnostic and Statistical Manual of Mental Disorders (DSM) III [61], DSM III-R [62], DSM IV, DSM IV-TR [63], DSM V [3] or Feighner criteria [64]. Trials exclusively including participants with a somatic disease and comorbid major depressive disorder and trials on major depressive disorder during or after pregnancy will be excluded. Depression during or after pregnancy is traditionally investigated in separate trials, as it is theoretically influenced by hormonal changes and physical and psychological stress that may not be comparable to non-pregnant populations [65]. If only a subset of participants from a study is eligible, we will only include those that fulfil inclusion criteria provided data can be obtained for that specific group. We chose to include trials on adults only to avoid heterogeneity resulting from age of participants. Moreover, we focussed on major depressive disorder considering that it is a prevalent psychiatric disorder and a common indication for prescription of duloxetine [12].

Intervention

Duloxetine at any dose or duration.

Control

‘Active’ placebo, i.e. any active substance employed to mimic the adverse effects of taking duloxetine such as nausea, dry mouth, and dizziness.

Placebo, i.e. any ‘placebo’ substance containing no active substance.

No intervention, i.e. any control intervention with no treatment elements, e.g. ‘waiting list’.

Our primary comparison of interest will be duloxetine versus ‘active placebo’. Secondarily, we will compare duloxetine versus placebo and no intervention, individually. We chose these comparisons as they represent real-life scenarios, e.g. placebo effect or effect of waiting for the treatment.

Co-interventions

Trials comparing duloxetine versus ‘active’ placebo, placebo or no intervention as add-on therapy to any other kind of intervention (e.g. treatment as usual or psychotherapy) will be included, but only if this co-intervention is described and delivered similarly in the intervention groups.

Outcomes

Primary outcomes

  • The difference between the mean values from the two intervention groups using the 17-item or the 21-item Hamilton Depression Rating Scale (HDRS) [66]. Where the 21-item scale is used, we will only include the result of the score based on the 17-item version.

  • The proportion of participants with one or more serious adverse events. We will use the International Conference on Harmonization of technical requirements for registration of pharmaceuticals for human use – Good Clinical Practice (ICH-GCP) definition of a serious adverse event, which is any untoward medical occurrence that resulted in death, was life-threatening, required hospitalisation or prolonging of existing hospitalisation, resulted in persistent or significant disability or jeopardised the participant [67]. If the trialists do not use the ICH-GCP definition, we will include the data if the trialists use the term ‘serious adverse event’. If the trialists neither use the ICH-GCP definition nor use the term ‘serious adverse event’, we will still include the data if the event clearly fulfils the ICH-GCP definition of a serious adverse event.

Secondary outcomes

  • The proportion of participants with either a suicide or a suicide attempt (as defined by the trialists).

  • Quality of life (assessed with any valid continuous quality of life scale such as quality of life in depression scale, EQ-5D or any other scale used by the trialists).

  • Suicide ideation (assessed using, e.g. Columbia-Suicide Severity Rating Scale).

Exploratory outcomes

  • The SMD [66] between the two intervention groups including trials that use any form of HDRS, Montgomery-Asberg Depression Rating Scale (MADRS) [68] or Beck’s Depression Inventory (BDI) [69]. If the trialists report other scales in addition to HDRS, we will use HDRS-17 in this meta-analysis. If HDRS-17 is not reported, we will use HDRS-21 followed by HDRS-6. Similarly, if the trials report both MADRS and BDI, we will use MADRS in the meta-analysis. We will back-calculate mean difference on HDRS from the SMD.

  • The proportion of participants achieving response. We have defined response as a 50% reduction (from baseline) on either HDRS, MADRS or any other scale as used by trialists, in the stated order of preference.

  • The proportion of participants achieving remission. We have, pragmatically, defined remission as an HDRS score of less than 8, a MADRS score of less than 10 or a BDI score of less than 10 points, in the stated order of preference.

  • The proportion of participants with one or more adverse events not considered serious.

  • The serious adverse events individually as stated by the trialists.

  • The adverse events not considered serious individually as stated by the trialists.
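
The response and remission definitions above can be illustrated with a minimal sketch. It assumes that ‘a 50% reduction’ means a reduction of at least 50% from baseline and that the remission cut-offs are strict inequalities, as worded above; the function names and the example scores are hypothetical and are not taken from any trial.

```python
# Remission cut-offs as stated in the protocol; the dictionary keys are
# labels chosen for this sketch only.
REMISSION_CUTOFF = {"HDRS": 8, "MADRS": 10, "BDI": 10}


def is_responder(baseline: float, endpoint: float) -> bool:
    """Response: a reduction of at least 50% from baseline (assumed reading)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - endpoint) / baseline >= 0.5


def is_remitter(scale: str, endpoint: float) -> bool:
    """Remission: endpoint score strictly below the scale-specific cut-off."""
    return endpoint < REMISSION_CUTOFF[scale]


# Hypothetical participants with the same baseline HDRS-17 score of 24.
for endpoint in (13, 12):
    print("endpoint", endpoint, "responder:", is_responder(24, endpoint))

# Hypothetical endpoint scores on either side of the HDRS remission cut-off.
for endpoint in (8, 7):
    print("endpoint", endpoint, "remitter:", is_remitter("HDRS", endpoint))
```

In this sketch, a one-point difference at the endpoint (13 versus 12, or 8 versus 7) is enough to change the category, which is why response and remission are treated as exploratory outcomes only.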

We chose HDRS as the primary outcome in spite of its psychometric limitations as HDRS-17 is a commonly used assessment scale and recommended by international guidelines [70, 71]. Moreover, the minimal clinically important difference has been identified for HDRS-17 [26, 27].

Moreover, we do not intend to use SMD as the primary outcome as the underlying assumption as described in Cochrane Handbook for Systematic Reviews of Interventions is that ‘the differences in SDs among studies reflect differences in measurement scales and not real differences in variability among study populations. If in two trials the true effect (as measured by the difference in means) is identical, but the SDs are different, then the SMDs will be different. This may be problematic in some circumstances where real differences in variability between the participants in different studies are expected.’ [72]. We might observe variability in patients’ responses in these trials owing to the differences in inclusion criteria. For example, participants identified using different diagnostic criteria such as ICD 9 or DSM III that do not use operationalised criteria might differ from participants in other studies. Similarly, participants might differ in severity of depression at the time of inclusion, presence or absence of psychiatric co-morbidities or might come from different settings such as inpatient or outpatient departments.
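
The concern quoted from the Cochrane Handbook can be shown with a short worked example. The sketch below uses Cohen's d with a pooled SD and no small-sample correction, which is one common way to compute an SMD and is an assumption here; the two trials and all their values are hypothetical.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def smd(mean1: float, sd1: float, n1: int,
        mean2: float, sd2: float, n2: int) -> float:
    """Standardised mean difference (Cohen's d, no small-sample correction)."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


# Two hypothetical trials with the same mean difference of -2 HDRS points.
trial_a = smd(12.0, 4.0, 100, 14.0, 4.0, 100)  # less variable population
trial_b = smd(12.0, 8.0, 100, 14.0, 8.0, 100)  # more variable population

print(f"SMD trial A: {trial_a:.2f}")  # -0.50
print(f"SMD trial B: {trial_b:.2f}")  # -0.25
```

Both hypothetical trials show the same difference on the HDRS, yet the SMD is halved in the trial with the more variable population. This is the situation in which the mean difference is the more interpretable primary effect measure.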

Assessment time points

We will assess all outcomes at the end of treatment (our assessment time point of primary interest) as well as at maximum follow-up.

Search methods

We will search the Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE, EMBASE, PsycInfo, Science Citation Index Expanded, Social Sciences Citation Index (SSCI), Conference Proceedings Citation Index—Science (CPCI-S) and Conference Proceedings Citation Index—Social Science & Humanities (CPCI-SSH) (Additional file 3). We will also search Chinese databases (CNKI, Wanfang, VIP, Sinomed) and Google Scholar. We will search all databases from their inception to present. We will check relevant publications, e.g. included trials and systematic reviews, for relevant trials. To identify unpublished trials, we will search trials registers of pharmaceutical companies, the WHO trial registry, clinicaltrials.gov, including the websites of the FDA and the European Medicines Agency (EMA). Furthermore, we will request clinical study reports from FDA, EMA and national medicines agencies. We will contact trial authors to seek required information.

Screening of trials

Two of the review authors (FS and MB) will independently select relevant trials, based on the criteria described in the above section. If a trial has only been identified by one of the two, it will be discussed whether the trial should be included. If the two review authors disagree, a third review author (JCJ) will decide if the trial should be included. All excluded trials assessed in full text will be entered on a list, stating the reason for exclusion.

Data extraction

Data will be extracted by two reviewers independently. The following data will be extracted from the included trials:

1. Trial: publication status, date of publication, year of study conduction/randomisation, duration of trial, trial design, for-profit funding of trial, NCT/EudraCT number.

2. Participants: mean age, sex distribution, number randomised to each comparison group, number analysed, number lost to follow-up, drug or alcohol dependence, chronically depressed or treatment resistant depression (any definition used by the trialists), baseline depression scores, comorbid psychiatric diagnoses, borderline personality disorder, inclusion and exclusion criteria.

3. Intervention: length of intervention period and follow-up period, dose of duloxetine, dosing schedule, co-interventions such as psychotherapy or electroconvulsive therapy, whether the experimental intervention is an add-on therapy on other antidepressants, placebo washout period, choice of control (‘active’ placebo, placebo or no intervention).

4. Outcomes: primary and secondary outcomes (e.g. HDRS scores, BDI scores, number of suicides), type of outcome reported (e.g. change in scores, post-intervention scores), mean, standard deviation (SD) and number analysed for all continuous outcomes, number of events and number analysed for dichotomous outcomes, method of data collection for adverse effects, i.e. active monitoring or spontaneous report monitoring.

5. Others: author’s affiliations, an evaluation of the bias risk and choice of method (see below).

Risk of systematic error (bias)

Two review authors will assess risk of bias in the included trials independent of each other using Cochrane’s risk of bias tool version 2 (RoB 2) [73]. The risk of bias assessment will be made for each outcome as well as overall risk of bias for the trial. We will evaluate the methodology to identify bias resulting from the randomisation process, deviation from the intended interventions, missing outcome data, measurement of the outcome as well as the selective reporting of results. We will classify the trials according to the components below as summarised in the RoB 2 guidance document (Table 2) [74].

Table 2 RoB 2 guidelines on risk of bias assessment

Overall assessment of risk of bias

  • Low risk of bias: The study is judged to be at low risk of bias for all domains for this result.

  • Some concerns: The study is judged to be at some concerns in at least one domain for this result.

  • High risk of bias: The study is judged to be at high risk of bias in at least one domain for this result OR the study is judged to have some concerns for multiple domains in a way that substantially lowers confidence in the result.

For our purposes, we will combine some concerns and high risk of bias judgements so that in our overall assessment of risk of bias, we will classify trials either to be at overall low risk of bias or at overall high risk of bias.
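
The overall judgement and the subsequent two-level classification can be sketched as follows. This is an illustration only: the function and constant names are hypothetical, and the flag `multiple_concerns_substantial` stands in for the reviewers' own judgement that several ‘some concerns’ domains substantially lower confidence in the result, which cannot be automated.

```python
from typing import Iterable

LOW, SOME, HIGH = "low", "some concerns", "high"


def rob2_overall(domains: Iterable[str],
                 multiple_concerns_substantial: bool = False) -> str:
    """Overall RoB 2 judgement for one result from its domain judgements."""
    judgements = list(domains)
    if HIGH in judgements:
        return HIGH
    n_some = judgements.count(SOME)
    if n_some == 0:
        return LOW
    # Several 'some concerns' domains may be upgraded to high risk,
    # but only on the reviewers' judgement (passed in as a flag here).
    if n_some > 1 and multiple_concerns_substantial:
        return HIGH
    return SOME


def protocol_overall(domains: Iterable[str],
                     multiple_concerns_substantial: bool = False) -> str:
    """Collapse 'some concerns' and 'high' into overall high risk of bias."""
    overall = rob2_overall(domains, multiple_concerns_substantial)
    return LOW if overall == LOW else HIGH


# Hypothetical result with a single domain rated 'some concerns'.
example = [LOW, SOME, LOW, LOW, LOW]
print(rob2_overall(example), "->", protocol_overall(example))
```

Under this two-level rule, a trial result is classified as overall low risk of bias only when every domain is judged to be at low risk.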

Assessment of publication bias and for-profit bias

On all outcomes, we will create and inspect a funnel plot to assess possible small-study biases if ten or more trials are included, unless the trials are of similar size. For dichotomous outcomes, we will test asymmetry with the Harbord test if τ² is less than 0.1 and with the Rücker test if τ² is more than 0.1. For continuous outcomes, we will use the regression asymmetry test [75] and the adjusted rank correlation [76].
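
The selection rule described above can be written as a small decision function. This is a sketch under two assumptions that the text does not settle: that the asymmetry tests are run under the same condition as the funnel plot (ten or more trials that are not of similar size), and that a between-trial variance of exactly 0.1 is handled by the Rücker test. The function name and return values are hypothetical, and the tests themselves are not implemented here.

```python
HARBORD = "Harbord"
RUECKER = "Rücker"
REGRESSION = "regression asymmetry"
RANK = "adjusted rank correlation"


def planned_asymmetry_tests(outcome_type: str, n_trials: int, tau2: float,
                            similar_size: bool = False) -> list:
    """Return the names of the asymmetry tests planned for one outcome."""
    # Assumption: no formal test when the funnel plot itself is not produced.
    if n_trials < 10 or similar_size:
        return []
    if outcome_type == "dichotomous":
        # Assumption: tau2 == 0.1 falls on the Rücker side of the rule.
        return [HARBORD] if tau2 < 0.1 else [RUECKER]
    if outcome_type == "continuous":
        return [REGRESSION, RANK]
    raise ValueError(f"unknown outcome type: {outcome_type!r}")


print(planned_asymmetry_tests("dichotomous", n_trials=12, tau2=0.04))
print(planned_asymmetry_tests("continuous", n_trials=8, tau2=0.0))
```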

We will account for for-profit interests under publication bias in the GRADE assessment. Trials initiated, conducted or funded by pharmaceutical industry as well as the trials with any of the authors affiliated with the industry or where authors received grants from industry (self-reported in the article) will be considered at risk of for-profit interests [77]. We will downgrade for for-profit influence if the subgroup analysis according to risk of for-profit interests (see below) shows a difference between the intervention groups.

Differences between the protocol and the review

The review will be conducted in accordance with this protocol. Deviations from the protocol, if any, will be reported in the systematic review under the section ‘Differences between the protocol and the review’.

Statistical methods

Data will be meta-analysed using the statistical software Stata 16.1 (StataCorp 2019. Stata Statistical Software: Release 16. College Station, TX: StataCorp LLC). We will undertake meta-analysis according to the recommendations stated in the Cochrane Handbook for Systematic Reviews of Interventions and the eight-step assessment suggested by Jakobsen et al. [58]. When analysing continuous outcomes, we will calculate mean differences (MDs) with 95% confidence intervals (CIs). We will use the Sidik-Jonkman model for random-effects meta-analysis [78]. We will also use the SMD with a 95% CI to analyse the results when different scales have been used. We will pool trials reporting change in scores and post-intervention scores for mean difference; however, they will not be pooled for SMD [72]. We will also calculate trial sequential analysis-adjusted CIs (see below). When analysing dichotomous outcomes, we will calculate risk ratios (RRs) with 95% CI as well as the trial sequential analysis-adjusted CIs (see below). For rare outcomes such as adverse effects, we will use binomial regression analysis [79].
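
For orientation, the two effect measures named above can be computed for a single trial as follows. The analyses themselves will be run in Stata; this sketch is for illustration only, uses the usual large-sample normal approximation for both intervals, does not implement the pooled Sidik-Jonkman model or any zero-event correction, and uses hypothetical trial data.

```python
import math
from statistics import NormalDist


def mean_difference(mean_t: float, sd_t: float, n_t: int,
                    mean_c: float, sd_c: float, n_c: int,
                    level: float = 0.95) -> tuple:
    """Mean difference with a normal-approximation confidence interval."""
    md = mean_t - mean_c
    se = math.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return md, md - z * se, md + z * se


def risk_ratio(events_t: int, n_t: int, events_c: int, n_c: int,
               level: float = 0.95) -> tuple:
    """Risk ratio with a confidence interval computed on the log scale."""
    if min(events_t, events_c) == 0:
        raise ValueError("zero events: log risk ratio undefined without a correction")
    rr = (events_t / n_t) / (events_c / n_c)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)


# Hypothetical single-trial data (not taken from any duloxetine trial).
md, md_lo, md_hi = mean_difference(12.0, 8.0, 100, 14.0, 8.0, 100)
rr, rr_lo, rr_hi = risk_ratio(10, 200, 5, 200)
print(f"MD {md:.2f} (95% CI {md_lo:.2f} to {md_hi:.2f})")
print(f"RR {rr:.2f} (95% CI {rr_lo:.2f} to {rr_hi:.2f})")
```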

Intervention effects will be assessed by both random-effects model meta-analyses and fixed-effect model meta-analyses and we will use the more conservative point estimate of the two [72]. The more conservative point estimate is the estimate with the highest P value. We plan to assess a total of five primary and secondary outcomes; therefore, we will consider P ≤ 0.016 as statistically significant [58]. We will investigate possible heterogeneity through subgroup analyses. We will use the eight-step procedure to assess if the thresholds for significance are crossed [58]. Our primary conclusion will be based on trials at overall low risk of bias. Where multiple trial arms are reported in a single trial, we will include only the relevant arms. For trials with multiple intervention arms, we will correspondingly divide the control group. If quantitative synthesis is not appropriate due to considerable heterogeneity or a small number of included trials, we will report the results in a descriptive way.
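
The significance threshold and the choice between the two models can be made explicit with a short sketch. It assumes that the threshold follows the adjustment described by Jakobsen et al. [58], in which the conventional 0.05 is divided by the value halfway between 1 and the number of outcomes; this is our reading of that procedure, and the function names and the example P values are hypothetical.

```python
def adjusted_alpha(n_outcomes: int, alpha: float = 0.05) -> float:
    """Threshold obtained by dividing alpha by the value halfway
    between 1 (no adjustment) and n_outcomes (Bonferroni)."""
    if n_outcomes < 1:
        raise ValueError("n_outcomes must be at least 1")
    return alpha / ((1 + n_outcomes) / 2)


def more_conservative(random_effects: dict, fixed_effect: dict) -> dict:
    """Pick the result with the highest P value, as defined in the text."""
    return max((random_effects, fixed_effect), key=lambda result: result["p"])


threshold = adjusted_alpha(5)
print(f"adjusted threshold: {threshold:.4f}")

# Hypothetical results of the two meta-analysis models for one outcome.
chosen = more_conservative({"model": "random", "md": -1.8, "p": 0.020},
                           {"model": "fixed", "md": -2.1, "p": 0.004})
print(chosen["model"], "significant:", chosen["p"] <= threshold)
```

With five outcomes, the divisor is 3 and the threshold is 0.05/3, which appears consistent with both the P ≤ 0.016 stated above and the alpha of 1.7% used in the trial sequential analyses.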

Although there is no current consensus on the issue, the National Institute for Clinical Excellence (NICE) of the National Health Service in England has formerly defined a threshold for clinical significance for major depressive disorder as an effect size of 0.50 SMD or a drug-placebo difference of three points on the 17-item HDRS [27]. Others have suggested and used the following ‘rule of thumb’: 0.2 SMD represents a small effect, 0.5 SMD a moderate effect and 0.8 SMD a large effect [72, 80]. We have chosen, as NICE has formerly recommended and other reviewers have chosen [27, 81, 82], a drug-placebo difference of three points on the 17-item HDRS (for our primary outcome) or an effect size of 0.50 SMD (for our exploratory outcome) as the threshold for clinical significance. This is in line with findings from a recent review, suggesting that the most likely minimal important difference on the HDRS-17 is between 3 and 5 points [28].

To control the risk of type I and type II errors, we will use Trial Sequential Analyses. We will perform trial sequential analyses on all the outcomes [83,84,85], in order to calculate the diversity-adjusted required information size (that is, the number of participants needed in a meta-analysis to detect or reject a certain intervention effect) and the cumulative Z-curve’s breach of relevant trial sequential monitoring boundaries. A more detailed description of trial sequential analysis can be found at http://www.ctu.dk/tsa [84]. For continuous outcomes, trial sequential analysis will use the empirical SD, a mean difference of three points on the Hamilton Depression Rating Scale (17 or 21 item) and the observed SD/2 when other depression scales or quality of life scales are used, an alpha of 1.7%, a beta of 10% and adjustment for the observed diversity. For dichotomous outcomes, trial sequential analysis will use the proportion of participants with an outcome in the control group, a relative risk reduction of 25%, an alpha of 1.7% for primary outcomes, a beta of 10% and adjustment for the observed diversity of the trials in the meta-analysis.
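
To give a sense of scale, the diversity-adjusted required information size can be approximated with the standard two-group sample size formula inflated by 1/(1 − D²). This is a sketch under stated assumptions, not the Trial Sequential Analysis software itself: it assumes equal allocation and a two-sided alpha, the function names are hypothetical, and the SD of 8 points, the control group proportion of 10% and the diversity of 50% are made-up illustration values.

```python
from statistics import NormalDist


def _z_sum(alpha: float, beta: float) -> float:
    """Sum of the normal quantiles for a two-sided alpha and power 1 - beta."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(1 - beta)


def ris_continuous(sd: float, delta: float, alpha: float, beta: float,
                   diversity: float = 0.0) -> float:
    """Total participants needed to detect a mean difference `delta`."""
    if not 0 <= diversity < 1:
        raise ValueError("diversity must be in [0, 1)")
    unadjusted = 4 * _z_sum(alpha, beta) ** 2 * sd**2 / delta**2
    return unadjusted / (1 - diversity)


def ris_dichotomous(p_control: float, rrr: float, alpha: float, beta: float,
                    diversity: float = 0.0) -> float:
    """Total participants needed to detect a relative risk reduction `rrr`."""
    if not 0 <= diversity < 1:
        raise ValueError("diversity must be in [0, 1)")
    p_exp = p_control * (1 - rrr)
    p_avg = (p_control + p_exp) / 2
    delta = p_control - p_exp
    unadjusted = 4 * _z_sum(alpha, beta) ** 2 * p_avg * (1 - p_avg) / delta**2
    return unadjusted / (1 - diversity)


# Protocol values: 3 HDRS points, RRR 25%, alpha 1.7%, beta 10%.
# Hypothetical values: SD of 8 points, control proportion 10%, diversity 50%.
print(round(ris_continuous(sd=8, delta=3, alpha=0.017, beta=0.10)))
print(round(ris_continuous(sd=8, delta=3, alpha=0.017, beta=0.10, diversity=0.5)))
print(round(ris_dichotomous(p_control=0.10, rrr=0.25, alpha=0.017, beta=0.10)))
```

In this approximation, a diversity of 50% doubles the required information size, which illustrates why the adjustment for observed diversity matters when the trials are heterogeneous.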

Missing outcomes

We will use intention-to-treat data if reported by the trialists [86]. If intention-to-treat data are not reported, we will use the data as reported by the trialists. We will, as the first option, contact all trial authors to obtain any relevant missing data (i.e. for data extraction and for assessment of risk of bias, as specified above).

Dichotomous outcomes: we will not impute missing values for any outcomes in our primary analysis. In our sensitivity analyses (see paragraph below), we will impute data.

Continuous outcomes: we will primarily analyse scores assessed at single time points (end scores). If only changes from baseline scores are reported, we will analyse the results together with end scores [72]. If SDs are not reported, we will calculate the SDs using trial data, if possible. We will not use intention-to-treat data if the original report did not contain such data. We will not impute missing values for any outcomes in our primary analysis. In our sensitivity analysis (see paragraph below) for continuous outcomes, we will impute data.

Sensitivity analyses

To assess the potential impact of the missing data for dichotomous outcomes, we will perform the following two sensitivity analyses on both the primary and the secondary dichotomous outcomes. We will present results of both scenarios in our review.

  • ‘Best-worst-case’ scenario: we will assume that all participants lost to follow-up in the antidepressant group had a beneficial outcome, i.e. survived, had no serious adverse events, had no suicides or suicide attempts and had no non-serious adverse events, and that all those participants lost to follow-up in the control group had a harmful outcome, i.e. did not survive, had a serious adverse event, died by suicide or had a suicide attempt and had a non-serious adverse event.

  • ‘Worst-best-case’ scenario: we will assume that all participants lost to follow-up in the antidepressant group had a harmful outcome, i.e. did not survive, had a serious adverse event, died by suicide or had a suicide attempt and had a non-serious adverse event, and that all those participants lost to follow-up in the control group had a beneficial outcome, i.e. survived, had no serious adverse events, had no suicides or suicide attempts and had no non-serious adverse events.

When analysing continuous outcomes like depressive symptoms and quality of life, a ‘beneficial outcome’ will be reduction in depression scores and increase in quality of life scale and will be calculated as group (intervention or control) mean plus two SDs (we will secondly use one SD in another sensitivity analysis) of the group mean. Similarly, ‘harmful outcome’ will be increase in depression scores and decrease in quality of life scale and will be calculated as the group mean minus two SDs (we will secondly use one SD in another sensitivity analysis) of the group mean [68]. This data imputation with 2 SDs will provide a possible range of influence that missing data might have on the results [87].

To assess the potential impact of missing data for continuous outcomes, we will perform the following sensitivity analysis:

  • Where SDs are missing and it is not possible to calculate them, we will impute SDs from trials with similar populations and low risk of bias. If we find no such trials, we will impute SDs from trials with a similar population. As the final option, we will impute SDs from all trials.

  • We will perform a sensitivity analysis to assess the effect of using the ICH-GCP definition of serious adverse events.

We will present results of these scenarios in our review. Other post hoc sensitivity analyses might be warranted if unexpected clinical or statistical heterogeneity is identified during the analysis of the review results.
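The ‘best-worst-case’ and ‘worst-best-case’ scenarios for dichotomous outcomes can be sketched as follows. This is an illustration under stated assumptions rather than the review's analysis code: the `Arm` class, the function name and all participant counts are hypothetical, and the outcome is taken to be a harmful event (for example a serious adverse event), so a ‘beneficial outcome’ means no event.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    events: int    # participants with the harmful event among those observed
    observed: int  # participants with outcome data
    lost: int      # participants lost to follow-up


def risk_ratio(experimental: Arm, control: Arm, scenario: str) -> float:
    """Risk ratio (experimental / control) under a missing-data scenario."""
    if scenario == "observed":
        # Primary analysis: no imputation, observed participants only.
        e_events, e_total = experimental.events, experimental.observed
        c_events, c_total = control.events, control.observed
    elif scenario == "best-worst":
        # Lost experimental participants: no event; lost controls: event.
        e_events = experimental.events
        c_events = control.events + control.lost
        e_total = experimental.observed + experimental.lost
        c_total = control.observed + control.lost
    elif scenario == "worst-best":
        # Lost experimental participants: event; lost controls: no event.
        e_events = experimental.events + experimental.lost
        c_events = control.events
        e_total = experimental.observed + experimental.lost
        c_total = control.observed + control.lost
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return (e_events / e_total) / (c_events / c_total)


# Hypothetical counts, for illustration only.
duloxetine = Arm(events=10, observed=90, lost=10)
placebo = Arm(events=12, observed=92, lost=8)

for name in ("observed", "best-worst", "worst-best"):
    print(name, round(risk_ratio(duloxetine, placebo, name), 3))
```

The distance between the two extreme risk ratios indicates how much the result could change if the missing outcomes had been observed.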

Assessment of heterogeneity

We will primarily investigate forest plots to visually assess any sign of heterogeneity. We will secondly assess the presence of statistical heterogeneity by the chi² test (threshold P < 0.10) and quantify heterogeneity by the I² statistic. We will investigate possible heterogeneity through subgroup analyses. According to the Cochrane Handbook, an I² statistic above 50% will be regarded as substantial heterogeneity, and we may ultimately decide that a meta-analysis should be avoided [72].
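As an illustration of the two statistical checks, the following minimal sketch computes Cochran's Q (the chi² test statistic), its P value and I² from inverse-variance weights. The function names and the three trial estimates are hypothetical. The P value uses a closed-form expression for the chi² distribution with integer degrees of freedom so that the example needs only the standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        terms = sum(h ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-h) * terms
    terms = sum(h ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * terms


def heterogeneity(effects, std_errors):
    """Return Cochran's Q, degrees of freedom, P value and I^2 (percent)."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, chi2_sf(q, df), i2


# Hypothetical log risk ratios and standard errors from three trials.
q, df, p_value, i2 = heterogeneity([-0.2, -0.1, 0.3], [0.1, 0.1, 0.1])
print(f"Q = {q:.2f}, df = {df}, P = {p_value:.4f}, I2 = {i2:.1f}%")
print("heterogeneity flagged:", p_value < 0.10 or i2 > 50)
```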

Subgroup analyses

We have planned the following subgroup analyses on all the outcomes:

  1. Whether the intervention effects from trials at overall low risk of bias (or lower risk of bias) differ from those of trials at overall high risk of bias, as a high risk of bias can lead to overestimation of beneficial effects or bias the estimates of harmful effects towards the null.

  2. Whether the intervention effects from the trials using ‘active’ placebo, placebo, or no intervention differ.

  3. Whether the results from trials using a placebo washout period before inclusion differ from the remaining trials.

  4. Whether the intervention effects of duloxetine differ in trials at low risk of for-profit interests compared to trials at high risk of for-profit interests [77].

  5. Whether the intervention effects from trials assessing the effects of duloxetine in elderly depressive participants (defined by the trialists but often adults ≥ 65 years) differ from the remaining trials.

  6. Whether the intervention effects from trials with participants with a baseline HDRS score of 23 or above differ from the remaining trials, as the intervention effect might vary depending upon baseline scores.

  7. Whether the intervention effects from trials assessing the effects of duloxetine in chronically depressive patients or patients with treatment-resistant depression differ from the remaining trials.

  8. Whether the intervention effect differs by duration of treatment, i.e. trials with a duration of treatment below 6 weeks, between 6 and 12 weeks and above 12 weeks, since the intervention duration could be an important determinant of the intervention effect. If a trial reports multiple time-points within these groups, we will use the longest time period.

  9. Whether the intervention effect differs between trials using duloxetine at or below the median dose and trials using doses above the median.

  10. Whether the intervention effect differs depending upon the scale used in the trial, i.e. HDRS, MADRS or BDI.

GRADE

We will assess the certainty of evidence of all outcomes using the GRADE (Grading of Recommendations Assessment, Development and Evaluation) tool. We will follow the Cochrane Handbook for Systematic Reviews of Interventions (Chapter 8: Section 8.5 and Chapter 12) for the GRADE evaluation, using the GRADEpro software [72]. We will use the five GRADE domains (bias risk of the trials, consistency of effect, imprecision, indirectness and publication bias) to assess the quality of a body of evidence. Imprecision will be assessed using trial sequential analysis [58]. We will downgrade imprecision in GRADE by two levels if the accrued number of participants is below 50% of the diversity-adjusted required information size (DARIS), and by one level if it is between 50% and 100% of DARIS. We will not downgrade if the cumulative Z-curve crosses the monitoring boundaries for benefit, harm or futility, or if DARIS is reached [88]. The findings for primary outcomes will be presented in a summary of findings table where each GRADE domain will be presented for trials contributing data to the meta-analyses for the prespecified outcomes [58, 89]. We will justify all decisions when downgrading the certainty of evidence using footnotes, and we will add comments to aid the reader’s understanding of the review where necessary.
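The imprecision rule above can be written as a small decision function. This is only a sketch of the stated thresholds: the function name, its parameters and the participant numbers are hypothetical, and whether a monitoring boundary has been crossed is taken as an input from the trial sequential analysis.

```python
def imprecision_downgrade(accrued: int, daris: float,
                          boundary_crossed: bool) -> int:
    """Number of GRADE levels to downgrade for imprecision (0, 1 or 2)."""
    if daris <= 0:
        raise ValueError("DARIS must be positive")
    # No downgrade if a boundary for benefit, harm or futility is crossed,
    # or if the required information size has been reached.
    if boundary_crossed or accrued >= daris:
        return 0
    # Two levels if fewer than 50% of the required participants are accrued.
    if accrued < 0.5 * daris:
        return 2
    # One level if between 50% and 100% of DARIS.
    return 1


# Hypothetical participant numbers, for illustration only.
print(imprecision_downgrade(accrued=2000, daris=9000, boundary_crossed=False))
print(imprecision_downgrade(accrued=6000, daris=9000, boundary_crossed=False))
print(imprecision_downgrade(accrued=2000, daris=9000, boundary_crossed=True))
```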

Discussion

One major strength of this protocol is that we aim to compare benefits and harmful effects of duloxetine versus ‘active’ placebo, placebo or no intervention in adult participants with major depressive disorder. This is a strength as few earlier reviews have addressed both harms and benefits, and serious adverse events have not been sufficiently analysed in these reviews as demonstrated in our ‘Background’ section. Considering that the use of antidepressants is associated with several short-term and long-term adverse effects, it is critical to review available evidence and to establish if harms outweigh the benefits associated with the use of antidepressants.

Another strength of this protocol is its methodological approach. We will follow the recommendations outlined in the Cochrane Handbook for Systematic Reviews of Interventions [72]. We will use the eight-step assessment suggested by Jakobsen et al. [58], trial sequential analysis [84] and the GRADE assessment of the certainty of evidence [89] to assess clinical significance of our findings as well as to address the risks of random and systematic errors and to establish the quality of evidence.

The primary limitation of our systematic review is the potential for heterogeneity as a result of methodological variability in the included trials. To minimise this limitation, we will carefully look for signs of heterogeneity and ultimately decide if data ought to be pooled and meta-analysed, and we have planned several subgroup analyses.

Another limitation is the large number of comparisons, which increases the risk of type I errors. We have adjusted our thresholds for significance according to the number of primary and secondary outcomes, but we have not adjusted them according to the number of subgroup analyses.

Another potential limitation is the insufficient reporting of adverse effects in the published literature [90, 91]. To address this, we will request clinical study reports from the FDA, the EMA and other national medicines agencies, as well as from the pharmaceutical companies, as these reports are likely to contain more information on adverse effects than trial registries and published articles.

In a similar vein, we have decided to include only randomised clinical trials and to exclude quasi-randomised studies and observational studies. Through these decisions, we run the risk of overlooking late as well as rare adverse effects. If we find benefits of duloxetine that are not outweighed by adverse events in the randomised clinical trials, the risks of adverse events will need to be assessed in quasi-randomised trials and observational studies [92].