Bipolar disorder is a chronic mood disorder characterized by episodes of mania (or hypomania), depression, and periods of normal mood (euthymia) [1]. Two forms of bipolar disorder are bipolar I disorder, which requires a history of at least one manic episode, and bipolar II disorder, which requires a history of at least one hypomanic episode and one major depressive episode [1]. The estimated yearly and lifetime prevalence of bipolar I and II disorders is 2.8% and 4.4%, respectively [2, 3]. In 2015, the cost of bipolar I disorder in the United States was estimated at $81,559 per patient, made up predominantly of unemployment (36%), caregiving (25%), and direct healthcare (23%) costs [4].

In bipolar I disorder, depression is the predominant abnormal mood state [5], with depressive symptoms being 3 times more common than manic symptoms [6]. Bipolar depression, but not bipolar mania, is associated with increased rates of unemployment [7], and among those who are employed, bipolar depression is associated with more absenteeism, presenteeism, and total lost workdays than bipolar mania [7, 8]. In addition, suicide risk is substantially higher during depressive episodes than during manic episodes [9].

The recommended pharmacological treatments for bipolar disorder vary depending on the phase of the disorder (acute depression, acute mania, or maintenance). Pharmacological treatments used for acute bipolar depression come from multiple different classes including atypical antipsychotics, anticonvulsants, lithium, and antidepressants.

The US Food and Drug Administration (FDA) has approved four medications, all atypical antipsychotics, for the treatment of bipolar depression: quetiapine, lurasidone, cariprazine, and the combination of olanzapine with fluoxetine. Lurasidone is the only available atypical antipsychotic indicated for treatment of bipolar depression in combination with lithium or valproate. Lurasidone, as monotherapy and in combination with lithium or valproate, and quetiapine are recommended as first-line treatments for the management of acute bipolar depression in the 2018 Canadian Network for Mood and Anxiety Treatment (CANMAT) and International Society for Bipolar Disorders (ISBD) guidelines. The CANMAT/ISBD guidelines recommend cariprazine and olanzapine-fluoxetine for second-line treatment. The use of antidepressants to treat bipolar depression remains controversial due to limited evidence for efficacy and the potential risk of switching into mania or inducing rapid cycling [10, 11].

Several meta-analyses [12,13,14] and network meta-analyses (NMAs) [15, 16] of atypical antipsychotics in bipolar depression have been conducted. NMAs allow comparisons to be made between drugs that share a common comparator, such as placebo, even when no head-to-head trials have been conducted. While the methods and study inclusion criteria have varied, the meta-analyses have consistently reported that quetiapine, olanzapine, and lurasidone are more efficacious than placebo. Other atypical antipsychotic monotherapies have not consistently shown superior efficacy compared to placebo [12,13,14,15,16]. Only one prior NMA evaluated tolerability outcomes, and it found differences among the atypical antipsychotics [15]. Lurasidone was associated with significantly less weight gain than quetiapine and olanzapine and significantly less somnolence than quetiapine and ziprasidone. There were no significant differences observed in the rates of all-cause discontinuation between lurasidone and other atypical antipsychotics [15].
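The idea of comparing two drugs through a shared placebo comparator can be illustrated with a simple adjusted indirect comparison. The sketch below is a frequentist illustration of that logic only, not the Bayesian model used in this study, and the function name and input numbers are hypothetical.

```python
import math

def indirect_comparison(d_a_pbo, se_a_pbo, d_b_pbo, se_b_pbo, z=1.96):
    """Adjusted indirect estimate of drug A vs drug B via a common comparator.

    d_a_pbo, d_b_pbo: effects of A and B versus placebo on the same scale
    (for example, mean difference in change score, or log odds ratio).
    The two placebo-controlled estimates are assumed to be independent.
    """
    d_ab = d_a_pbo - d_b_pbo                       # placebo effect cancels out
    se_ab = math.sqrt(se_a_pbo**2 + se_b_pbo**2)   # variances add
    return d_ab, se_ab, (d_ab - z * se_ab, d_ab + z * se_ab)

# Hypothetical inputs, not taken from any trial in this review
d, se, (lo, hi) = indirect_comparison(-4.0, 1.0, -2.5, 1.2)
print(f"A vs B: {d:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because the standard errors combine, an indirect estimate is always less precise than either of the placebo-controlled estimates it is built from.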

Since these NMAs were conducted, additional clinical trials of atypical antipsychotic monotherapies have been completed. The objective of this study was to update a recent NMA [15] to better understand the relative efficacy, safety and tolerability of currently available atypical antipsychotics approved for the treatment of bipolar depression.


Systematic literature review

A systematic literature review was conducted to identify randomized controlled trials of atypical antipsychotic monotherapy in bipolar depression. A protocol was developed for this review and NMA and can be found in Additional file 1; however, it was not registered. Amendments to the protocol are listed in the protocol section of Additional file 1. The PRISMA 2020 expanded statement for reporting of systematic reviews incorporating network meta-analyses can be found in Additional file 1: Table 1 [17]. The most recent bipolar depression NMA identified trials that were completed prior to May 2015 [15]. This update included searches in Embase, MEDLINE, Cochrane Library, and PsycINFO for studies published between May 2015 and 04 May 2020. Eligibility criteria for trial inclusion remained consistent with the previous NMA [15], which was developed using the Patient, Intervention, Comparator, Outcome, and Study type (PICOS) paradigm to minimize bias and identify as many relevant studies as possible [18]. To be included in this NMA, studies had to be double-blind randomized controlled trials (RCTs) of adults (≥18 years old) with either bipolar I disorder or bipolar II disorder (at least 50% with bipolar I disorder) who were treated with an atypical antipsychotic as monotherapy, and had to report at least one outcome of interest at a study endpoint of week 8 or earlier (Table 1). The exact search terms and resulting number of records returned are reported in Additional file 1: Table 2a. In addition, conference abstracts from the 2019–2020 meetings of 11 psychiatry professional organizations were reviewed to identify secondary publications to supplement already included studies that were published in peer-reviewed journals (Additional file 1: Table 2b). A clinical trial registry was searched for additional studies on 04 May 2020 and during the data extraction process. The registry was searched using keywords related to the interventions listed in the PICOS and disease terms related to bipolar I disorder. Study investigators, authors and companies were not contacted for additional data, and only published data were used for this review.

Table 1 Study Inclusion and Exclusion Criteria

Study selection

All references/publications were screened by title and abstract against the inclusion and exclusion criteria by two independent reviewers, and discrepancies were resolved by a third reviewer. The full texts of publications retained after abstract review were then reviewed by two independent reviewers, with discrepancies again resolved by a third reviewer. Following best practices, data extraction from selected studies was conducted by two independent researchers, with results cross-checked to ensure accuracy.

Outcome variables

The primary efficacy outcome was change from baseline in the Montgomery-Åsberg Depression Rating Scale (MADRS) total score reported at week 8 or earlier. Change in Clinical Global Impressions-Bipolar-Severity (CGI-BP-S) overall and depression scores, response rate (≥50% improvement from baseline in MADRS), and remission rate (MADRS ≤12 and ≤10 at endpoint) were also examined. Discontinuation outcomes included all-cause discontinuation, discontinuation due to adverse events, and discontinuation due to lack of efficacy. Metabolic outcomes included change in weight, rate of ≥7% weight gain, and changes in triglycerides, total cholesterol, low-density lipoprotein (LDL) cholesterol, glucose, and prolactin. Additional tolerability outcomes included rates of investigator-reported somnolence, extrapyramidal symptoms (EPS), akathisia, and switch to mania.
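The response and remission definitions above can be expressed as a small patient-level classification. This is a minimal sketch for illustration only: the NMA itself used published aggregate data, and the function name and example scores are hypothetical.

```python
def madrs_outcomes(baseline, endpoint, remission_cutoff=12):
    """Classify one patient's MADRS outcome under the definitions in the text.

    Assumes baseline > 0. remission_cutoff is 12 or 10, matching the two
    remission definitions examined.
    """
    improvement = (baseline - endpoint) / baseline  # fractional improvement
    return {
        "change": endpoint - baseline,              # negative means improvement
        "response": improvement >= 0.5,             # at least 50% improvement
        "remission": endpoint <= remission_cutoff,
    }

# Hypothetical patient scores
example_a = madrs_outcomes(30, 14)                       # responder, not remitted
example_b = madrs_outcomes(30, 10, remission_cutoff=10)  # responder and remitted
```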

Missing standard errors (SEs) for continuous outcomes were estimated from reported standard deviations, 95% confidence intervals, p values or standard errors (reported for baseline and endpoint values) [18].
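The conversions mentioned above can be sketched with the standard large-sample formulas. This is an assumed illustration of the general approach rather than the exact procedure applied here; the function names are hypothetical, and deriving a change-score SE from separate baseline and endpoint values (which also requires an assumed correlation) is not covered.

```python
import math
from statistics import NormalDist

def se_from_sd(sd, n):
    """SE of a mean from its standard deviation and sample size."""
    return sd / math.sqrt(n)

def se_from_ci(lower, upper, level=0.95):
    """SE from a confidence interval, assuming a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% CI
    return (upper - lower) / (2 * z)

def se_from_p(estimate, p_two_sided):
    """SE from an effect estimate and its two-sided p value (z test assumed)."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return abs(estimate) / z

# Hypothetical inputs
print(se_from_sd(10.0, 100))
print(round(se_from_ci(-1.96, 1.96), 3))
print(round(se_from_p(-3.0, 0.05), 3))
```

For small trials, a t distribution quantile would be more appropriate than the normal quantile used in this sketch.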

Network meta-analysis methods

The NMA was conducted according to guidance published by the National Institute for Health and Care Excellence’s Decision Support Unit [19] and the International Society for Pharmacoeconomics and Outcomes Research Task Force on Indirect Treatment Comparisons [20]. When available, results from mixed models for repeated measures (MMRM) were favored over those using the last observation carried forward method to handle missing data. For trials with multiple fixed-dose arms, the results were pooled across doses, which is consistent with methods used in several past meta-analyses [15, 16]. Network diagrams were drawn for each outcome and can be found in Additional file 1: Figure 1a-d.
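One common way to pool several dose arms of a trial into a single arm for a continuous outcome is the formula for combining groups given in the Cochrane Handbook. The text does not state which pooling formula was used, so the sketch below is an assumption, and the function name and arm values are hypothetical.

```python
import math

def combine_arms(n1, mean1, sd1, n2, mean2, sd2):
    """Combine two arms into one: returns (n, mean, sd) of the pooled arm.

    The pooled SD includes the spread between the two arm means, so it is
    not a simple average of the two SDs.
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
           + (n1 * n2 / n) * (mean1 - mean2)**2) / (n - 1)
    return n, mean, math.sqrt(var)

# Hypothetical low-dose and high-dose arms (MADRS change from baseline)
n, mean, sd = combine_arms(100, -15.0, 10.0, 100, -13.0, 10.0)
print(n, mean, round(sd, 3))
```

For more than two dose arms, the function can be applied repeatedly to the running pooled arm and the next arm. For dichotomous outcomes, pooling reduces to summing events and sample sizes across arms.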

The NMA was conducted within a Bayesian framework using OpenBUGS v3.2.3 (OpenBUGS Foundation) and R 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria), following the code provided in the NICE Decision Support Unit (DSU) Technical Support Document 2 (TSD2) [21]. The methodology also followed guidance from the ISPOR Task Force on Indirect Treatment Comparisons [20, 22]. In addition, correlations induced by multi-arm trials were taken into account using the methods and code recommended in the DSU TSD2 [21]. Continuous variables were modeled assuming an identity link and a normal distribution. Dichotomous variables were modeled using a logit link and a binomial distribution. Results for continuous variables were reported as the difference in change from baseline, and results for dichotomous variables were reported as odds ratios (ORs). All results were reported along with 95% credible intervals, the Bayesian analogue of confidence intervals. Treatments were ranked using surface under the cumulative ranking curve (SUCRA) probabilities [23]. This ranking hierarchy was obtained by ordering the effects from the most to the least effective (or tolerable) treatment in comparison to placebo. The base case models were fit as random effects models, which estimate additional variance parameters associated with study heterogeneity and generally have wider credible intervals than fixed effects models.
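SUCRA values can be computed from posterior draws of the treatment effects by ranking the treatments within each draw and averaging the cumulative rank probabilities. The sketch below illustrates that calculation in Python under assumed inputs; it is not the OpenBUGS or R code used for the analysis, and the treatment names and draws are hypothetical.

```python
def sucra_from_draws(draws, lower_is_better=True):
    """Compute SUCRA per treatment.

    draws: dict mapping treatment name -> list of posterior draws of its
    effect versus the reference (all lists the same length).
    """
    names = list(draws)
    a = len(names)                       # number of treatments
    n_draws = len(draws[names[0]])
    # rank_counts[t][r] = number of draws in which t holds rank r (0 = best)
    rank_counts = {t: [0] * a for t in names}
    for i in range(n_draws):
        ordered = sorted(names, key=lambda t: draws[t][i],
                         reverse=not lower_is_better)
        for r, t in enumerate(ordered):
            rank_counts[t][r] += 1
    result = {}
    for t in names:
        cumulative, total = 0.0, 0.0
        for count in rank_counts[t][:-1]:   # ranks 1 .. a-1
            cumulative += count / n_draws   # P(rank is this good or better)
            total += cumulative
        result[t] = total / (a - 1)
    return result

# Hypothetical draws of change in MADRS versus placebo (more negative is better)
draws = {
    "placebo": [0.0, 0.0, 0.0, 0.0],
    "drug_x": [-4.0, -5.0, -3.0, -4.5],
    "drug_y": [-2.0, -6.0, -1.0, -2.5],
}
sucra = sucra_from_draws(draws)
print(sucra)
```

A SUCRA of 1 means a treatment is certain to rank best and 0 means it is certain to rank worst; the values summarize rank probabilities and do not indicate whether differences between treatments are statistically or clinically meaningful.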

Number needed to treat or harm

The number needed to treat (NNT) and number needed to harm (NNH) were computed using risk differences derived from the network meta-analysis results, following methods described in the Cochrane Handbook for Systematic Reviews of Interventions Version 6.1.0 [18, 24]. NNT and NNH values were calculated by taking the reciprocal of the absolute value of the risk difference [25, 26] and were rounded up to the next whole number. NNT is a measure of effect size that indicates how many patients would need to be treated with the medication of interest instead of a comparator (i.e., placebo in the case of all trials included in the present study) for a single patient to benefit. Lower NNT values represent superior performance of the treatment of interest on a given outcome. NNH is similar to NNT but measures the number of patients who would need to be administered a treatment for a single patient to experience the adverse event. Higher NNH values represent better performance (i.e., a greater number of patients are likely to be treated before a single patient experiences an adverse event).
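The calculation described above can be sketched as follows. The conversion from an odds ratio to a risk difference through an assumed control-group risk follows the general Cochrane Handbook approach; the control risk and odds ratio in the example are hypothetical and not taken from this analysis.

```python
import math

def risk_from_or(odds_ratio, control_risk):
    """Risk in the treated group implied by an odds ratio and a control risk."""
    return (odds_ratio * control_risk) / (1 - control_risk + odds_ratio * control_risk)

def nnt_from_or(odds_ratio, control_risk):
    """NNT (or NNH): reciprocal of the absolute risk difference, rounded up."""
    risk_difference = risk_from_or(odds_ratio, control_risk) - control_risk
    if risk_difference == 0:
        return math.inf                  # no difference, so NNT is undefined
    return math.ceil(1 / abs(risk_difference))

# Hypothetical example: OR of 2.0 for response with a 40% placebo response rate
treated_risk = risk_from_or(2.0, 0.40)
nnt = nnt_from_or(2.0, 0.40)
print(round(treated_risk, 3), nnt)
```

Because the risk difference depends on the assumed control-group risk, the same odds ratio yields different NNT or NNH values in populations with different baseline risks.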

Sensitivity analyses

We examined the impact of pooled vs. stratified doses of each atypical antipsychotic and the impact of restricting the data to the 6-week timepoint (i.e., data reported at other timepoints were removed).

Quality of evidence and heterogeneity

The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach for NMA was used to evaluate the confidence in the evidence [27, 28]. The evidence was evaluated for each direct comparison within each network separately, and since there were no closed loops, an evaluation of indirect comparisons was not applicable. Ratings were based on the following domains: study design, risk of bias, inconsistency, indirectness, imprecision, and publication bias [28]. Risk of bias was assessed using the Cochrane Risk of Bias tool version 2 (RoB2) [24, 29]. Publication bias was assessed by comparison-adjusted funnel plots, with tests for asymmetry applied to cases with ≥10 studies [30].

The transitivity assumption of the NMA was evaluated by comparing the distribution of potential effect modifiers across the clinical trials included in the NMA, to ensure that their populations were suitably comparable. The following baseline patient characteristics were examined: age, gender, body weight, BMI, percent with bipolar I disorder, baseline MADRS, baseline CGI-BP-S depression score, and age of bipolar onset. In the NMA random effects models, a common heterogeneity parameter across the various treatment comparisons was assumed, and heterogeneity was assessed by the between-study variance τ² (tau squared) for each outcome and further characterized by comparing it with its predictive distribution [31]. Since there were no closed loops in the NMA, assessment of incoherence was not applicable to this analysis.


Literature review update

The update to the systematic literature review identified 1791 records in Embase, MEDLINE, the Cochrane Library, and PsycINFO for screening (Fig. 1). A total of 111 full-text articles and secondary conference abstracts were reviewed, with 17 records meeting all the inclusion criteria. In addition, 6 more records were identified from searches of the conference abstracts. These 17 records reflected 10 unique RCTs, 4 of which [32,33,34,35] had not been included in the earlier NMA [15]. Additionally, the results from one RCT that had been included in the earlier NMA based on a conference presentation have since been published [36]. Study results that were published only in conference abstracts and not in a peer-reviewed manuscript were excluded. A total of 18 trials from 25 references were included in the NMA [32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47]. Figure 1 gives the PRISMA flow diagram showing the reasons for record exclusion.

Fig. 1 PRISMA flow diagram

Study characteristics

All studies were multi-site, randomized, double-blind, placebo-controlled clinical trials. There were no head-to-head trials of atypical antipsychotics. Most trials were multi-national, but eight recruited only from sites in the United States (US) [35, 37, 39, 41,42,43] and two recruited only from sites in the People’s Republic of China [36, 46]. Most trials lasted 8 weeks, but six studies [33, 34, 38, 39, 44] lasted only 6 weeks.

Table 2 gives the baseline characteristics used in the assessment of heterogeneity for each of the included RCTs. The study populations were similar in terms of mean age (29.2–43.6 years), gender distribution (34.3–48.1% male), mean baseline MADRS score (26.9–32.0), and mean baseline CGI-BP-S-overall (4.2–4.5) and CGI-BP-S-depression (4.3–4.9) scores. In the quetiapine trials [36, 37, 40, 41, 43, 47], as few as 50.9% of patients were diagnosed with bipolar I disorder (the remainder were diagnosed with bipolar II disorder), whereas in the other trials 100% of patients were diagnosed with bipolar I disorder. Mean baseline body weight was not reported in all studies, but in the 11 studies [32,33,34, 36, 38, 40,41,42, 46, 47] where it was reported, it ranged from 63.9 to 88.8 kg. Mean age of onset was reported in only three trials [32, 38, 44] and ranged from 25.4 to 28.4 years.

Table 2 Study Design and Patient Baseline Characteristics

Efficacy measures

Lurasidone, olanzapine, quetiapine, and cariprazine (but not aripiprazole or ziprasidone) were all significantly more efficacious than placebo in change from baseline in MADRS, with a larger magnitude of change versus placebo for lurasidone, quetiapine and olanzapine than for cariprazine (Table 3). According to SUCRA rankings, lurasidone, olanzapine and quetiapine ranked first for change in MADRS score versus placebo, followed by cariprazine, ziprasidone and aripiprazole (Table 14). In pairwise comparisons for change from baseline in MADRS, lurasidone was similar to cariprazine, olanzapine, and quetiapine, and was significantly better than aripiprazole and ziprasidone (Table 3).

Table 3 Change from Baseline in MADRS and Odds Ratio for Response (≥ 50% improvement in MADRS)

For response (≥50% improvement in MADRS score), lurasidone, quetiapine, olanzapine and cariprazine (but not aripiprazole or ziprasidone) were associated with significantly greater odds of response compared to placebo (Table 3). According to SUCRA rankings, lurasidone ranked first in terms of response, followed by quetiapine, olanzapine, cariprazine, aripiprazole and ziprasidone (Table 14). In pairwise comparisons for response, lurasidone had significantly greater odds of response than cariprazine, aripiprazole, and ziprasidone (Table 3).

For change in CGI-BP-S-overall score, lurasidone, cariprazine and quetiapine were significantly better than placebo, but olanzapine and ziprasidone were not. According to SUCRA rankings lurasidone ranked first in improving the overall CGI-BP-S score followed by quetiapine, olanzapine, cariprazine and ziprasidone (Table 14). In pairwise comparison for change in CGI-BP-S-overall score, lurasidone was associated with a significantly larger improvement than cariprazine and ziprasidone but showed similar improvements to quetiapine and olanzapine. Quetiapine was significantly better than cariprazine and ziprasidone in improving the overall severity assessed by CGI-BP-S (Table 4).

Table 4 Change from Baseline in CGI-BP-S-depression and CGI-BP-S-overall

For studies reporting on the change in CGI-BP-S-depression score, lurasidone was significantly better than placebo, but olanzapine and aripiprazole were not (Table 4). No studies reporting on CGI-BP-S-depression score were identified for cariprazine, quetiapine, and ziprasidone. Lurasidone ranked first in improving the CGI-BP-S-depression score followed by olanzapine and aripiprazole (Table 14).

The odds of remission (MADRS ≤12) were significantly greater than placebo for lurasidone, quetiapine, and olanzapine, but not for ziprasidone (Table 5). The most effective treatment in improving remission rates according to the SUCRA rankings was lurasidone, followed by quetiapine, olanzapine and ziprasidone (Table 14). In pairwise comparisons for remission, lurasidone had greater odds of remission than ziprasidone, and quetiapine had greater odds of remission than olanzapine as well as ziprasidone. Clinical trials of cariprazine defined remission as MADRS ≤10 at endpoint; among the other treatments, data for this definition were available only for lurasidone. Both lurasidone and cariprazine had significantly greater odds of remission compared to placebo (Table 5), and lurasidone ranked higher than cariprazine according to SUCRA rankings (Table 14).

Table 5 Odds Ratios for Remission as MADRS ≤12 and Remission as MADRS ≤10

Discontinuation rates

All-cause discontinuation rates for lurasidone, olanzapine, cariprazine, and ziprasidone were comparable to placebo. However, the odds of all-cause discontinuation for aripiprazole were significantly higher than placebo (Table 6). Based on SUCRA rankings, olanzapine and quetiapine ranked as the best tolerated treatments in terms of all-cause discontinuation, followed by cariprazine, lurasidone, ziprasidone and aripiprazole (Table 14). For discontinuations due to adverse events, lurasidone, cariprazine, olanzapine and ziprasidone had similar odds compared to placebo. Aripiprazole and quetiapine had significantly higher odds of discontinuation due to adverse events compared to placebo. According to SUCRA rankings, lurasidone and olanzapine ranked first in terms of discontinuation due to adverse events, followed by cariprazine, ziprasidone, aripiprazole and quetiapine. In pairwise comparisons for discontinuations due to adverse events, there were no significant differences between the atypical antipsychotics (Table 6). For discontinuation due to lack of efficacy, quetiapine and olanzapine had significantly lower odds of discontinuation than placebo (Table 7). Based on SUCRA values, quetiapine ranked first in terms of discontinuation due to lack of efficacy, followed by olanzapine, cariprazine, lurasidone, aripiprazole and ziprasidone (Table 14). No significant differences were found for discontinuation due to lack of efficacy between placebo and lurasidone, cariprazine, aripiprazole, or ziprasidone (Table 7).

Table 6 Odds Ratios for All-Cause Discontinuation and Discontinuation Due to Adverse Events
Table 7 Odds Ratios for Discontinuation Due to Lack of Efficacy

Metabolic parameters

All atypical antipsychotics except lurasidone and aripiprazole were associated with significantly more weight gain than placebo. During the short-term trials, olanzapine had the largest mean weight gain relative to placebo (2.88 kg), followed by quetiapine (1.17 kg), cariprazine (0.65 kg), lurasidone (0.34 kg), and aripiprazole (0.20 kg). Olanzapine had significantly greater weight gain than all other antipsychotics, and quetiapine had significantly greater weight gain than lurasidone and aripiprazole (Table 8). The SUCRA rankings further indicate that aripiprazole and lurasidone were most likely to be associated with smaller changes in weight from baseline, followed by cariprazine, quetiapine and olanzapine (Table 14). All atypical antipsychotics except lurasidone and aripiprazole were associated with significantly greater odds of ≥7% weight gain versus placebo. In pairwise comparisons for clinically significant weight gain (≥7%), olanzapine had significantly greater odds of weight gain than quetiapine, cariprazine and aripiprazole (Table 8). Descriptively, the rates of clinically significant weight gain were lowest for lurasidone (2.4%), followed by cariprazine (3.2%), aripiprazole (4.7%), quetiapine (6.9%), and olanzapine (20.3%).

Table 8 Change from Baseline in Weight (kg) and Odds Ratios of ≥7% Weight Gain

There were no significant differences in change in total cholesterol among the atypical antipsychotics, but olanzapine was associated with greater increases in total cholesterol than placebo (Table 9). There were no significant differences from placebo or between atypical antipsychotics in change in triglycerides (Table 9), LDL (Table 10) or glucose (Table 10) during these acute trials.

Table 9 Change from Baseline in Triglycerides and Total Cholesterol
Table 10 Change from Baseline in Low-Density Lipoprotein Cholesterol and Glucose

Other tolerability measures

Lurasidone was associated with greater changes in prolactin than placebo, aripiprazole, and quetiapine, and cariprazine was also associated with greater changes in prolactin than placebo (Table 11). According to SUCRA rankings for changes in prolactin, aripiprazole ranked first, followed by quetiapine, cariprazine and lurasidone (Table 14). All antipsychotics except lurasidone, cariprazine, and aripiprazole had greater somnolence than placebo, with quetiapine having greater somnolence than all antipsychotics except ziprasidone (Table 12). In the SUCRA analysis, lurasidone ranked as the best tolerated option in terms of somnolence, followed by cariprazine, aripiprazole, olanzapine, quetiapine and ziprasidone (Table 14). Switch to mania for all antipsychotics was comparable to placebo, except for quetiapine, which had significantly lower odds of switching. Quetiapine also had lower odds of switch to mania compared with aripiprazole (Table 12) and ranked best according to SUCRA values (Table 14). Rates of EPS were higher than placebo for lurasidone, quetiapine, and cariprazine, but there were no significant differences among the atypical antipsychotics (Table 13). Aripiprazole and cariprazine ranked as the best tolerated options in terms of EPS, followed by quetiapine and ziprasidone (Table 14). Among the atypical antipsychotics for which data on akathisia were reported (aripiprazole, quetiapine, and lurasidone), odds were higher than placebo (Table 13). According to SUCRA rankings, cariprazine and lurasidone ranked as the best tolerated options, with lower akathisia rates, followed by aripiprazole (Table 14).

Table 11 Change from Baseline in Prolactin
Table 12 Odds Ratios for Somnolence and Switch to Mania
Table 13 Odds Ratios for Extrapyramidal Symptoms and Akathisia
Table 14 Surface Under the Cumulative Ranking Curve (SUCRA)


Descriptive estimates of the number needed to treat (NNT) to achieve one additional responder relative to placebo and the number needed to harm (NNH) based on one additional all-cause discontinuation and one additional discontinuation due to adverse events relative to placebo were calculated for each treatment arm. Lurasidone (5) had the lowest NNT value for response (highest responder rate), followed by quetiapine (6), olanzapine (10), cariprazine (11), aripiprazole (50) and ziprasidone (100). For remission defined as MADRS ≤12, lurasidone (6) and quetiapine (6) had the lowest NNT, followed by olanzapine (13) and ziprasidone (250). For remission defined as MADRS ≤10, lurasidone (8) had a lower NNT than cariprazine (13). The NNH value for all-cause discontinuation was highest for quetiapine (500) (lowest discontinuation rate), followed by lurasidone (100), cariprazine (100), olanzapine (15), ziprasidone (15) and aripiprazole (10). The NNH for discontinuation due to adverse events was highest for lurasidone (250), followed by cariprazine (50), olanzapine (50), ziprasidone (50), aripiprazole (17) and quetiapine (15).

Sensitivity analyses

Dose-stratified sensitivity analysis results were largely similar to the base case findings. For cariprazine, only the 1.5 mg and 3.0 mg once-daily treatment arms were significantly more efficacious than placebo, whereas all the stratified doses of lurasidone, olanzapine, and quetiapine were significantly more efficacious than placebo.

Quality of evidence and heterogeneity

Overall, the quality of evidence for the primary outcome was high for direct evidence, with quality reduced for NMA evidence, primarily because of indirectness of results and imprecision. For all other outcomes, in general, the lurasidone vs placebo and cariprazine vs placebo comparisons had higher quality, whereas the ziprasidone vs placebo and aripiprazole vs placebo comparisons had lower quality owing to limitations in the risk of bias. GRADE results are presented in Additional file 1: Table 7a-b.

Overall, the risk of bias was low, with 6 studies showing some concerns related to the randomization process (e.g., incomplete description of allocation concealment). Results of the risk of bias assessment are presented in Additional file 1: Figure 2.

For the primary outcome, change from baseline in MADRS, comparison-adjusted funnel plots of the network meta-analysis did not suggest that small studies gave different results from larger studies. This was true for all other outcomes evaluated in this network meta-analysis, except for the continuous outcomes change from baseline in CGI-BP-S-overall and triglycerides, and the dichotomous outcomes response, ≥7% weight gain, and EPS. Potential asymmetry was detected in the funnel plots for these 5 outcomes, suggesting a possibility of reporting bias. Funnel plots can be found in Additional file 1: Figure 3a-d. Assessment of transitivity showed that most of the studies and comparisons had minimal variation in mean age, sex, and MADRS and CGI-BP-S-depression scores at baseline. Other effect modifiers such as age at onset, weight, and BMI at baseline were reported in only a few studies. Most of the studies included only patients with bipolar I disorder, whereas 7 (39%) studies included patients with either bipolar I or bipolar II disorder. Detailed results are presented in Additional file 1: Figure 4a, b. The median between-study variance (τ²) ranged from 0 to 12.57 across outcomes, with heterogeneity for some outcomes considered moderate to high (Additional file 1: Table 8).


In this NMA involving short-term trials of atypical antipsychotic monotherapy for patients with bipolar depression, lurasidone, quetiapine, olanzapine, and cariprazine were found to be significantly more efficacious than placebo as assessed by change in MADRS. In pairwise comparisons for change in MADRS, lurasidone, cariprazine, olanzapine, and quetiapine were found to be similar. According to SUCRA analyses, lurasidone, olanzapine and quetiapine ranked first for improvement in MADRS compared to placebo, followed by cariprazine, ziprasidone and aripiprazole.

While mean change from baseline on MADRS is often a primary endpoint in clinical trials, clinicians in practice are often also interested in understanding the magnitude of treatment response. Lurasidone had significantly greater odds of response (defined as ≥50% improvement in MADRS), than cariprazine, aripiprazole, and ziprasidone. In addition, the NNT for response was the lowest for lurasidone when compared to other atypical antipsychotics. With lurasidone, 5 patients needed to be treated for one patient to respond, while other atypical antipsychotics required treating 6 (quetiapine) to 12 (cariprazine) patients for one patient to respond.

For improving overall severity assessed by CGI-BP-S, lurasidone, quetiapine and cariprazine were significantly more efficacious than placebo; and in pairwise comparisons for overall CGI-BP-S, lurasidone was associated with significantly more improvement than cariprazine and ziprasidone. Lurasidone was significantly better than placebo for improvement in CGI-BP-S depression score, but olanzapine and aripiprazole were not. CGI-S scores represent a global assessment of patient severity by the investigator and therefore provide a clinically relevant measure of real-world effectiveness.

Lurasidone and aripiprazole had similar weight gain compared to placebo, while cariprazine, olanzapine and quetiapine had significantly greater weight gain than placebo. Lurasidone also had significantly less weight gain than olanzapine and quetiapine. Lurasidone had the highest NNH value (lowest rate) for discontinuation due to adverse events. The current NMA extends earlier meta-analyses by including cariprazine, conducting pairwise comparisons between all atypical antipsychotics, examining additional outcome variables such as triglycerides, LDL, glucose, and prolactin, and assessing discontinuation due to lack of efficacy and adverse events.

The efficacy findings for lurasidone, quetiapine, and olanzapine are consistent with previous meta-analyses of atypical antipsychotic monotherapy in bipolar depression [12,13,14, 16, 48,49,50,51,52]. While prior meta-analyses have largely focused on efficacy instead of tolerability, estimated differences in all-cause discontinuation in prior analyses have also been consistent with the current analyses [14, 15].

Antidepressant monotherapy appears to be the most common treatment for bipolar depression in usual clinical care, despite treatment guidelines highlighting the lack of evidence supporting its use and recognizing concern about switching patients into mania [49,50,51,52,53]. Given the limited evidence for efficacy and the potential to cause a switch to mania or rapid cycling [11, 54], this practice appears to be a potential target for further evidence-based investigation. Consistent with accumulating evidence of the efficacy of some atypical antipsychotics in bipolar depression, the use of atypical antipsychotics has been increasing in bipolar disorder [55, 56].


There were several limitations to this NMA. The quetiapine trials included bipolar II patients, which may confound the results. While the baseline patient characteristics that were examined from the different trials appeared largely similar, unmeasured confounders could exist. Another limitation was inconsistent reporting of outcome variables. Some outcome variables, such as EPS, were not reported for all trials or were not reported consistently [35, 36, 39, 44,45,46]. In addition, the metabolic laboratory values from the different studies were not all specified as fasting measurements. To increase consistency in study design and reported outcomes, this NMA was limited to atypical antipsychotics used as monotherapy to treat patients with bipolar depression. Inclusion of other treatments such as lithium, lamotrigine, and divalproex, as well as combination treatments, was beyond the scope of the current analysis. Although the included studies were deemed comparable in the transitivity assessment, moderate to high heterogeneity was observed for some outcomes in the NMA, compared with the predictive distribution of heterogeneity [31]. As in the original NMA, meta-regression, which can potentially adjust for effect modifiers, was not performed because there were too few trials per comparison to make meta-regression feasible at the aggregate level [15]. Random effects models were applied to account for between-study variance in the NMA, but the presence of heterogeneity should be acknowledged when interpreting the findings. For the included evidence, despite the statistical tests showing no small-study effects for most outcomes, we found some potential asymmetry in the comparison-adjusted funnel plots in this network meta-analysis. All the identified trials were placebo-controlled RCTs, leading to star-shaped networks, and therefore indirect evidence and hierarchies should be interpreted with caution.


In this NMA of adults with bipolar depression, which evaluated change in depressive symptoms (assessed by MADRS) across short-term trials, the largest improvement versus placebo was observed for lurasidone, olanzapine and quetiapine, with cariprazine showing a smaller treatment effect. Aripiprazole and ziprasidone were not significantly more efficacious than placebo for the treatment of bipolar depression. Improvement in CGI-BP-S score for lurasidone was larger than for cariprazine and ziprasidone but similar to that for quetiapine and olanzapine. Based on short-term studies, lurasidone and aripiprazole had similar weight gain compared to placebo.