Background

Bipolar disorder is now recognised as a potentially treatable psychiatric illness with substantial morbidity and mortality and high social and economic impact [1]. There is no cure, and every aspect of its definition, mechanisms and treatment is subject to debate. Moreover, bipolar disorder is common, with an estimated lifetime prevalence of 2% in a recent Canadian study [2], and of 3% for bipolar I disorder in a US study [3].

Therapies must address the control of acute episodes (manic, depressed or mixed), and maintenance of remission of symptoms. Drug treatments have included lithium, anticonvulsants, and antipsychotics, but current therapies have proven inadequate for many patients; only half of bipolar patients achieve remission over two years, and half of these relapse within the two years [4]. Issues in drug treatment involve not only efficacy, but also tolerability. Adverse events, including extrapyramidal symptoms and weight gain, can be significant and influence adherence.

Newer (atypical) antipsychotics are generally considered to have fewer extrapyramidal effects. They have proven efficacy in treatment of acute mania and schizophrenia [5] and have also been used in dementia [6]. Newer drugs are often subjected to more, better, and more detailed investigation in randomised trials than older medicines.

Given the likely nature of randomised trials available, the aim of this review was to:

1. Examine the efficacy in randomised trials of atypical antipsychotics where the presenting episode is depression or manic/mixed, comparing atypical antipsychotic with placebo or active comparator.

2. Examine withdrawals for any cause, or due to lack of efficacy or adverse events.

3. Combine all phases for adverse event analysis.

Potential sources of clinical heterogeneity in the studies are types of patient, severity and duration of symptoms, drug and dose used, the duration of therapy and/or study, and the aim of therapy, whether for treatment of acute symptoms or maintenance of remission. In addition there may be differences in which outcomes were measured and reported.

Methods

We searched PubMed, EMBASE and the Cochrane Library up to December 2006 for randomised controlled trials using atypical antipsychotic drugs to treat bipolar disorder. The search strategy used individual drug names, "bipolar" and "random*", together with appropriate indexing terms for bipolar disorder and randomised controlled trial.

For inclusion, trials had to be randomised and double blind, and use an atypical antipsychotic drug alone or in combination with a mood stabilising drug such as lithium, valproate, divalproex, lamotrigine, or carbamazepine to treat adult patients with documented bipolar disorder, with either a placebo or active comparator. Trials had to have a minimum of 10 patients per treatment arm, and a planned duration of at least three weeks. The abstracts were read, and potentially useful reports retrieved in full paper copy. Decisions on inclusion or exclusion were made by consensus. No information was taken from posters or abstracts, and studies were read carefully to avoid including duplicate material. Studies were scored for reporting quality using a common method [7] based on the reporting of randomisation, blinding and withdrawals. The maximum score possible was 5 points, and no study could be included with fewer than 2 points (one for randomisation and one for blinding).
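The inclusion rules and the quality threshold described above can be expressed as a simple screen. The following is a minimal sketch, not the reviewers' actual procedure: the point allocation assumes that the five-point scheme is the widely used Oxford (Jadad) scale, which the text does not name, it omits that scale's deductions for inappropriate methods, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrialReport:
    # All field names are hypothetical, chosen for illustration only.
    randomised: bool
    randomisation_method_adequate: bool
    double_blind: bool
    blinding_method_adequate: bool
    withdrawals_described: bool
    adults_with_bipolar_disorder: bool
    min_patients_per_arm: int
    planned_weeks: float


def quality_score(report: TrialReport) -> int:
    """Score 0-5 from the reporting of randomisation, blinding and withdrawals."""
    score = 0
    if report.randomised:
        # One point for stating randomisation, one more for an adequate method.
        score += 1 + int(report.randomisation_method_adequate)
    if report.double_blind:
        # One point for stating double blinding, one more for an adequate method.
        score += 1 + int(report.blinding_method_adequate)
    score += int(report.withdrawals_described)
    return score


def eligible(report: TrialReport) -> bool:
    """Apply the inclusion criteria listed in the Methods paragraph."""
    return (
        report.randomised
        and report.double_blind
        and report.adults_with_bipolar_disorder
        and report.min_patients_per_arm >= 10
        and report.planned_weeks >= 3
        and quality_score(report) >= 2
    )
```

Under these assumptions a trial that states randomisation and double blinding, but describes neither method, still reaches the minimum of 2 points.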

Information extracted from the trials included details of the patients (number, age, sex, nature of presenting episode), treatment regimens and concomitant medications. We used the number of patients randomised and receiving at least a single dose of drug in order to have an intention to treat analysis; almost all outcomes were reported in this way. Outcomes of efficacy, tolerability and harm, and switching to the opposite state/pole were extracted, using dichotomous data wherever possible. For efficacy we particularly sought information on response and/or remission, and for harm, information on weight gain, extrapyramidal symptoms, and changes in prolactin, glucose and lipid levels.

Guidelines for quality of reporting of meta-analyses were followed where appropriate [8]. The prior intention was to pool data where there was clinical and methodological homogeneity, with similar patients, dose, duration, outcomes, and comparators, but not where numbers of events were small, and random chance could dominate effects of treatment [9]. Homogeneity tests and funnel plots, though commonly used in meta-analysis, were not used here because they have been found to be unreliable [10–12]. Instead clinical homogeneity was examined graphically [13]. Relative benefit (or risk) and number-needed-to-treat or harm (NNT or NNH) were calculated with 95% confidence intervals. Relative benefit or risk was calculated using a fixed effects model [14], with no statistically significant difference between treatments assumed when the 95% confidence intervals included unity. We added 0.5 to treatment and comparator arms of trials in which at least one arm had no events. Number-needed-to-treat (or harm) was calculated by the method of Cook and Sackett [15] using the pooled number of observations only when there was a statistically significant difference of relative benefit or risk (where the confidence interval did not include 1). Statistical significance of any difference between numbers needed to treat for different drugs was assumed if there was no overlap of the confidence intervals, and additionally tested using the z statistic [16].
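The calculations described in this paragraph can be illustrated with a short sketch. It rests on several assumptions that the text does not confirm: counts are pooled by simple summation across trials, the interval for the relative risk uses the usual normal approximation on the log scale (the fixed effects method of reference [14] may differ), the 0.5 correction is applied to event counts only, and the NNT interval is taken as the reciprocals of the limits of the absolute risk difference. Function names are illustrative.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% interval


def relative_risk(events_t, n_t, events_c, n_c):
    """Relative benefit or risk with an approximate 95% confidence interval."""
    # Continuity correction from the text, applied to event counts (assumption).
    if events_t == 0 or events_c == 0:
        events_t, events_c = events_t + 0.5, events_c + 0.5
    rr = (events_t / n_t) / (events_c / n_c)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - Z95 * se_log)
    upper = math.exp(math.log(rr) + Z95 * se_log)
    return rr, lower, upper


def nnt(events_t, n_t, events_c, n_c):
    """NNT (or NNH) as the reciprocal of the absolute risk difference."""
    p_t, p_c = events_t / n_t, events_c / n_c
    diff = p_t - p_c
    if diff == 0:
        raise ValueError("no difference in event rates; NNT is undefined")
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    low_d, high_d = diff - Z95 * se, diff + Z95 * se
    if low_d <= 0 <= high_d:
        # The interval for the difference spans zero: no finite upper bound.
        return 1 / abs(diff), 1 / max(abs(low_d), abs(high_d)), math.inf
    bounds = sorted((1 / abs(low_d), 1 / abs(high_d)))
    return 1 / abs(diff), bounds[0], bounds[1]


def summarise(events_t, n_t, events_c, n_c):
    """Report an NNT only when the relative risk interval excludes 1."""
    rr, lower, upper = relative_risk(events_t, n_t, events_c, n_c)
    significant = not (lower <= 1.0 <= upper)
    return {
        "rr": (rr, lower, upper),
        "nnt": nnt(events_t, n_t, events_c, n_c) if significant else None,
    }


if __name__ == "__main__":
    # Hypothetical counts: 60/100 responders on treatment, 40/100 on placebo.
    print(summarise(60, 100, 40, 100))
```

The sketch does not cover the z test used to compare NNTs between drugs [16].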

The following terms were used to describe adverse outcomes in terms of harm or prevention of harm [17]:

• When significantly fewer adverse events occurred with atypical antipsychotic than with control (placebo or active) we used the term the number-needed-to-treat to prevent one event (NNTp).

• When significantly more adverse events occurred with atypical antipsychotic compared with control (placebo or active) we used the term the number-needed-to-harm to cause one event (NNH).

We chose only to pool data for analysis if there were at least two trials and at least 250 patients [9]. We chose to analyse according to comparator (placebo or active) and trial duration, separating short-term trials of less than six weeks, from those of six to 12 weeks. Longer duration trials involved maintenance therapy following response to treatment, and these were also analysed separately as trials of longer than 12 weeks.
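The grouping and pooling rules in this paragraph amount to a small filter. The sketch below is illustrative only: the class, the function names and the exact boundary handling (six and 12 weeks both falling in the middle band) are assumptions based on the wording above.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical fields for illustration.
    comparator: str   # "placebo" or "active"
    weeks: float      # time point at which the outcome was reported
    patients: int     # patients contributing to the comparison


def duration_band(weeks: float) -> str:
    """Assign a trial to one of the three duration strata."""
    if weeks < 6:
        return "<6 weeks"
    if weeks <= 12:
        return "6-12 weeks"
    return ">12 weeks"


def poolable_groups(trials, min_trials=2, min_patients=250):
    """Group trials by comparator and duration; keep groups large enough to pool."""
    groups = {}
    for trial in trials:
        key = (trial.comparator, duration_band(trial.weeks))
        groups.setdefault(key, []).append(trial)
    return {
        key: members
        for key, members in groups.items()
        if len(members) >= min_trials
        and sum(t.patients for t in members) >= min_patients
    }
```

A group with a single trial, or with fewer than 250 patients in total, is dropped rather than pooled, which mirrors the stated aim of avoiding analyses where chance could dominate.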

Results

We found five trials [18–22] in which the participants presented with a depressive episode, and 25 [23–49] (two reported separately at two time points) in which the presenting episode was manic or mixed. All had industry sponsorship. Details of the included studies together with outcome data extracted from the studies are provided for presenting episode of depression [see Additional file 1] and mania/mixed [see Additional file 2], as well as individual adverse events [see Additional file 3], and a list of excluded studies [see Additional file 4].

Reported outcomes were measured using some kind of scale (depression or mania rating scales, weight, cholesterol levels) [see Additional files 1 and 2], while other outcomes, predominantly treatment emergent adverse events, were elicited from patients as subjective evaluations [see Additional file 3]. A few outcomes were reported both using scale measurements and subjective evaluations. Wherever the distinction was clear, both sets of data are presented.

Efficacy

Presenting episode: depression

Five trials reported on 2,206 patients, 1,739 of whom were treated with an atypical antipsychotic. Mean ages in the trials were 36 to 42 years, and just under half (44%) of patients were men. Patients were diagnosed as Bipolar I [18, 21] or Bipolar I or II [19, 20, 22], and in one trial [20] patients were excluded if they had failed to respond to at least two classes of antidepressant in the current episode. The trials were of mixed reporting quality, with one scoring 5, three scoring 4, and one scoring 3, out of a maximum 5 points.

Most patients (80%) were in three large placebo-controlled trials [18, 20, 22] lasting eight weeks, one comparing olanzapine monotherapy or olanzapine plus fluoxetine with placebo, and the others comparing quetiapine monotherapy, at different target dosages, with placebo. One small (30 patients) placebo-controlled trial lasting 12 weeks [19] examined a mood stabiliser together with risperidone, paroxetine or a combination of the two. The remaining patients were in an active controlled trial comparing olanzapine plus fluoxetine with lamotrigine [21]. All trials permitted limited use of benzodiazepines for the first three to four weeks of treatment. The numbers of patients treated with each drug are in Table 1; dosages and mean daily doses of trial drugs are given elsewhere [see Additional file 1].

Table 1 Numbers of patients treated with different drugs in trials of atypical antipsychotics in bipolar disorder

Response to treatment was generally defined as ≥50% reduction in depression rating scale measurement, remission as ≤12 on MADRS or ≤7 on HAM-D, and emergence of/switch to mania as YMRS ≥15 or 16. Results for the three trials with placebo-only control groups [18, 20, 22] are in Table 2 with analysis for individual monotherapy using titrated doses of olanzapine [18] and quetiapine monotherapy [20, 22] against placebo combined and separately, but omitting olanzapine plus fluoxetine [18] where there were fewer than 100 patients treated. For both response and remission, all treatments were significantly better than placebo, with a number-needed-to-treat (NNT) of about 4–5 for quetiapine, and about 12 for olanzapine alone; quetiapine was significantly better than olanzapine. The combined NNT was about 6 for response and 5 for remission. The rate of switch into a manic state was low (2–6%), and not significantly different from placebo (4–7%) for either treatment.
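The dichotomous outcome definitions above can be written as simple classification rules. This is a minimal sketch using the thresholds quoted in the text; the function names are hypothetical, and individual trials may have applied slightly different definitions.

```python
# Remission cut-offs quoted in the text, by rating scale.
REMISSION_CUTOFF = {"MADRS": 12, "HAM-D": 7}


def is_response(baseline: float, endpoint: float) -> bool:
    """Response: at least a 50% reduction in the depression rating score."""
    if baseline <= 0:
        return False  # no reduction can be computed from a zero baseline
    return (baseline - endpoint) / baseline >= 0.5


def is_remission(scale: str, score: float) -> bool:
    """Remission: score at or below the cut-off for the named scale."""
    return score <= REMISSION_CUTOFF[scale]


def switched_to_mania(ymrs: float, threshold: float = 15) -> bool:
    """Switch to mania: YMRS at or above the threshold (15 or 16 by trial)."""
    return ymrs >= threshold
```

For example, a patient whose rating falls from 30 to 15 counts as a responder, while a fall from 30 to 16 does not.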

Table 2 Outcomes for placebo controlled trials in bipolar depression [18, 20, 22] – efficacy and discontinuations in trials lasting 8 weeks

All cause discontinuations were significantly less common for olanzapine than placebo. Discontinuations for lack of efficacy were significantly less common in all active treatment groups than placebo, with NNTps of about 7 to 11; the combined NNT to prevent one lack of efficacy discontinuation was 7 (95% confidence interval 5 to 9). Discontinuations for adverse events were significantly more common for olanzapine alone and quetiapine than for placebo (NNH 23 for olanzapine, 9 for quetiapine).

There were two trials with an active control. One [19] had only ten patients in each treatment group, and the other [21] demonstrated no large difference between lamotrigine and olanzapine plus fluoxetine.

Presenting episode: mania or mixed

Twenty-five trials [23–49] reported on a total of 6,174 patients, 3,226 of whom were treated with an atypical antipsychotic. Mean ages in trials were generally 35 to 43 years, and about half of patients were men (33 to 62% in individual trials). Six trials specifically excluded patients who had a history of intolerance to the experimental, or similar, drugs [27, 30, 33, 45, 46, 48], six excluded those with a history of poor response [28, 34, 37, 38, 40, 44], and six excluded those with a history of either intolerance or poor response [31, 32, 42, 43, 47, 49]. In addition, five trials excluded patients with rapid cycling [31, 36, 37, 43, 44]. Sixteen trials reported outcomes only at less than six weeks (mostly three weeks). Six trials [39–44] reported at six to 12 weeks (three also reporting results at three weeks in the same report). Five trials [45–49] reported at times longer than 12 weeks (26 to 78 weeks); two of these trials had previously reported three-week results in separate papers, which are included in the 16 papers with results at less than six weeks. Papers were of good to high reporting quality, with one scoring 2, 13 scoring 3, nine 4, and four 5, out of a maximum 5 points. Points were lost due to inadequate descriptions of randomisation/allocation or blinding methods; all described withdrawals and dropouts.

Trials compared atypical antipsychotic as monotherapy or in combination with a mood stabiliser, with placebo, mood stabiliser monotherapy, or other active treatment (divalproex or haloperidol). All trials permitted limited use of benzodiazepines, usually with tapering dose over the first two weeks, and all but four [30, 32, 44, 49] permitted use of anticholinergics for treatment of extrapyramidal symptoms; prophylactic use was not permitted in any trial. Details of dosage, mean daily doses of trial drugs, and concomitant medication are provided [see Additional file 2].

Table 1 shows the number of patients treated with each drug, for periods of less than six weeks, 6–12 weeks, and longer than 12 weeks. The figures in Table 1 are larger than the total number of patients because some trials reported outcomes at more than one time point. Because some trials were only placebo-controlled, and others only active-controlled, there were limits on the amount of information available for analysis.

Response

Response to treatment was generally described as ≥50% decrease in YMRS score (or equivalent) from baseline. For placebo controlled trials lasting less than six weeks, there was remarkable consistency for response between different treatment regimens (Figure 1). Overall, for over 3,000 patients treated with either atypical antipsychotic or placebo, the relative risk was 1.6 (95% CI 1.5 to 1.8), with an NNT of 5.1 (4.4 to 6.2). Results for individual drugs and combined therapy with a mood stabiliser had NNTs between 4.3 and 6.1 (Table 3). In placebo controlled trials lasting 6 to 12 weeks involving over 700 patients (Table 3), the relative risk was 1.6 (1.4 to 1.9) and the NNT 4.0 (3.1 to 5.6), again with very similar results in individual trials (Figure 2).

Table 3 Outcomes for placebo controlled trials in bipolar mania (<6 weeks and 6–12 weeks)
Figure 1 Response rates with atypical antipsychotic and placebo in placebo controlled trials lasting less than six weeks, where the presenting episode was mania or mixed. The inset scale relates the number of patients in the comparison.

Figure 2 Response rates with atypical antipsychotic and placebo in placebo controlled trials lasting 6–12 weeks, where the presenting episode was mania or mixed. The inset scale relates the number of patients in the comparison.

For active controlled trials there were data for over 900 patients in trials lasting less than six weeks (Table 4), and over 1,200 patients in trials lasting 6 to 12 weeks (Table 4), with no significant difference between treatments. One study [44] individually showed aripiprazole to be better than haloperidol, but response rates in that trial were low (Figure 3).

Table 4 Outcomes for active controlled trials in bipolar mania (<6 weeks and 6–12 weeks)
Figure 3 Response rates with atypical antipsychotic and comparator in active controlled trials lasting 6–12 weeks, where the presenting episode was mania or mixed. The inset scale relates the number of patients in the comparison.

Only two trials reported on time to response. In one [27] median response time was significantly shorter for olanzapine than divalproex, and in the other [39] it was shorter for olanzapine plus mood stabiliser than for placebo plus mood stabiliser (18 vs 28 days).

Trials lasting longer than 12 weeks enrolled patients who had already responded to treatment, and so response was not an outcome measured or reported in these trials.

Remission

Remission was generally described as YMRS score of ≤12. In placebo controlled trials lasting less than six weeks, data for symptomatic remission were available for over 900 patients in four trials (Table 3), giving a relative risk of 1.7 (1.4 to 2.0), and an NNT of 5.4 (4.0 to 8.1). At 6 to 12 weeks, in over 700 patients in three trials (Table 3), the relative risk was 1.5 (1.3 to 1.7) and the NNT 4.0 (3.1 to 5.5). All atypical antipsychotics appeared to perform equally well.

In active controlled trials there was no significant difference between treatments in trials shorter than six weeks (Table 4). In trials lasting 6–12 weeks remission rates with atypical antipsychotics (54%) were barely different from those with active control (48%) (Table 4).

Three trials reported on median time to remission. It was shorter for olanzapine than divalproex (14 vs 62 days; [27, 45]), shorter for olanzapine plus mood stabiliser than placebo plus mood stabiliser (14 vs 22 days; [39]), but similar for olanzapine and haloperidol (34 vs 29 days; [41]).

Trials lasting longer than 12 weeks enrolled patients who had already responded to treatment with a lessening of symptoms, and so remission was not an outcome measured or reported in these trials.

Emergence of depression

Emergence of depression was generally defined as MADRS score of ≥18 with increase ≥4 from baseline on two consecutive occasions or at endpoint, or HAM-D score ≥15. Few trials lasting up to 12 weeks reported on the emergence of depression (Table 3). In placebo controlled trials lasting up to 12 weeks, no trial individually reported a significant difference, nor was there a difference when trials were combined.

None of the four active controlled trials lasting 6 to 12 weeks that reported this outcome individually reported a significant difference between atypical antipsychotic and haloperidol or lithium. However, when combined, these trials with 1,000 patients reported significantly lower rates of emergence of depression with atypical antipsychotic (8%) than with active controls (13%) (Table 4), with a relative risk of 0.6 (0.4 to 0.9), and an NNT to prevent one emergent depression compared with active control of 21 (12 to 99). The active controls in this comparison were haloperidol and lithium, and atypical antipsychotics appeared to be particularly better than haloperidol (Figure 4). In the comparison with haloperidol alone in three trials and 795 patients, the NNT to prevent one emergent depression compared with haloperidol was 15 (9 to 48).

Figure 4 Emergence of depression with atypical antipsychotic or active comparator in active controlled trials lasting 6–12 weeks, where the presenting episode was mania or mixed. The dark symbol indicates lithium as the comparator, and the light symbols haloperidol. The inset scale relates the number of patients in the comparison.

Relapse in maintenance trials

Trials lasting longer than 12 weeks were designed to investigate maintenance of remission, in terms of relapse into an affective state. They were therefore much longer than 12 weeks; the range was 26 to 78 weeks.

In three placebo-controlled trials, with 589 patients [46, 48, 49], 135/332 (41%) suffered any relapse (depressive, manic or mixed) with atypical antipsychotic (olanzapine or aripiprazole), compared with 166/257 (65%) with placebo. The relative risk of relapse was 0.6 (0.5 to 0.7), with an NNTp to prevent a relapse of 4.2 (3.1 to 6.2) for olanzapine compared to placebo. Two active controlled trials (487 patients) had slightly lower relapse rates with olanzapine (32%) than lithium or divalproex (41%) but the difference was barely significant with a relative risk of 0.8 (0.6 to 0.98). Time to any relapse was longer for olanzapine than placebo (174 vs 22 days; [46]) and for olanzapine plus mood stabiliser than placebo plus mood stabiliser (163 vs 42 days; [46]), but there was no significant difference for olanzapine and lithium [47].
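The point estimates for any relapse in the placebo-controlled trials can be retraced from the reported counts. The working below assumes that the relative risk is the simple ratio of the pooled relapse proportions and that the NNTp is the reciprocal of the absolute difference between them; the rounded results agree with the values quoted above.

```latex
\[
\mathrm{RR} = \frac{135/332}{166/257} \approx \frac{0.407}{0.646} \approx 0.63,
\qquad
\mathrm{NNTp} = \frac{1}{166/257 - 135/332} \approx \frac{1}{0.646 - 0.407} \approx \frac{1}{0.239} \approx 4.2
\]
```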

For relapse into a depressive state, there was a barely significant difference (upper limit of confidence interval 0.98; Table 5) between atypical antipsychotic (25%), mostly olanzapine, and placebo (31%) in three trials with 589 patients [46, 48, 49] (relative risk 0.8; 95% confidence interval 0.6 to 0.98). For relapse into a manic state, there was a significant difference between atypical antipsychotic (12%), mostly olanzapine, and placebo (29%) in three trials with 589 patients [46, 48, 49], with a relative risk of 0.4 (0.3 to 0.6) and an NNTp to prevent one manic relapse of 5.9 (4.2 to 9.5).

Table 5 Outcomes in maintenance trials in bipolar mania – efficacy and discontinuations in trials lasting 26 weeks or longer [versus placebo 46, 48, 49; versus active 45, 47]

Discontinuations

All cause discontinuations were less frequent with atypical antipsychotic than placebo in trials lasting less than six weeks (Table 3), and 6–12 weeks (Table 3). Discontinuations for lack of efficacy were also less common with atypical antipsychotic than placebo in trials lasting less than six weeks and 6–12 weeks, with a similar NNTp as for all cause discontinuations (Table 3). Discontinuations due to adverse events were not significantly different from placebo in trials lasting less than six weeks, but more common in trials lasting 6–12 weeks (Table 3).

In active controlled trials discontinuations for any cause, lack of efficacy or adverse events were not statistically different between treatments for trials lasting less than six weeks, but in trials lasting 6–12 weeks, both all cause and adverse event discontinuations were less common with atypical antipsychotic than active control (Table 4).

In long-term maintenance trials olanzapine had more all cause discontinuations than placebo, but fewer than active comparators, though event rates differed considerably (Table 5). Lack of efficacy and adverse event discontinuations did not differ significantly.

Adverse events

Some adverse events were measured using a scale with predefined criteria; these include weight gain >7%, extrapyramidal symptoms, and glucose and cholesterol levels [see Additional files 1 and 2]. Other adverse events were spontaneously reported by patients or elicited by questions. Most trials reported events only if they occurred in at least 10% of any treatment group, although occasionally there was no lower limit or the limit was 5%, or events were reported if there was a statistically significant difference between groups [see Additional file 3].

Adverse events could be included in both categories. For instance, weight gain may have been measured and gains in excess of 7% reported as a pre-defined outcome, but weight gain may also have been reported by patients as an adverse event. Again, extrapyramidal symptoms may have been prospectively assessed, but patients may also have reported tremor or other symptoms.

Trials in which patients presented with a manic, mixed or depressive episode were analysed together. Treatment emergent adverse events were frequently described as mild or moderate, or of limited duration, particularly somnolence or gastrointestinal events. Few trials specifically reported the absence of serious adverse events, or serious adverse events as a separate category.

Measured adverse events

Weight gain >7% baseline

Predictably, short term trials lasting less than six weeks found no significant difference between atypical antipsychotic (quetiapine, aripiprazole or ziprasidone) and placebo (Table 6). In trials lasting 6–12 weeks [20, 43, 45] and over 12 weeks [46, 48, 49] significantly more patients gained this amount of weight with quetiapine, olanzapine and aripiprazole than with placebo.

Table 6 Adverse events in placebo controlled trials

In active controlled trials atypical antipsychotics olanzapine (and olanzapine and fluoxetine) and quetiapine produced more weight gain than active comparators over 6–12 weeks [21, 43, 45], as did olanzapine in trials lasting more than 12 weeks [45, 47] (Table 7).

Table 7 Adverse events in active controlled trials

Extrapyramidal symptoms

All trials assessed extrapyramidal symptoms using recognised scales (SAS, BARS, AIMS). Few reported actual numbers of patients affected, but rather reported lack of statistical difference [see Additional files 1 and 2]. Atypical antipsychotics were reported to produce symptoms in significantly fewer patients than haloperidol [37, 41, 43, 44] and lithium [42].

Prolactin, glucose, lipids

There were few statistically significant changes in laboratory values, and no pattern of change with any treatment.

Patient reported

Almost no trial reported adverse events occurring below a frequency of 10%, although some occasionally used a lower threshold. In consequence, a number of adverse events (such as constipation or nausea) were reported only sporadically, making sensible analysis of them impossible.

Weight gain

In placebo-controlled trials of any duration above six weeks, treatment emergent weight gain was reported to occur at approximately the same rate as the rate of measured weight increase above 7% (Table 6) with both atypical antipsychotics (mainly olanzapine) and placebo. In active controlled trials, weight gain was reported as an adverse event less often than when it was measured as an outcome of the trial, both for atypical antipsychotic (mainly olanzapine) and active control (divalproex, lithium, and lamotrigine) (Table 7).

Extrapyramidal symptoms

In placebo-controlled trials the frequency of akathisia was higher with atypical antipsychotics than with placebo in trials lasting less than six weeks, but not in those lasting 6–12 weeks (Table 6). Tremor was more common in both. Where symptoms were reported as extrapyramidal disorder, information was available only for trials of less than six weeks, and in these short-term trials the rate of reporting for atypical antipsychotics (risperidone and ziprasidone) was significantly higher (20%) than with placebo (6%).

Compared with haloperidol and lithium, atypical antipsychotics (olanzapine, quetiapine, and aripiprazole) produced significantly lower rates of akathisia in trials of 6–12 weeks (Table 7). Tremor occurred at the same rate with atypical antipsychotics (olanzapine and risperidone) as with haloperidol and divalproex (Table 7) in trials of six weeks or less, but significantly less frequently for olanzapine, quetiapine and aripiprazole (6%) than haloperidol or lithium (21%) in trials lasting 6–12 weeks. In the single trial using lamotrigine as active comparator, tremor was reported at a higher rate with olanzapine plus fluoxetine [21].

Somnolence

Somnolence occurred significantly more often with atypical antipsychotics than placebo in trials lasting less than six weeks, or of 6–12 weeks (Table 6). In maintenance trials lasting longer than 12 weeks somnolence was not significantly different between atypical antipsychotic (olanzapine) and placebo, but with only 32 events reported in total, and at a much lower rate (6% with atypical) than in the shorter duration trials (26%–30%).

In active controlled trials somnolence was reported frequently with atypical antipsychotic in trials of less than 6 weeks (21%), 6–12 weeks (19%) and longer than 12 weeks (19%). It occurred more frequently than with active controls (haloperidol, divalproex, lithium, or lamotrigine) (Table 7).

Depression

Depression in mania trials did not occur more frequently with atypical antipsychotic than with placebo in trials of less than six weeks or of 6–12 weeks (Table 6). It was not reported in longer duration comparisons with placebo. In longer comparisons there was significantly more treatment emergent depression with olanzapine than divalproex and lithium (Table 7).

Discussion

An evidence-based approach to therapy requires certain fundamentals in order to have confidence in a result, and most confidence comes from systematic review and meta-analysis of good quality randomised trials [50]. Trials should be free from known sources of bias, as far as is practically possible. This includes randomisation, blinding, and using an intention to treat population, or at least knowing about withdrawals and drop outs [51–53]. It also includes having information on sufficient numbers of patients [9].

This review extends that of Perlis et al, 2006 [54], which included trials published to 2004, with 18 trials, 4,304 patients in the treatment of mania. This review also includes depression, and included information from 27 trials published to end 2006 with 7,838 patients. In addition to pooling information from these trials on efficacy, as did Perlis et al, we have also pooled information of adverse events.

A number of other systematic reviews and meta-analyses have addressed similar topics. Two Cochrane reviews [55, 56] report on olanzapine and risperidone in acute mania. Other reviews have concentrated on particular aspects: mania, for instance [54, 57, 58], or bipolar depression [59], or maintenance [60]. While generally similar, they have tended to use different methods. For instance, most concentrated on continuous outcomes, but a problem with mean changes in rating scales is that they can often be mean results of highly skewed distributions, making the means meaningless [61]. This review concentrated on dichotomous outcomes reflecting clinically relevant endpoints of efficacy and harm, and included five atypical antipsychotics over the short, medium, and long term, with trials included if they were published up to December 2006. Different approaches can be helpful in a number of ways, perhaps principally in providing information in ways that can be used by a wider audience.

There is an additional point of contention in systematic reviews and meta-analysis that is relevant here, namely how much it is acceptable to combine data from similar but not identical interventions, participants, duration, or outcome. We have chosen what we believe is a sensible middle course. Efficacy data are shown by both combined and by individual drugs (while recognising that numbers for some outcomes for some drugs may be small). Adverse events are not well reported, and we have chosen to combine the data. Additional files have the results from individual trials, so that others may perform analyses based on their logic or preference.

Trial design and outcomes also have to be valid and useful. This means, for instance, that in life-long illness we have longer rather than shorter trials, or that trials study appropriate patients without unrealistic exclusions or inclusions. It also means that outcomes have to be clinically relevant, and measured and reported in ways that are useful. For instance, a mean change in a composite measure is less useful than knowing the number of patients who have achieved an adequate level of response. Clinical trials reported in journals are limited in the amount they can report, and we know that using more detailed clinical trial reports improves both data access and utility [62–64].

The trials included in this review were of good reporting quality. All but one of the included trials scored 3 points or more out of a maximum of 5, a level known to limit the possibility of bias [52]. Points were lost due to inadequate descriptions of randomisation/allocation or blinding methods, and it was likely that allocation and blinding were, in fact, better than reported.

Trials were disparate in terms of atypical antipsychotic used, with olanzapine and quetiapine most commonly used in depression and in mania studies of less than 12 weeks, and olanzapine and aripiprazole the only atypicals tested in long-term maintenance studies. On the one hand, dividing studies by type of presenting episode, by duration, and by comparator, meant limiting the number of patients in each group available for analysis. On the other, combining these trials meant introducing a potentially unacceptable level of clinical heterogeneity. We chose to avoid this as much as possible by analysing by presenting episode and duration, but combining different atypical antipsychotics with a common comparator (placebo or active). There are potential problems with this approach, exemplified by apparent differences in efficacy between olanzapine and quetiapine in bipolar depression (Table 2).

Reporting of outcomes in trials was limiting. Although the numbers of patients experiencing response and remission were reported, many other efficacy outcomes, like depression and mania rating scores, or global impression, were predominantly reported as mean changes only, when it would be more useful to know how many patients experienced clinically relevant outcomes. The clinical relevance of some efficacy outcomes has also been challenged. For instance, re-analysis of an open-label extension of a randomised trial suggested a better outcome of sustained clinical recovery, where remission was sustained for at least eight weeks, rather than just occurring for any duration [65]. This outcome was not reported in any trial included in the review, and if used would give a much lower, but perhaps more realistic, impression of efficacy.

Trials usually reported only those adverse events occurring in at least 10% of patients, so that for many events only sporadic information was available [see Additional file 3], and no analysis of adverse events could be complete. In addition, adverse event reporting could overlap. For example, extrapyramidal symptoms like tremor or akathisia might be reported by patients alongside extrapyramidal syndrome or symptoms measured using recognised scales. Moreover, most trials permitted use of medication to treat extrapyramidal symptoms when they occurred, which could, of course, result in lower scores on symptom rating scales than would otherwise be the case. Extrapyramidal symptoms would still be recorded as spontaneous adverse events. Because patients experiencing extrapyramidal symptoms are more likely to withdraw or require dose reduction, it is possible that use of anticholinergics affected attrition rates, though permitting their use reflects clinical practice more closely.

The evidence available allows a number of inferences. Where the presenting episode was depression, both olanzapine and quetiapine appear to be efficacious over eight weeks, with more responses and remissions than placebo, and fewer withdrawals due to lack of efficacy. Adverse event withdrawals were higher than with placebo. There is some evidence that quetiapine at target doses of 300 or 600 mg daily is more efficacious than olanzapine, but at the cost of more adverse event discontinuations.

Where the presenting episode was mania, olanzapine, risperidone, and quetiapine had similar event rates and NNTs compared with placebo in trials shorter than six weeks (Table 3). Combining all atypicals compared with placebo, NNTs for response and remission were about 5. In trials lasting 6–12 weeks, NNTs for response or remission were somewhat better, at about 4. This is generally in accord with a previous meta-analysis [54], though that review combined continuous data to come to the conclusion that atypical antipsychotics were superior to placebo. Atypical antipsychotics produced fewer discontinuations for any cause or lack of efficacy, but somewhat more adverse event discontinuations in longer studies.
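As a reminder of how a number needed to treat (NNT) relates to event rates, the following is a minimal sketch in Python. It uses the standard definition of the NNT as the reciprocal of the absolute difference in response rates between active treatment and control; the function name and the patient counts are hypothetical and chosen only for illustration, and are not taken from any trial in this review.

```python
def nnt_for_benefit(events_treated, n_treated, events_control, n_control):
    """Number needed to treat for one additional beneficial outcome.

    Standard definition: 1 / (rate on treatment - rate on control).
    All inputs are patient counts; the name is illustrative only.
    """
    if n_treated <= 0 or n_control <= 0:
        raise ValueError("group sizes must be positive")
    rate_treated = events_treated / n_treated
    rate_control = events_control / n_control
    absolute_benefit = rate_treated - rate_control
    if absolute_benefit <= 0:
        # No benefit over control, so an NNT for benefit is undefined.
        raise ValueError("treatment rate does not exceed control rate")
    return 1.0 / absolute_benefit


# Hypothetical counts, not data from the review:
# 50 of 100 patients respond on treatment, 30 of 100 on placebo.
example_nnt = nnt_for_benefit(50, 100, 30, 100)
print(round(example_nnt, 1))
```

With these invented counts the absolute difference in response rate is 20 percentage points, giving an NNT of about 5, the same order of magnitude as the values described above. A lower NNT indicates a larger absolute benefit over the comparator.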

Limited comparison with active controls (lithium, valproate, haloperidol, lamotrigine) showed no major difference in efficacy or discontinuations in shorter duration trials. Perhaps notable, though, was a significantly lower rate of adverse event discontinuations with atypicals than with older active comparators in trials of 6–12 weeks.

Conclusion

In general, atypical antipsychotics are effective in treating both phases of bipolar disorder compared with placebo, and as effective as established drug therapies, though only two (olanzapine and quetiapine) have been tested where the presenting episode was depression. In general, atypical antipsychotics produce fewer extrapyramidal symptoms, but weight gain is more common with olanzapine and quetiapine. There are insufficient data to distinguish confidently between different atypical antipsychotics, predominantly because of the clinical heterogeneity engendered by presentation, drug, dose, comparator, duration of trial, and outcomes measured. Moreover, the weight of evidence of efficacy in bipolar depression resides with olanzapine and quetiapine, and that on weight gain overwhelmingly with olanzapine; extrapolation to other drugs in the class may not be appropriate in these circumstances.