Background

Results from effectiveness trials on antipsychotics have been awaited with anticipation. Several ongoing or recently completed effectiveness studies in both the USA and Europe have been expected to supplement the evidence base regarding the clinical use of antipsychotic drugs. According to present international recommendations, most second generation antipsychotics (SGAs) other than clozapine are considered first-line drugs for a patient suffering from psychosis [1–4]. Despite differing chemical and pharmacological properties, double-blind, randomized, controlled clinical trials (RCTs) fail to consistently demonstrate superiority for any of these drugs on efficacy measures. This is reflected in the inconclusiveness of systematic reviews on antipsychotics, which call for longer-term trials with more pragmatic designs [5–12]. RCTs of efficacy are indeed important for new candidate antipsychotics in establishing superiority over placebo and/or non-inferiority compared to reference antipsychotics. However, several methodological issues concerning sample selection and the rigid experimental environment could restrict the generalizability of the efficacy trial results. Selection bias, for one, is a major concern. The proportion of patients actually included in the studies is in many instances difficult to quantify, as the number of patients initially assessed for eligibility is rarely disclosed in the scientific papers. Where reported, however, this proportion is found to be as low as 7–27% in different studies [13–16]. The results from efficacy trials are then extrapolated to clinical populations that may have different characteristics. In addition, patients in normal clinical practice commonly use more than one psychotropic drug [17–20]. To what extent these combinations modulate the antipsychotic drug effects and tolerability outcomes registered in RCTs of efficacy remains to be answered.

Effectiveness trials have been launched in recent years to address some of the limitations of efficacy trials. Effectiveness trials, as opposed to efficacy trials, take a more pragmatic approach, and could be a rational answer to the problems related to selection and experimental settings. The trial design is also frequently labelled "naturalistic", "real-life", "pragmatic", or "practical". These terms are not strictly defined, but the common denominator is that both sample and experimental environment should resemble daily clinical practice. The core question of effectiveness trials is how a treatment works under normal clinical circumstances, rather than under the ideal conditions of the efficacy setting [21]. Another important feature of effectiveness trials is that outcome measures also include more global aspects of patient functioning, such as quality of life measures. Nasrallah et al. [22] propose a model where effectiveness is measured according to four domains: symptoms of disease, treatment burden, disease burden, and health and wellness. Using a modified, 3-domain version of the effectiveness definition proposed by Nasrallah et al., the aims of the review were to investigate whether effectiveness trials have disclosed differences between SGAs in the domains of global outcomes and symptoms of disease, and how between-drug differences in adverse effect profiles are expressed in naturalistic settings.

Objective

To review the head-to-head effectiveness of SGAs in the domains of global outcomes, symptoms of disease, and tolerability.

Methods

Types of studies

All relevant original randomized, controlled clinical effectiveness trials with head-to-head comparisons of second generation antipsychotics were eligible for inclusion. Trials were categorized as effectiveness studies if there was a statement from the authors that a naturalistic, pragmatic, practical, or real life study design was used, or if the methodology section was presented in corresponding terms.

Studies that restricted the use of concomitant medications were excluded. In everyday clinical practice, adjunctive psychoactive medications such as antidepressants and mood stabilizers are commonly used concomitantly with antipsychotics. This practice is in many instances in accordance with treatment guidelines [1, 23]. Excluding patients who qualify for adjunctive antidepressants or mood stabilizers from studies, or restricting them from using these drugs, is in the opinion of the authors in conflict with the pragmatic principle of effectiveness studies. Only restrictions on the use of more than one antipsychotic drug were tolerated. Conference abstracts were excluded. Clozapine trials were excluded, as this agent is commonly not regarded as a first-line treatment option.
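The eligibility rules above can be read as a simple screening predicate. The following is a minimal sketch only, assuming a hypothetical `Trial` record whose field names are illustrative and do not come from the review; it is not the procedure the reviewers actually used, which relied on manual inspection of abstracts and full texts.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical record of one screened report (field names are assumptions)."""
    randomized: bool
    head_to_head_sga: bool
    authors_state_naturalistic_design: bool   # "naturalistic", "pragmatic", "practical", "real life"
    methods_read_as_naturalistic: bool        # methodology presented in corresponding terms
    restricts_concomitant_psychotropics: bool  # restrictions other than a second antipsychotic
    conference_abstract: bool
    clozapine_trial: bool


def is_eligible(trial: Trial) -> bool:
    """Apply the inclusion and exclusion criteria described in 'Types of studies'."""
    is_effectiveness_trial = (
        trial.authors_state_naturalistic_design or trial.methods_read_as_naturalistic
    )
    return (
        trial.randomized
        and trial.head_to_head_sga
        and is_effectiveness_trial
        and not trial.restricts_concomitant_psychotropics
        and not trial.conference_abstract
        and not trial.clozapine_trial
    )
```

Note that a restriction on using more than one antipsychotic is deliberately not modelled as an exclusion, since the review tolerated it.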

Types of participants

Adult patients (over 16 years of age) with a diagnosis of schizophrenia or a schizophrenia-like disorder such as delusional disorder, schizoaffective disorder, or schizophreniform psychosis.

Types of intervention

1. First line second generation antipsychotic drugs: aripiprazole, olanzapine, quetiapine, risperidone, ziprasidone.

2. First generation antipsychotic drugs: chlorpromazine, haloperidol, perphenazine (when included as comparators in head-to-head comparisons of first line second generation antipsychotic drugs).

3. Placebo (when included as a comparator group in head-to-head comparisons of first line second generation antipsychotic drugs).

Types of outcome measures

1. Global outcomes

1.1 Time until discontinuation of assigned antipsychotic drug for any and specific causes.

1.2 Compliance with assigned antipsychotic drug.

1.3 Duration of hospitalisation.

1.4 Total mental health treatment costs.

1.5 Quality of life – as defined by each of the studies.

2. Symptoms of disease

2.1 Average score/change in mental state – as defined by each of the studies.

3. Tolerability

3.1 Extrapyramidal side effects.

3.2 Metabolic side effects.

3.3 Prolactin related symptoms.

3.4 Other adverse effects, general and specific.

4. Concomitant medication

4.1 Incidence of antidepressants, antiparkinson drugs, mood stabilizers, sedatives.

Search strategy for identification of studies

1. Electronic searching

Searches were made in Embase, PubMed and the Cochrane Central Register of Controlled Trials for articles published from 1980 to 2008, week 1, using the phrase:

[(unwanted effect* AND compar* AND antipsychotic* AND random*) OR (unwanted effect* AND compar* AND neuroleptic* AND random*) OR (side effect* AND compar* AND neuroleptic* AND random*) OR (side effect* AND compar* AND antipsychotic* AND random*) OR (tolera* AND compar* AND neuroleptic* AND random*) OR (tolera* AND compar* AND antipsychotic* AND random*) OR (efficacy AND compar* AND neuroleptic* AND random*) OR (efficacy AND compar* AND antipsychotic* AND random*) OR (pragmatic AND antipsychotic* AND random*) OR (pragmatic AND neuroleptic* AND random*) OR (real life AND neuroleptic* AND random*) OR (real life AND antipsychotic* AND random*) OR (naturalistic AND antipsychotic* AND random*) OR (naturalistic AND neuroleptic* AND random*) OR (open AND neuroleptic* AND random*) OR (open AND antipsychotic* AND random*) NOT (antipsychotic* AND tourette*) NOT (neuroleptic* AND tourette*) NOT (neuroleptic* AND mania) NOT (antipsychotic* AND mania)].
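The search phrase has a regular structure: four outcome terms and four design terms are each combined with two drug-class terms, followed by four exclusions. As a rough illustration of that structure, the sketch below regenerates an equivalent Boolean phrase from term lists. The list and function names are assumptions introduced here, the clause order differs from the printed phrase, and nothing is implied about how the databases themselves parse the query.

```python
from itertools import product

# Term lists read off the printed search phrase; the names are illustrative only.
DRUG_TERMS = ["antipsychotic*", "neuroleptic*"]
OUTCOME_TERMS = ["unwanted effect*", "side effect*", "tolera*", "efficacy"]
DESIGN_TERMS = ["pragmatic", "real life", "naturalistic", "open"]
EXCLUDED_TOPICS = ["tourette*", "mania"]


def build_query() -> str:
    """Assemble the OR-joined inclusion clauses followed by the NOT clauses."""
    include = [
        f"({outcome} AND compar* AND {drug} AND random*)"
        for outcome, drug in product(OUTCOME_TERMS, DRUG_TERMS)
    ]
    include += [
        f"({design} AND {drug} AND random*)"
        for design, drug in product(DESIGN_TERMS, DRUG_TERMS)
    ]
    exclude = [
        f"({drug} AND {topic})"
        for topic, drug in product(EXCLUDED_TOPICS, DRUG_TERMS)
    ]
    return " OR ".join(include) + "".join(f" NOT {clause}" for clause in exclude)


if __name__ == "__main__":
    print(build_query())
```

Writing the phrase this way makes it easy to check that every outcome and design term is paired with both "antipsychotic*" and "neuroleptic*".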

2. Reference searching

The references of all identified studies were inspected for more trials.

Methods of the review

1. Selection of trials

All citations were inspected by the principal reviewer (EJ). Studies were selected on the basis of abstracts, and in cases of doubt, the full text articles were consulted. If doubt remained, this was resolved by discussion with HAJ.

2. Data collection

Data were extracted from both text and tables of the original papers.

3. Data synthesis

3.1 Data presentation

Data on effectiveness were evaluated and grouped according to the three domains: global outcomes, symptoms of disease, and tolerability.

Outcomes were compared between the treatment groups and graded according to statistically significant inferiority (<), statistically significant superiority (>), or equality (=) between groups. Level of significance was set at 0.05.
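The grading rule can be sketched as a small function. This is an illustration under stated assumptions, not the reviewers' procedure: it assumes that each comparison supplies a summary value per group and a P-value, that "significant" means P strictly below 0.05, and that the caller knows whether higher values are better for the outcome in question. The function and parameter names are hypothetical.

```python
ALPHA = 0.05  # level of significance stated in the review


def grade(value_a: float, value_b: float, p_value: float,
          higher_is_better: bool = True) -> str:
    """Return '>', '<' or '=' for drug A relative to drug B.

    Assumption: P < ALPHA counts as statistically significant; anything else
    (or identical group values) is graded as equality.
    """
    if p_value >= ALPHA or value_a == value_b:
        return "="
    a_is_better = value_a > value_b if higher_is_better else value_a < value_b
    return ">" if a_is_better else "<"
```

For an outcome such as weight gain, where lower values are preferable, the call would pass `higher_is_better=False`.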

3.2 Sensitivity analyses

3.2.1 First episode

Results from those experiencing their first episode of psychosis are analyzed separately.

3.2.2 Intention to treat analyses

When results from both intention to treat (ITT) and observed cases (OC) analyses are available, both are included. P-values are only given for ITT analyses if the results of both ITT and OC analyses are statistically significant.

4. Heterogeneity

Results from different trials are analyzed separately because of study heterogeneity.

5. Addressing funding bias

The funding parties of the studies were registered, as the matter of conflict of interest in research has become a growing concern in recent years.

Results

Description of studies

The search provided more than a thousand different hits in the databases, of which the vast majority were non-relevant, animal studies or studies on efficacy. A total of 16 different reports from 10 randomized trials of effectiveness with head-to-head comparisons of SGAs were located. The methodological aspects of the trials are presented in Table 1. Five studies present data from different phases of the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) [31, 35–38]. The studies by Ritchie et al. [33, 34] include elderly patients exclusively, as opposed to the others. The study sample is the same in the two studies by Ritchie et al., but the studies present data from different phases of the same trial. There are differences across all studies concerning source of sample recruitment (in- or outpatients), phase of illness (first episode or chronic), antipsychotic drug doses, and level of diagnostic specificity in the inclusion criteria. The study by McEvoy et al. [25] and the CATIE studies applied double-blinding, the studies by Robinson et al. [26] and McCue et al. [29] had rater-blinding, whereas the rest were without blinding.

Table 1 Methodology

Global outcomes

Table 2 gives an overview of global outcomes. The most consistent difference is that chronic patients treated with olanzapine used this antipsychotic drug for a longer time or with better adherence compared to the other SGAs [30, 31, 36, 37, 39]. Regarding total mental health treatment costs, two studies found significantly lower costs using FGAs compared to SGAs [30, 35], whereas one study found the total costs to be equal across the drug generations [38]. Costs were only estimated in chronic-phase studies.

Table 2 Global outcomes

In first episode patients the comparators performed equally.

Symptoms of disease

Regarding symptoms of disease (Table 3), the studies on acute-phase patients, including first-episode patients, disclosed no differences between comparator drugs in clinical global impression [24, 25], overall symptoms of psychosis [24–27, 29], or depression or mania [24]. In one study comparing olanzapine and risperidone in first-episode patients, the response seemed to be more stable in the risperidone group, as measured by the Schedule for Affective Disorders and Schizophrenia Change Version with psychosis and disorganization items (SADS-C+PD) and the Clinical Global Impression scale (CGI) [26]. In chronic patients the results were less uniform. The majority of the studies found the SGAs to be equally effective for symptoms of psychosis [30–35, 37]. Olanzapine was more effective for symptoms of psychosis compared to quetiapine and ziprasidone as measured by PANSS in one study [36]. In one study quetiapine was significantly more effective for symptoms of depression compared to risperidone as measured by the Hamilton Rating Scale for Depression (HAM-D) [32], whereas olanzapine and risperidone were equally effective on this outcome measure in 3 studies [30, 33, 34]. No differences between the SGAs were disclosed on the CGI scale.

Table 3 Symptoms of disease

Tolerability

Concerning the tolerability outcomes (Table 4), the most consistent differences between the SGAs were more weight gain and adverse effects on serum lipids in the olanzapine-treated groups. This was found in both acute-phase patients, including first-episode patients, and chronically ill patients [24–27, 31, 36, 37, 39]. Only one study, comparing olanzapine and risperidone in the elderly, found no significant difference between the drugs with regard to weight gain [34]. Regarding sexual dysfunction and related symptoms, only one of the acute-phase trials used this outcome measure, and it found no difference between haloperidol, olanzapine and risperidone. Among the studies including chronic patients, one found no differences between olanzapine, quetiapine, risperidone and ziprasidone on this outcome measure despite a significantly higher mean prolactin change in the risperidone group compared to the others [31], and no difference was found between olanzapine and risperidone in the elderly [34]. Risperidone was associated with more sexual dysfunction and gynecomastia/galactorrhoea compared to olanzapine, quetiapine and ziprasidone in CATIE patients who had previously discontinued treatment with an SGA [36], whereas olanzapine, quetiapine and risperidone were equally associated with sexual dysfunction in those who had previously discontinued perphenazine [37]. In both of the latter studies, the risperidone-treated groups had more prolactin elevation than the other SGA groups. The incidence of extrapyramidal symptoms (EPS) was equally distributed among the comparator SGA groups across all studies using this outcome measure. The FGAs were associated with significantly more EPS or discontinuation owing to EPS in 3 [24, 31, 39] of 5 studies, whereas 2 studies did not find EPS differences between FGAs and SGAs [29, 30]. The last study including an FGA arm did not have outcomes on this measure.

Table 4 Tolerability outcomes (rating scales/outcome measures)

Concomitant medication

On the use of concomitant medications (Table 5) there were no consistent differences between the SGAs.

Table 5 Concomitant medication

Funding

Six studies were supported by pharmaceutical companies. Reported findings of differences between comparator SGAs were in favour of the supporter's drugs in 5 of these studies [27, 32–34, 39], whereas one study found equal effectiveness among the comparators [25].

Discussion

In this literature search 16 reports were located with comparisons of SGAs in clinical settings that fulfilled basic methodological demands such as randomization. With regard to global outcomes, the most consistent finding was a superior drug adherence or time to treatment discontinuation (drug survival) for olanzapine in patients suffering from chronic schizophrenia. Drug adherence and survival were considered global effectiveness outcome measures, as they were thought to reflect both efficacy and tolerability of the drugs as judged by both the patient and the treating psychiatrist. The outcome measure is clinically important, as antipsychotic drug adherence has major influences on the risks of relapse, rehospitalisation and suicide in patients with schizophrenia [40]. Three of the five studies using this outcome measure were from the CATIE trial. A critical question is whether the comparator drugs were used in equivalent doses in the CATIE studies [41]. To permit blinding combined with flexible-dose regimens in the CATIE study, drug doses representing a quarter of the maximal daily drug dose were packaged in 4 identical-appearing capsules for all study drugs. The choice of upper dose limit therefore has a substantial impact upon the individual steps in the up-titration of the drugs. The upper dose limit for olanzapine was, in contrast to the other drugs, set above the label-defined upper dosage limit of 20 mg [31]. This results in larger up-titration steps for olanzapine than for the comparators, so that each up-titration step may correspond to more "response" for olanzapine than for the other drugs under investigation. However, the fact that the studies by Jerrel [30] and Tunis et al. [39] have results similar to those from the CATIE despite the use of lower doses of olanzapine supports the finding of superiority of olanzapine on treatment adherence.

Regarding the ability to alleviate symptoms of psychosis, the drugs performed equally in all acute-phase studies and in all but one of the chronic-phase studies. The solitary CATIE study that found olanzapine to be superior to quetiapine and ziprasidone had a sample of chronic schizophrenia patients who had previously discontinued an SGA because of intolerability. In this study the mean dose, expressed in defined daily doses (DDD), was 2.05 for olanzapine compared to 1.41 and 1.45 for quetiapine and ziprasidone, respectively. The difference in total PANSS response may, in line with the argument above, be a result of non-equivalent doses. In the study by Tunis et al. [39] olanzapine had more clinical responder days compared to risperidone, as defined by the mean number of days with scores on the Brief Psychiatric Rating Scale (BPRS) of less than 18. This finding does not necessarily imply that olanzapine was more effective than risperidone as measured by total reduction of BPRS. The latter comparison is not disclosed in the paper.
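To make the dose comparison concrete, the mean DDD figures quoted above can be converted back to approximate daily doses in milligrams. The sketch below is illustrative only. It assumes the oral DDD values of the WHO ATC/DDD index (10 mg for olanzapine, 400 mg for quetiapine, 80 mg for ziprasidone), which are not stated in this review and should be checked against the current index; the dictionary and function names are hypothetical.

```python
# Assumed oral DDD values in mg (WHO ATC/DDD index; not taken from this review).
ASSUMED_WHO_DDD_MG = {"olanzapine": 10.0, "quetiapine": 400.0, "ziprasidone": 80.0}

# Mean doses in DDD units as quoted in the text above.
REPORTED_MEAN_DDD = {"olanzapine": 2.05, "quetiapine": 1.41, "ziprasidone": 1.45}


def implied_daily_dose_mg(drug: str) -> float:
    """Approximate mean daily dose in mg implied by the reported DDD ratio."""
    return REPORTED_MEAN_DDD[drug] * ASSUMED_WHO_DDD_MG[drug]


def relative_dose(drug: str, reference: str) -> float:
    """Ratio of two drugs' mean doses when both are expressed in DDD units."""
    return REPORTED_MEAN_DDD[drug] / REPORTED_MEAN_DDD[reference]


if __name__ == "__main__":
    for name in REPORTED_MEAN_DDD:
        print(f"{name}: about {implied_daily_dose_mg(name):.1f} mg/day")
    print(f"olanzapine vs quetiapine: {relative_dose('olanzapine', 'quetiapine'):.2f}x")
```

On a DDD basis, olanzapine was thus dosed roughly 40 to 45% higher than the two comparators, which is the non-equivalence the argument refers to.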

The tolerability outcomes were somewhat surprising, as the SGAs performed equally on most measures. Perhaps the most striking finding was the lack of differences between the SGAs with regard to the extrapyramidal syndrome and related side effects across all studies. This might indicate that the drugs' distinct side effect profiles derived from efficacy trials are "levelled out", at least to some degree, in the naturalistic setting where samples are more heterogeneous and concomitant psychotropics are less restricted. It is worth noting that some of the studies have rather small samples, which increases the risk of statistical type II errors and thereby of failure to detect real differences between drugs. The most pronounced difference between the SGAs was in the area of metabolic adverse effects, where olanzapine-treated patients gained more weight and had the most adverse changes in cholesterol and triglyceride levels.

Six studies included an FGA arm in the design, and two of these studies found the FGA(s) to be associated with lower total mental health care costs [30, 35], whereas one study found equal total costs between olanzapine, risperidone and the FGAs haloperidol and perphenazine [39]. The latter study was supported by industry. Cost-effectiveness measurements were not included in the rest of the studies. Based on the present review, the SGAs were not superior to FGAs with regard to treating symptoms of disease. However, the FGAs were associated with more EPS and related adverse effects in 3 of the studies involving an FGA arm. Important factors not reviewed in the present paper include the potential differential effects of SGAs versus FGAs on cognitive impairments and risk of relapse. There are some indications that SGAs are superior to FGAs on these outcome measures [42, 43]. Until these issues are further investigated in effectiveness studies, it would be premature to estimate the cost-benefit of the drugs.

One third of the studies were funded by the pharmaceutical industry, and in these studies the main findings of differences between the SGAs were in favour of the funder's product in 5 of 6 cases. In recent years the matter of conflict of interest in research has become a major concern, as a high number of psychotropic drug trials are financially supported by the industry, and "funding bias" has been pointed out by several authors [44–49]. In a review of head-to-head RCT comparisons of SGAs, outcomes were in favour of the funding party in 9 out of 10 studies, which also led to contradictory results in studies from different sources of sponsorship [50]. In a recent meta-analysis by Davis et al. [51] on the efficacy of SGAs, the conclusion is that some of the SGAs are more efficacious than others when compared to FGAs. Davis et al. state that almost all the studies in their analysis were supported by industry. In the SGA vs. FGA comparisons it is reasonable to presume that the FGAs are without exception the reference drugs and that sponsorships are strongly associated with the SGAs. Interestingly, the SGAs with the highest effect sizes were also the ones represented by the largest number of studies (Figure 1). In fact, there seems to be an almost linear relationship between the number of studies on individual SGAs and their effect sizes, with remoxipride being the only deviator. If there is a systematic bias that favours the drug of the funding party, this could at least in part explain the apparent "dose-response" relationship observed in Figure 1, with number of studies being the "dose" and effect size being the "response". Results from both efficacy and effectiveness studies funded by the pharmaceutical industry should accordingly be interpreted bearing the possibility of "funding bias" in mind.

Figure 1

Effect sizes of second generation antipsychotics compared with first generation antipsychotics. Relationship between number of studies on the individual drugs, and effect sizes of the respective 10 second generation antipsychotics compared to first generation antipsychotics. Adapted from Table 2 in Davis et al. [44]

The present sixteen studies were all performed according to naturalistic designs. However, certain methodological differences reflect that the concepts of effectiveness trials and of naturalistic/real life/pragmatic/practical approaches are not strictly defined. There are obvious differences between the studies' samples (Table 1). This makes it very dubious to pool the rates from individual studies in meta-analyses and, for instance, calculate joint effect sizes ("the apples and oranges error") [52]. To bypass this source of bias, we presented the main results from each study separately, with emphasis on statistically significant differences between the SGAs and without absolute figures or rates. This is of course a crude method, but it is well suited to the search for robust differences between the SGAs. Four studies permitted the use of additional antipsychotics, which could confound comparisons between the SGAs [28–30, 32]. No differences were found between the SGAs in the use of supplemental antipsychotics, however. Another limitation of this review is that only 3 studies included first-episode patients, of which 1 study was of very short duration. From a clinical point of view, the first-time antipsychotic intervention is the one associated with the highest degree of uncertainty. In the chronic patient, prior experience with antipsychotics may deliver valuable information for the choice of an antipsychotic drug. For the drug-naïve, physicians are forced to perform sometimes extensive drug "trials" in the individual patient before an antipsychotic drug with a satisfying effect profile is identified. Besides the economic aspects and the strain on the patient, this "trial" approach, which may take several months, extends the duration of untreated psychosis, which may be a negative prognostic factor. More effectiveness studies on first-episode patients and with longer follow-up are called for.

Conclusion

Despite the limitations mentioned above we conclude that in chronically ill patients olanzapine may have an advantage over other SGAs regarding longer time to treatment discontinuation and a better drug adherence, but olanzapine is also associated with more metabolic side effects. The SGAs were equally associated with EPS and related side effects.

More studies on first-episode psychosis are needed.