The history of medicine is effectively one of trial and error. For millennia, plants and potions were tried out on people and their effects observed and reported. This rudimentary method is of course open to considerable bias and corruption. Recognition of this led to the development of randomised and blinded controlled trials (RCTs) as the key test of efficacy and tolerability. Evidence from RCTs now forms the basis of the use of all licensed drug treatments.

A development of the RCT is the concept of meta-analysis. Combining results of RCTs of a particular treatment in meta-analyses allows us to distinguish between two treatments (e.g. drug vs. placebo) with much greater certainty. Most recently, the combining of RCTs of different treatments with comparators in common with at least one of the treatments (using the network meta-analysis [NMA] method) has allowed us to rank a large number of treatments according to efficacy and acceptability.
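The logic of comparing two treatments through a shared comparator can be illustrated with a minimal sketch of an adjusted indirect comparison (the Bucher approach, the simplest building block of an NMA). The function name and all effect sizes below are hypothetical and are not taken from any trial cited here. A full NMA combines many such direct and indirect comparisons at once.

```python
import math

def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    """Adjusted indirect comparison of A vs B via a common comparator C.

    d_ac, d_bc: effect estimates (e.g. standardised mean differences)
                of A vs C and B vs C from separate sets of trials.
    se_ac, se_bc: their standard errors.
    """
    d_ab = d_ac - d_bc                         # indirect estimate of A vs B
    se_ab = math.sqrt(se_ac**2 + se_bc**2)     # variances add, so precision falls
    ci = (d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab)  # approximate 95% interval
    return d_ab, se_ab, ci

# Hypothetical inputs: drug A vs placebo and drug B vs placebo.
d_ab, se_ab, ci = indirect_comparison(-0.50, 0.10, -0.30, 0.12)
print(f"A vs B: {d_ab:.2f} (SE {se_ab:.3f}), 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

Because the two variances add, an indirect estimate is always less precise than either of the direct estimates on which it is built. In this invented example, two comparisons that each clearly separate from placebo yield an indirect interval that includes zero.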

The results of these NMAs often confirm what has already been observed clinically: that antipsychotics are the most effective drugs for mania [1], that sertraline is the best of a group of modern antidepressants [2] or that the combination of olanzapine and fluoxetine is the most efficacious for bipolar depression [3]. The largest NMA of any group of psychotropic drugs so far conducted [4] suggested that clozapine was the most effective antipsychotic and that there was little to choose between non-clozapine antipsychotics.

These largely predictable results have had two effects: clinicians now feel reassured that their clinical observations remain of value (being precisely reflected in the results of NMAs), and the confidence of academics and policy makers in the NMA method has increased to the point where NMA outcomes now strongly influence clinical guideline development throughout the world.

Sometimes, however, NMAs produce results that go against clinical experience and with which few can agree. An NMA of treatments for generalised anxiety disorder (GAD) [5] found fluoxetine to be the drug of choice. Because this result contrasted with clinical observation, the NMA had, as far as can be told, little influence on formal GAD guidelines. The apparently anomalous result was explained away by the fact that the NMA had excluded all trials conducted before GAD was defined and codified (and so few trials of benzodiazepines or buspirone could be included).

In 2016, an NMA was published whose findings coincided with perhaps nobody’s clinical experience: Samara et al. [6] conducted an NMA that concluded, effectively, that clozapine was no better than other antipsychotics in the treatment of refractory schizophrenia. Thus, a modern evidence-based technique concluded that clinicians’ observations over nearly 30 years were wrong. The question that arises is this: is clozapine really not the drug we have believed it to be over decades of widespread use, or is the NMA method perhaps not as reliable as we would want it to be?

In any NMA, the validity of its calculated outcomes is dependent on three factors: the inclusion and exclusion of clinical studies, the quality of the included studies, and the statistical robustness of the NMA methodology. The NMA by Samara et al. [6] is unarguably well conducted: the authors were clearly aware of the multitude of possible biases and performed a wide range of pre-planned sensitivity analyses to account for such things as clozapine dose, comparator dose, trial duration and sponsorship. These a priori sub-analyses made little difference to overall outcomes—clozapine remained “not significantly better than most other drugs.” Of course, sensitivity analyses do to some extent minimise the influence of trials with important biases, but in many cases the analyses will inevitably lack statistical power because trial exclusions reduce the ‘strength’ of the network. Moreover, biases can only be accounted for where they are suspected and recorded, and Samara et al. [6] could not account for some highly influential methodological factors.
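The loss of power that follows trial exclusion can be shown with a simple sketch. It uses ordinary inverse-variance fixed-effect pooling of pairwise trials as a stand-in. This is not the network model used by Samara et al. [6], and every number below is invented for illustration. The function name and the choice of which trials are "restricted" are likewise hypothetical.

```python
import math

def pool_fixed_effect(trials):
    """Inverse-variance fixed-effect pooling of (effect, standard_error) pairs."""
    weights = [1.0 / se**2 for _, se in trials]
    pooled = sum(w * d for w, (d, _) in zip(weights, trials)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))  # shrinks as trials are added
    return pooled, pooled_se

# Hypothetical trial results: (effect estimate, standard error).
all_trials = [(-0.40, 0.20), (-0.10, 0.25), (-0.30, 0.15),
              (0.05, 0.30), (-0.20, 0.22)]

# Hypothetical sensitivity analysis keeping only trials that meet some
# criterion (for example, an adequate dose); here, the first two.
restricted = all_trials[:2]

full_effect, full_se = pool_fixed_effect(all_trials)
sub_effect, sub_se = pool_fixed_effect(restricted)

print(f"All trials:  effect {full_effect:.2f}, SE {full_se:.3f}")
print(f"Restricted:  effect {sub_effect:.2f}, SE {sub_se:.3f}")
```

Under this model, removing trials can only reduce the total weight, so the standard error of the pooled estimate cannot fall and will usually rise. A sensitivity analysis that fails to find a difference may therefore reflect a lack of precision as much as a true absence of effect.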

Let us compare the structure and design of the study that reported the highest effect size for clozapine (the landmark study by Kane et al. [7]) with those of later trials. This seminal study was a statistical outlier in the NMA by Samara et al. [6], which would suggest, under normal circumstances, somewhat suspect validity. An alternative view is that in fact this study calls into question the validity of virtually all other studies included in the NMA.

In the study by Kane et al. [7], participants had failed to respond to at least two prior antipsychotics and, importantly, to high-dose haloperidol given under controlled conditions. No later study assured treatment resistance with such certainty. The NMA sensitivity analyses by Samara et al. [6] did account for trials with treatment-intolerant subjects but not for the rigidity of the definition of treatment resistance. The work by Kane et al. [7] also allowed titration of clozapine according to response, such that the mean daily dose was over 500 mg by the third week of the trial. The effect of dosage and participant selection is well illustrated by a comparison of the outcomes of two apparently similar trials. Bondolfi et al. [8] investigated risperidone (mean dose 6.4 mg/day) compared with clozapine (mean dose 291.2 mg/day) in subjects who were either treatment intolerant or treatment resistant (the proportions of each were not reported). Risperidone and clozapine were found to be similarly effective. Azorin et al. [9] studied risperidone (median dose 6 mg/day) compared with clozapine (median dose 600 mg/day) in subjects with established non-response to previous antipsychotics confirmed by prior poor functioning for 2–5 years. In this study, clozapine was found to be substantially more effective than risperidone.

These two factors, dose and treatment resistance, are crucial to the conduct of fair trials of clozapine. All antipsychotics seem to show a threshold-type dose–response effect whereby there is no effect below a certain dose or plasma level and the total effect of the drug is seen once the threshold is crossed [10–12]. Thus, restricting, by whatever means, the dose of any antipsychotic is likely to render it ineffective in some individuals who would otherwise respond to higher doses. Including non-refractory patients in trials has the effect of enhancing the perceived efficacy of the non-clozapine comparator. Clozapine has never been shown to be superior to any drug when used as the first treatment in first-episode schizophrenia [13, 14] (where treatment resistance is relatively less common [15]) but is only comparatively more effective once treatment non-response has been established [16]. The sub-analyses by Samara et al. [6], which excluded trials from the NMA on the grounds of dose or recruitment of treatment-intolerant subjects, did not show clozapine to be superior to comparators, but these exclusions must inevitably have weakened the statistical power of the NMA to show any differences.

The trial by Kane et al. [7] was also perhaps one of the last truly blind studies of clozapine. At the time of the study, little was known about clozapine, and its characteristic adverse effect profile had yet to be identified. As a result, investigators very probably could not distinguish between clozapine and chlorpromazine under standard blind conditions. As evidence of this, Kane et al. [7] found that, of patients receiving clozapine, only 21% reported experiencing drowsiness, 16% reported constipation and 13% salivation. We now know that almost all patients receiving clozapine demonstrate these adverse effects: the low rates of adverse effects reported by Kane et al. [7] probably reflect a failure to look for or anticipate these effects (all later trials reported higher frequencies of sedation and salivation). Double-blind trials of clozapine inevitably consist of a cohort of participants who are sedated, have obvious hypersalivation and complain of constipation and another cohort who report or display radically different adverse effects in terms of both nature and intensity. No trial of clozapine can now be truly ‘blind’—clozapine’s effects are too easily recognised and so bias is inevitable.

A final aspect of this trial [7], which is perhaps unique to it, is the absence of baseline rating inflation. This is a rarely discussed phenomenon that takes place behind the scenes of any RCT. In a multi-centre trial, each study centre will have a recruitment target that is often linked to financial reward. Researchers expend considerable effort searching for suitable participants, and there will always be a temptation to bend the rules of recruitment to make patients fit the pre-determined criteria (often termed ‘enrolment pressure’). Where a lower limit of symptom score is specified, researchers may deliberately or unconsciously inflate the score so as to make a patient eligible. So, in the case of clozapine trials, non-refractory subjects may wrongly be recruited. Where baseline scores have been inflated, all treatments will tend to show an improvement over baseline when subsequent assessments are made (and where the tendency is to underestimate symptom scores in the expectation of improvement).

This is the type of bias that cannot be accounted for because it is very difficult to identify and quantify. In the study by Kane et al. [7], the imperative was to find truly refractory patients (hence the pre-trial with haloperidol) to test the worth of a potentially toxic drug. In most subsequent trials, the imperative was more often to show a sponsor’s drug to be equivalent to clozapine.

Other commentators have identified further potential biases [17]. Perhaps the most important of these is that, in more recent trials, subjects receiving clozapine may have already failed to respond to, say, risperidone or olanzapine, whereas such non-response would preclude randomisation to the comparator. A further consideration is that ethical restrictions mean modern trials often include patients who are not representative of clinical practice.

Of course there is a huge body of non-RCT evidence to support the unique effect of clozapine in schizophrenia. Naturalistic observations of outcomes in clinical practice strongly support the superiority of clozapine over other antipsychotics in terms of continuation with treatment [18], relapse [19], hospitalisations [20, 21] and even mortality [22]. Clozapine may be the only antipsychotic with which relapse occurs only after periods of non-ingestion and never during full compliance [23]. Clozapine also has a unique ability to reverse years or even decades of illness. In an analysis of 100 long-stay treatment-refractory patients given clozapine, 63 improved, and 40 of these were subsequently able to live in the community [24]. No other antipsychotic has ever been shown to have such a profound effect in practice since the introduction of chlorpromazine in the 1950s. Clinical observations like these make RCTs and NMAs largely redundant: the effect of clozapine in refractory psychosis is one of the few things in psychiatry that is clearly visible to the naked eye.

The findings of meta-analyses and NMAs are, as already mentioned, largely dependent on which studies are included and which are excluded. Ironically, at the same time the NMA by Samara et al. [6] was published, another meta-analysis, using different source data, found clear superiority for clozapine over other drugs in refractory schizophrenia [25]. In effect, the results of any trial of any drug in ‘refractory (or resistant) schizophrenia’ are substantially dependent on what that term means. An analysis of 42 studies [26] found that half did not provide details of how treatment resistance was defined and only two (5%) of the remainder used the same criteria.

With such a hotchpotch of studies available for inclusion, no NMA, however carefully conducted, could generate results that should be taken at face value. Our view of clozapine should not be altered by the NMA by Samara et al. [6]; our clinical observations have not deceived us: clozapine remains the gold-standard treatment for treatment-resistant schizophrenia.