Introduction

The major psychoses (e.g., schizophrenia and bipolar disorder) are complex, heritable brain disorders defined by core symptoms affecting perception of reality, thinking, behavior, mood, and motivation. Most patients experience multiple relapses within the first 5 years of treatment, and the long-term course is typically episodic or continuous, with significant functional impairment. For the ~2 % of adults affected, the experience of a psychotic episode is highly variable: patients can have different clinical symptom profiles and functional outcomes, ranging from recovery to persistent symptoms and cognitive deficits [1]. Having a major psychotic disorder reduces average life expectancy by more than a decade [2].

In the late 19th century, Emil Kraepelin postulated a unifying pathophysiological process to explain the psychopathology that he observed in his patients. He identified this as a disease, dementia praecox, and distinguished it from manic-depressive illness in his psychotic patients. By defining this dichotomy between what are now termed schizophrenia and bipolar disorder, he has been central to the conceptualization of the major psychotic disorders ever since. His influence is evident in our modern, reliable classification systems (the American Psychiatric Association's DSM-5 and the World Health Organization's ICD-10). But the clinical classification of psychosis continues to evolve. Kasanin described schizoaffective disorder in 1933 to capture cases with clinical features overlapping both schizophrenia and bipolar disorder [3], but this category has been narrowed in the transition from DSM-IV to DSM-5 to become a longitudinal, rather than cross-sectional, diagnosis (reviewed in [4]). Kraepelin sought a pathophysiology, and after a century of enquiry more than a hundred medical causes of psychosis have been identified, from syphilis to NMDA-receptor autoantibody encephalitis [5]. However, collectively, these are likely to account for a small fraction of cases (<2 %). A cohesive understanding of the pathophysiology of the major psychotic disorders is lacking, even to the extent of defining whether one or many disease processes are involved.

A series of pivotal twin and adoption studies beginning in the 1960s confirmed that the long-observed familial clustering of psychosis reflected a significant genetic contribution to illness susceptibility. In the last decade, the development of high-throughput array technology, genome sequencing, and collaborative research has facilitated unprecedented progress in understanding the genetic architecture of psychosis [6–10]. Can this information be usefully employed to develop biomarkers for psychosis? What do we even mean by biomarkers? A PubMed search (8th April 2015) using the terms “biomarker” [AND] (“psychosis” [OR] “schizophrenia” [OR] “bipolar disorder”) generated 610 results. Yet no psychosis biomarkers are currently used in routine clinical practice [11]. Even allowing for psychosis being a field of intensive research, this suggests that, as in other fields, the term biomarker is being used more broadly than its intended definition as “objective, quantifiable characteristics of biological processes, or of response to treatment or challenges” [12]. We follow the approach suggested by Davis and colleagues [13] in classifying biomarkers into those of (1) risk, (2) diagnosis/trait, (3) state or prognosis, (4) stage, and (5) treatment response. We review recent advances in molecular genetics/genomics and consider how these contribute, or could contribute, to biomarker development under each of these categories.

Common Variation from Genome-Wide Association Studies

In the mid-noughties, the advent of arrays capturing most common single nucleotide polymorphism (SNP) variation (minor allele frequency (MAF) >5 %) in the human genome provided a framework for systematic genome-wide association studies (GWAS) (reviewed in [14]). Notable successes for other common disorders (e.g., diabetes, inflammatory bowel disease) and traits (e.g., height, lipid levels) spurred international collaborative efforts to generate the large sample sizes required to perform studies for the major psychiatric disorders. In 2009, three schizophrenia GWAS consortia reported robust, replicable association signals in a 5.5 megabase (Mb) region around the major histocompatibility complex (MHC) on chromosome 6p [15–17]. This locus, one of the most complex and genetically diverse regions of the genome, encodes ~250 genes, including the classical and transplantation HLA genes as well as many other immune and non-immune genes. Subsequent studies have provided further evidence of association while raising new questions, including whether the locus represents a point of genetic difference between schizophrenia and bipolar disorder [18]. These studies showed that collaborative efforts successfully applied elsewhere in human genetics could translate fruitfully to psychiatric research. This gave impetus to an even larger effort, the Psychiatric GWAS Consortium (PGC), established initially to combine all available GWAS data for five psychiatric disorders, including schizophrenia and bipolar disorder [19].

In 2011, the Schizophrenia Working Group of the PGC identified seven novel significant loci in a sample of 9394 cases [20]. A GWAS of 7481 individuals with bipolar disorder (published in the same issue of Nature Genetics) confirmed association at CACNA1C and identified a new intronic risk variant in ODZ4 [8]. Combining schizophrenia and bipolar disorder samples yielded strong evidence of association at CACNA1C and at the NEK4-ITIH1-ITIH3-ITIH4 locus [8]. Driven by increasing sample sizes, the pace of discovery has been steep, particularly in schizophrenia, where the number of independent risk loci has increased to 108 in an analysis of 36,989 cases and 113,075 controls [10]. For the purposes of this article, GWAS analyses of the most common psychotic disorders (schizophrenia and bipolar disorder) have yielded four major insights. In the absence of data on other psychotic entities (e.g., delusional disorder, catatonia, brief psychotic disorders), we do not yet know whether these insights extend to all psychosis cases. The first three insights have implications for biomarkers of risk or diagnosis; the fourth is relevant to biomarkers of state, stage, treatment response, and prognosis.

First, GWAS data made it possible to test the role of polygenic inheritance empirically. The International Schizophrenia Consortium reported a “polygene score” method that summed variation across a large number of nominally associated loci into quantitative risk profile scores (RPS) and asked whether these scores could predict disease state in independent samples [16]. From this work and subsequent studies, it has been estimated that thousands of common loci of small effect explain 30–50 % of the variance in genetic risk to schizophrenia, with similar results reported in bipolar disorder [21–23].
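
To make the score concrete, the sketch below computes a risk profile score in one common formulation: the sum, over SNPs passing a discovery p-value threshold, of an individual's risk-allele dosage weighted by the log odds ratio estimated in an independent discovery GWAS. The SNP names, statistics, and thresholds are hypothetical, and this is not presented as the International Schizophrenia Consortium's exact implementation; real analyses also prune SNPs in linkage disequilibrium, and some implementations average rather than sum.

```python
import math
from typing import Dict, NamedTuple


class SnpStat(NamedTuple):
    """Discovery-GWAS summary statistics for one SNP (hypothetical structure)."""
    odds_ratio: float  # per-allele odds ratio for the risk allele
    p_value: float     # association p-value in the discovery sample


def risk_profile_score(dosages: Dict[str, float],
                       stats: Dict[str, SnpStat],
                       p_threshold: float = 0.05) -> float:
    """Sum risk-allele dosages (0-2) weighted by log(OR) over SNPs with p <= p_threshold."""
    score = 0.0
    for snp, stat in stats.items():
        if stat.p_value <= p_threshold and snp in dosages:
            score += dosages[snp] * math.log(stat.odds_ratio)
    return score


# Hypothetical discovery statistics and one target-sample genotype (illustrative values only).
stats = {
    "snp_a": SnpStat(odds_ratio=1.10, p_value=1e-9),
    "snp_b": SnpStat(odds_ratio=0.95, p_value=0.01),
    "snp_c": SnpStat(odds_ratio=1.05, p_value=0.30),
}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 2}

# A more liberal threshold admits more nominally associated SNPs into the score.
for threshold in (0.05, 0.5):
    print(threshold, round(risk_profile_score(person, stats, threshold), 4))
```

Relaxing the threshold is what lets this style of score aggregate many individually non-significant effects, at the cost of admitting some SNPs that are not truly associated.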

Second, by comparing polygenic risk scores across disorders, it was possible to demonstrate that schizophrenia risk overlaps significantly with bipolar disorder but not with six common non-psychiatric diseases. The PGC dataset allowed more systematic testing of whether genetic risk factors are shared more broadly across psychiatric disorders, finding substantial genetic correlations of both schizophrenia and bipolar disorder with major depressive disorder, and a small but significant correlation between schizophrenia and autistic spectrum disorder (ASD) [23].

Third, RPS could be used to evaluate risk prediction in case–control samples from different populations. In the PGC2 schizophrenia study, the authors grouped individuals into RPS deciles and estimated the odds ratio for affected status in each decile compared with the lowest decile [10]. In each of three independent samples, the odds ratio increased to a maximum in the tenth decile, indicating that carrying more risk alleles was associated with a demonstrable and progressive increase in risk. This increase was largest for a Swedish sample based only on individuals who had been hospitalized (OR = 15; 95% CI 12.1–18.7) and smallest for a population-based Danish sample drawn from inpatient and outpatient facilities (OR = 7.8; 95% CI 4.4–13.9). Even in the Swedish sample, the sensitivity and specificity of RPS do not support its use as a predictive test.
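
The decile comparison can be sketched as follows, assuming individual scores and case status are available for a combined case–control sample. The function name, the simulated data, and the use of Woolf's method for 95% confidence intervals are our assumptions rather than the PGC's published pipeline; the sketch also assumes every decile contains at least one case and one control.

```python
import math
import random
from typing import List, Sequence, Tuple


def decile_odds_ratios(scores: Sequence[float],
                       is_case: Sequence[bool]) -> List[Tuple[float, float, float]]:
    """Return (OR, lower 95% CI, upper 95% CI) for each score decile versus the lowest decile.

    Deciles are formed in the combined case-control sample. Confidence intervals use
    Woolf's log-odds-ratio method, which needs at least one case and one control per decile.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    counts = []  # (cases, controls) per decile, lowest scores first
    for d in range(10):
        members = order[d * n // 10:(d + 1) * n // 10]
        cases = sum(1 for i in members if is_case[i])
        counts.append((cases, len(members) - cases))

    ref_cases, ref_controls = counts[0]
    results = []
    for cases, controls in counts:
        log_or = math.log((cases / controls) / (ref_cases / ref_controls))
        se = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
        results.append((math.exp(log_or),
                        math.exp(log_or - 1.96 * se),
                        math.exp(log_or + 1.96 * se)))
    return results


# Simulated (hypothetical) case-control sample in which risk rises smoothly with the score.
random.seed(1)
scores = [random.gauss(0.0, 1.0) for _ in range(5000)]
is_case = [random.random() < 1 / (1 + math.exp(-s)) for s in scores]
for decile, (odds_ratio, lo, hi) in enumerate(decile_odds_ratios(scores, is_case), start=1):
    print(f"decile {decile}: OR = {odds_ratio:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```

Because each decile is compared with the same reference group, the odds ratios rise steadily across deciles even though, as the text notes, this says little about how well the score classifies any single individual.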

Finally, the results of GWAS are potentially biologically informative and support established hypotheses, for example, the involvement of dopaminergic, glutamatergic, and calcium-channel genes. As more GWAS data become available across psychiatric disorders, it will be interesting to establish whether genes involved in immune function, synaptic plasticity (e.g., NLGN4X, IGSF9B, CNTN4, PTN), or neurodevelopmental mechanisms (e.g., FXR1, SATB2, TLE1) define discrete molecular etiologies whose phenotypic expression includes psychosis, or have a much more general role in susceptibility to psychiatric or brain disorders.

Rare Variation from Copy Number Variants to Point Mutations

Structural variants are chromosomal rearrangements involving the deletion, duplication, inversion, or translocation of segments of DNA; copy number variants (CNVs) are those that change the number of copies of a segment, typically from a thousand to several million base pairs in length. Published in 2004, two independent studies demonstrated that, in addition to larger variants defined cytogenetically, submicroscopic structural variants could be assayed using array-based methods and were unexpectedly widespread in normal human genomes [24, 25]. Two large studies using GWAS array platforms confirmed an excess burden of CNVs in schizophrenia, confirmed the known association with chr22q11.2 deletions, and identified novel associations with deletions at 1q21.1, 15q11.2, and 15q13.3 [26, 27]. Since then, further studies have extended the list to more than a dozen loci, including duplications (of the Williams-Beuren syndrome (WBS) region, the Angelman/Prader-Willi syndrome (AS/PW) region, PAK7, and 16p13.11) and deletions (at 3q29, 16p11.2, 17q12, and 17p12) [28–30].

Individually, the CNVs identified are of moderate penetrance for schizophrenia (odds ratio (OR) 2–30), but in almost all cases, these mutations confer risk across diagnostic boundaries for developmental phenotypes including ASD, intellectual disability (ID), and epilepsy [28]. In most cases, the risk of developing any significant developmental phenotype is high (10–100 %), with the majority of the risk being for an early-onset disorder (e.g., ID, ASD, developmental delay) rather than schizophrenia, a lower risk again for bipolar disorder [31], and the position less clear for other psychiatric disorders.

Exome and genome sequencing make it possible to assay rare coding point mutations and small insertions and deletions (indels), a huge reservoir of rare or even private sequence variation in the human genome [32]. Damaging mutations of this kind tend to be strongly selected against, so they typically arise de novo or derive from very recent founders. As a class, rare mutations may have a more direct impact on function and phenotype (higher penetrance) than the subtle regulatory effects being elucidated for common risk variants. This makes them potentially important both as biomarkers and in informing models for functional follow-up in animal and cellular systems. There are twin challenges: risk or causative mutations may be distributed across hundreds of genes, and the baseline mutation rate in the human genome is higher than expected, making it difficult to draw statistical inferences from the small number of rare events observed in studies reported to date. More success has been achieved at the level of pathways, where enrichment of mutations in schizophrenia has been identified in genes that are part of the N-methyl-D-aspartate receptor (NMDAR) signaling complex and the activity-regulated cytoskeleton-associated (ARC) protein complex, and among Fragile-X mental retardation protein (FMRP) interactors. Sequencing studies in both schizophrenia and bipolar disorder also identify significant enrichment of mutations among the 26 voltage-gated calcium ion channel genes [7, 9, 33].

Risk Biomarkers

A risk biomarker should be applicable to asymptomatic individuals to characterize their risk of developing disease. A standard method of evaluating the clinical efficacy of such a biomarker is the area under the receiver operating characteristic curve (AUC). Random prediction corresponds to an AUC of 0.5; values greater than 0.75 can usefully identify high-risk groups for screening, whereas values of 0.99 are needed to diagnose a disease reliably in the general population (in effect, a diagnostic biomarker) [34].
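
As a sketch of what these AUC values mean, the code below computes the empirical AUC as the probability that a randomly chosen case outranks a randomly chosen control (ties counting one half). It then assumes a binormal model, in which case and control scores are normally distributed with equal variance, to show how large a mean separation each AUC value implies. The function names and example scores are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Empirical AUC: probability that a random case scores higher than a random
    control (ties count one half), i.e. the Mann-Whitney U statistic divided by
    the number of case-control pairs."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def binormal_auc(shift_in_sd: float) -> float:
    """AUC when case and control scores are normal with equal variance and the case
    mean is shifted by `shift_in_sd` standard deviations: Phi(shift / sqrt(2))."""
    return NormalDist().cdf(shift_in_sd / 2 ** 0.5)


# 8.5 of the 9 case-control pairs are correctly ordered (one tie at 0.4).
print(auc([0.9, 0.4, 0.7], [0.1, 0.4, 0.3]))
for shift in (0.0, 0.5, 1.0, 2.0, 3.3):
    print(shift, round(binormal_auc(shift), 3))
```

Under this binormal assumption, a separation of about one standard deviation gives an AUC of ~0.75, whereas an AUC of 0.99 requires a separation of more than three standard deviations.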

Most common disorders are highly polygenic, with each risk variant explaining only a very small proportion of risk. Moderate prediction, sufficient to identify high-risk groups, can be achieved for common complex disorders where a small number of relatively large genetic effects contribute to risk (e.g., type 1 diabetes, coeliac disease) [35, 36]. By considering the effects of the large number of known risk SNPs jointly (e.g., with risk profile scores), the PGC2 schizophrenia analysis found substantially higher risk in individuals in the tenth RPS decile than in those in the lowest decile (OR estimated at 7.8–20.3). Prediction from RPS can be modestly improved by including family history information or by using broader diagnostic criteria (schizophrenia, bipolar disorder, and major depressive disorder) [37, 38]. However, on its own, RPS lacks the accuracy needed for a predictive test, as discrimination at the level of the individual is currently low (AUC < 0.7). The performance of polygenic models will improve as sample sizes increase, through more accurate estimation of the effect sizes of individual SNPs. However, for most people, susceptibility to psychotic disorder involves contributions from thousands of risk alleles that are common in the general population, and this is likely to limit how accurate prediction based solely on RPS can become. Stratifying risk by RPS is likely to be useful in a research context, to examine specific risk groups (e.g., interaction with cannabis exposure) or to investigate prognosis. There is also substantial interest in examining whether RPS data can be usefully integrated with other psychosis risk biomarkers (e.g., immunological or inflammatory markers), an area that has recently been reviewed elsewhere [39].

An interesting aspect of the genetic architecture of psychotic disorders is the contribution of rare CNVs to risk, a finding not seen for most common disorders. As much of the genetic variance underlying psychotic disorders remains unexplained, rare, highly penetrant mutations may have a significant role. For individual carriers, these genetic variants may represent risk or even diagnostic biomarkers. Within schizophrenia populations, 2–3 % of patients carry a large, detectable CNV at a locus known to be associated with either schizophrenia or other neurodevelopmental disorders [40, 41]. A rate of ~20 % is reported in cases identified clinically as having syndromal features (e.g., dysmorphology, learning difficulties, short stature) [41]. These rates are significantly higher than those reported in controls (<1 %) and have led to calls for routine CNV screening in clinical practice. At present, that might involve screening for 10–15 CNV events well supported by the literature. For each event identified by screening, it would be important to estimate the risk to a carrier of developing psychosis (i.e., the penetrance). This risk is modest (2–12 %), but it is not the full story: most of these CNVs are highly pathogenic, with a substantial risk of a neurodevelopmental disorder (NDD) phenotype. For example, including developmental delay, autistic spectrum disorder (ASD), or congenital malformations (CM) as risk phenotypes increases the penetrance of 1q21.1 deletions from 5.2 to 40 %. The penetrance for serious NDD phenotypes ranges from 10 to 100 % across all 12 events. Screening, particularly in patients with syndromal features, may prove important in managing co-morbid medical problems associated with specific CNVs (e.g., obesity, seizure disorder, or congenital cardiac defects) and in genetic counseling.

As more information on rare variation becomes available from sequencing studies, and as these results are integrated with those of GWAS, interpreting data for risk prediction or diagnosis will become more complex.
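
Penetrance can be estimated from carrier frequencies in cases and controls, together with the population prevalence, using Bayes' theorem. The sketch below assumes this simple formulation (not necessarily the exact method used in the cited studies), and its inputs are hypothetical values chosen only to illustrate the arithmetic.

```python
def penetrance(freq_cases: float, freq_controls: float, prevalence: float) -> float:
    """P(disorder | carrier) by Bayes' theorem.

    freq_cases: carrier frequency among affected individuals, P(carrier | disorder)
    freq_controls: carrier frequency among unaffected individuals, P(carrier | no disorder)
    prevalence: lifetime risk of the disorder in the population, P(disorder)
    """
    carriers_affected = freq_cases * prevalence
    carriers_unaffected = freq_controls * (1.0 - prevalence)
    return carriers_affected / (carriers_affected + carriers_unaffected)


# Hypothetical inputs: a CNV carried by 0.2 % of cases and 0.02 % of controls,
# with a 1 % lifetime risk of the disorder.
print(f"{penetrance(0.002, 0.0002, 0.01):.3f}")  # 0.092
```

With these made-up inputs, a tenfold enrichment in cases translates into a penetrance of only about 9 %, because when the disorder itself is uncommon most carriers of the variant are still expected to be unaffected. Applying the same formula to a broader phenotype definition (e.g., any neurodevelopmental disorder) uses that phenotype's carrier frequencies and prevalence instead.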

Diagnostic Biomarkers

To be diagnostically useful, a biomarker must have sufficient sensitivity and specificity to characterize a disease or disorder and ideally should not overlap with other conditions. In our view, there is likely to be a future role for diagnostic biomarkers, but only in a small subset of psychosis cases. The extent of this role will depend on the contribution of rare, highly penetrant mutations to risk in the population, the ability to identify pathogenic mutations accurately, and the ongoing challenge of characterizing disease.

We will understand more about the potential importance of rare mutations as ongoing large-scale collaborative genome sequencing projects help to delineate the genetic architecture of psychosis. Every human genome carries mutations predicted to disrupt protein function, so obtaining sufficient statistical evidence that a rare mutation in a gene causes, or is associated with, disease is challenging. Even where this has been established, as for BRCA1 and breast cancer, further work can be required to characterize individual mutations in the gene. For BRCA1, more than 1000 mutations (from highly penetrant to benign) have been identified, and sequencing-based testing panels for familial breast cancer are still evolving to reduce false-negative findings [42]. The relationship between mutation and disease is not always so straightforward. Williams-Beuren syndrome is an example of a genomic disorder meeting the criteria for a diagnostic biomarker, as affected individuals have similar or overlapping deletions affecting 26–28 genes. Carriers of the deletion share a definable syndromal phenotype with characteristic facial appearance, personality, and neurodevelopmental and cardiovascular anomalies [43]. In other instances, such as the 1q21.1 microdeletion, the clinical findings are variable and do not amount to a recognizable syndrome, with some carriers being phenotypically “normal” [44].

Finally, genetic risk does not map neatly onto how we currently classify psychiatric or developmental disorders. The evidence of shared common risk variants across categories, and similar evidence of pleiotropy at rare CNVs, point to shared underlying molecular disease mechanisms. Conversely, a psychosis phenotype present as a feature of one genomic disorder may represent a completely different disease process from a similar phenotype in another such disorder. These limitations are acknowledged in the drafting of current classification systems and underlie the Research Domain Criteria (RDoC) initiative, which aims to increase knowledge about fundamental processes associated with behavioral constructs and neural circuits to inform future tools for diagnosis [45].

State, Staging, or Prognostic Biomarkers

A biomarker of state is a measurable characteristic that reflects the severity of a particular disease process. It is yet to be established whether rare risk variants or a large burden of common risk variants are associated with greater psychosis severity. As the best characterized of the genomic syndromes related to psychosis, 22q11.2 deletion syndrome may in fact be associated with less impairment in global function than “sporadic” psychosis [46]. Understanding the relationship between more recently described genomic syndromes and psychosis severity is an important avenue for future research.

Staging models are routinely used for medical disorders, based on the principle that earlier treatment is associated with better initial response, modifying disease progression and improving prognosis. Staging models for schizophrenia and bipolar disorder have been proposed [47, 48]; these share common features:

  • Stage 0 (increased risk of psychosis but no current symptoms)

  • Stage Ia (mild or non-specific symptoms with mild functional change)

  • Stage Ib (ultra-high risk: moderate but subthreshold symptoms; Global Assessment of Functioning (GAF) score < 70)

  • Stage II (first episode of psychosis, full threshold disorder)

  • Stage IIIa (incomplete remission)

  • Stage IIIb (recurrence or relapse)

  • Stage IIIc (multiple relapses with worsening in clinical extent)

  • Stage IV (severe, persistent, or unremitting illness)

Early intervention services have been developed in many countries to identify and follow up individuals at Stage I. The challenge is that the rate of transition to psychotic disorder was estimated at 36 % after 3 years in early studies, and is lower in more recent cohorts [49]. Consequently, treatments offered to this group tend to be conservative and do not include medication. There is significant interest in examining whether the genetic risk markers described above could improve risk prediction, for example in combination with other biomarkers proposed to identify transition in patient groups (e.g., endocrine dysfunction, immune markers, neurophysiological biomarkers, neurocognitive changes, or neuroimaging findings). A natural continuation of this work would be to identify prognostic biomarkers to predict disease course and outcome.

Treatment Response

Ideally, biomarkers of treatment response would allow clinicians to provide personalized treatments that are more likely to be effective for an individual patient. This is an important issue because it is not currently possible to predict which patients with psychosis will respond to which treatments. Not all patients respond to treatment, and of those who do, a subset develops serious side effects (e.g., weight gain, tardive dyskinesia, clozapine-induced agranulocytosis (CIAG)).

Early studies investigated genes directly involved in drug action or metabolism using standard genetic association methods. These provided no consistent evidence for useful treatment biomarkers [39]. More recently, GWAS methods have been applied to search for common variants, but the reported studies have been relatively small. Where promising findings have been reported (for example, genetic variation at the GADL1 gene, proposed as a sensitive (93 %) predictor of lithium response in a Han Chinese population [50]), these have generally failed to replicate in subsequent studies [51–54]. This area remains of substantial interest, as there is evidence of a genetic contribution to variability in response to current treatments, including clozapine [55] and lithium [56].

By contrast, encouraging progress is being made in identifying genetic biomarkers for antipsychotic drug metabolism, CIAG, and weight gain. The cytochrome P450 (CYP) enzymes metabolize antipsychotic medications and determine the proportion of the drug that is ultimately available to act on its target receptors. Genetic variability in the CYP genes has long been known to influence catalytic activity and drug metabolism. The enzyme encoded by CYP2D6 is involved in the metabolism of 40 % of antipsychotic medications, and patients with a poor-metabolizer genotype have higher plasma levels of particular medications (e.g., haloperidol and risperidone), with consequently greater risk of side effects. A test based on this genotype, identifying poor metabolizers as a guide for antipsychotic prescribers, has been approved by the FDA. However, perhaps because studies have not found a strong relationship between CYP2D6 genotype and treatment response, this test is not widely used in clinical practice [57].

Clozapine is the most effective available antipsychotic treatment, including in otherwise treatment-refractory patients. Clozapine-induced agranulocytosis (CIAG) is a very rare but potentially life-threatening side effect, and a biomarker for it would be of great clinical utility. Most reported studies have been small, although a SNP in HLA-DQB1 is significantly associated with CIAG (OR = 16.9). This is of potential clinical utility, but most people who develop CIAG do not carry this allele, so in the patient population the sensitivity of the marker for predicting CIAG is only 21.5 %. More recently, a large international effort reported a GWAS of CIAG and identified two independently associated loci in the MHC region (HLA-DQB1 and HLA-B). The study was biologically informative in localizing the signal to two specific amino acids, but because the risk alleles are present in fewer than half of patients, these findings do not yet constitute a clinically useful test to predict “safe” clozapine use across the patient population [58].
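
The gap between a large odds ratio and a low sensitivity can be made explicit with a short calculation. Assuming the reported OR compares the odds of carrying the marker in patients with and without CIAG, the sensitivity and OR together imply the carrier frequency among unaffected patients, and hence the specificity; positive and negative predictive values then also depend on the CIAG risk, for which a hypothetical 1 % is used here. The function name and the 1 % risk are illustrative assumptions.

```python
from typing import Dict


def marker_performance(sensitivity: float, odds_ratio: float, risk: float) -> Dict[str, float]:
    """Derive specificity, PPV and NPV for a binary carrier marker.

    Assumes `odds_ratio` compares the odds of carrying the marker in affected versus
    unaffected patients, and `risk` is the proportion of treated patients who develop
    the adverse event (a hypothetical input).
    """
    case_odds = sensitivity / (1.0 - sensitivity)       # carrier odds among cases
    control_odds = case_odds / odds_ratio                # implied carrier odds among unaffected
    control_freq = control_odds / (1.0 + control_odds)   # carrier frequency among unaffected
    specificity = 1.0 - control_freq
    true_pos = sensitivity * risk
    false_pos = control_freq * (1.0 - risk)
    true_neg = specificity * (1.0 - risk)
    false_neg = (1.0 - sensitivity) * risk
    return {
        "specificity": specificity,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Figures from the text (OR = 16.9, sensitivity 21.5 %) with a hypothetical 1 % CIAG risk.
for name, value in marker_performance(0.215, 16.9, 0.01).items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the marker is highly specific (about 98 %), yet nearly four in five CIAG cases would be missed and only about 12 % of marker carriers would develop CIAG, which is why such a marker cannot on its own identify patients for whom clozapine is “safe”.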

Weight gain is a common and serious side effect of antipsychotic medication. There is robust evidence that variation in the genes encoding the melanocortin 4 (MC4R) and serotonin 2C (HTR2C) receptors has at least a moderate effect on antipsychotic-induced weight gain [59]. Tardive dyskinesia is a potentially irreversible motor side effect of long-term antipsychotic treatment. Modest effects have been identified for variants in a number of genes, including CYP2D6, DRD2, and HSPG2, but results are not sufficiently consistent to inform risk prediction [60].

Conclusions

Despite a vast literature, there remains a dearth of clinically useful biomarkers to inform diagnosis or staging, or to guide treatment, in psychosis. This is not surprising when considering that psychosis is a clinical syndrome of uncertain etiology that may represent one or many discrete disease mechanisms. The major psychotic disorders are substantially heritable, and rapid progress in understanding their molecular etiology may guide the development of better biomarkers. At the population level, common risk variants indicate overlap between psychosis syndromes and with other neuropsychiatric disorders. Ongoing efforts by the PGC will soon see GWAS with sample sizes (for schizophrenia) of >100,000 cases, which are likely to identify many additional risk variants and provide more accurate estimates of their small individual effects. For a subset of cases, rare mutations may be important in quantifying an individual’s disease risk or even in confirming diagnosis. As understanding of the genetic architecture of the more severe psychotic disorders (schizophrenia and bipolar disorder) improves, it may become possible to stratify patients by molecular etiology to identify more homogeneous patient populations for future research. Finally, in most studies, sampling is heavily weighted towards more severely affected individuals. Further work is required to identify overlap and discontinuities between these disorders and other psychosis syndromes (e.g., delusional disorder or brief psychotic disorders), particularly in informing prognosis and guiding treatment.