Introduction

Individualized treatment choice is one of the key areas of translational research. The number and quality of studies aimed at identifying measures predictive of treatment response, so-called response biomarkers, have risen substantially over the past few years. ‘Biomarkers’, defined as ‘characteristics that are objectively measured and evaluated as indicators of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention’ [1], are sought after in many disciplines of medicine, and psychiatry is no exception.

Currently, the choice of psychiatric medications is based on general knowledge about the efficacy and safety of individual drugs [2•], drug pharmacology, and individual preferences of clinicians and patients. This will work for some patients, but it also means that others will face delays in response as certain treatments that are generally effective may not work for a particular individual.

Depression is a prime example of how helpful response biomarkers could be in expediting the treatment selection process. Although effective treatments for depression exist, only one third of patients respond to their initial medication, with many patients needing multiple changes before significant improvement can be seen, as highlighted by the Sequenced Treatment Alternatives to Relieve Depression study (STAR*D). STAR*D evaluated the effectiveness of depression treatments in a large cohort of primary and secondary care patients, in a design that reflected real-life clinical practice, with changes of treatment every 3 months in case of response failure [3]. STAR*D not only made it painfully clear how few patients respond to their first-line treatment but also showed that subsequent treatments have limited success rates, with about one third of patients not responding to any of the interventions attempted over a 12-month period [4]. Given how common depression is, with about 300 million sufferers around the world according to the World Health Organization, and about 1 in 5 people experiencing at least one episode in their lifetime [5], this translates into high levels of individual suffering with substantial societal and economic costs.

Biomarkers could substantially shorten time to response, by indicating which treatment an individual patient is most likely to respond to, irrespective of the place of this treatment in a traditional algorithm. For example, in STAR*D, tranylcypromine was introduced as an option as late as 9 months into the process, while a biomarker might indicate it as the best alternative for an individual patient, saving them almost a year of waiting to experience symptomatic improvement.

A range of treatment response biomarkers have been proposed, including symptom-based, sociodemographic, genetic, immune, endocrine, and neuroimaging markers [6]. Among these, neural predictors may be of particular interest as dysfunctional neural processes are central to the development and maintenance of depressive symptoms. The rapid development of neuroimaging techniques over the past 30 years has allowed for an assessment of the living brain in an unprecedented fashion, bringing about a wealth of information about its structure (e.g. structural magnetic resonance imaging, sMRI, diffusion tensor imaging, DTI), function (functional magnetic resonance imaging, fMRI, positron emission tomography, PET, electroencephalography, EEG, magnetoencephalography, MEG), or biochemical processes (magnetic resonance spectroscopy, MRS).

Neuroimaging in Treatment Response Prediction: Where Are We Now?

Single Drug Studies

The search for treatment biomarkers has been interwoven with research into the processes underlying depression and the mechanisms of antidepressant (AD) action. A typical drug neuroimaging study employed a design in which a brain scan was performed before and after a course of treatment, typically 6–12 weeks, with a single AD medication; this time period corresponded with an expected significant clinical improvement in responders. In order to address the question of why only some people respond to medications, pre-treatment differences in brain function and structure between future responders and non-responders, as well as correlations between baseline brain activity and a change in depression severity over treatment, were explored (e.g. [7, 8, 9, 10]). These studies have provided valuable information on the mechanisms of action of AD treatments and the factors that determine response to individual drugs [11]. Brain regions most consistently identified as abnormal prior to treatment, in either function or structure, included the amygdala, ventral striatum, thalamus, hippocampus, anterior cingulate cortex (ACC), ventromedial prefrontal cortex (vmPFC), orbitofrontal cortex (OFC), and dorsolateral prefrontal cortex (dlPFC). More recent studies have emphasized the role of brain networks of which these structures are a part, rather than that of single structures [12]. Importantly, AD treatment effects have been mapped onto the widely accepted models of biological mechanisms of depression [13]. These models advocate, schematically, an imbalance between overactive ‘bottom-up’ circuitry responsible for quick automatic response to emotionally salient information, in particular that of a threatening nature, and a decreased ‘top-down’ cognitive control. Both pharmacotherapy and psychotherapy appear to normalize this aberrant neural activity and restore the balance [11].

A different approach has been adopted in order to refine the understanding of the mechanisms of AD action, in particular those related to delays in symptomatic improvements that both patients and clinicians would perceive as significant. A new model, the cognitive neuropsychological hypothesis of AD action, proposed that a crucial element for AD response was an early positive shift in the processing of emotionally salient information, preceding any significant changes in mood. This shift would then be followed by learning new positive associations in the social environment, over time leading to symptomatic improvement. A reduction in the negative bias has indeed been observed as early as after one dose at both behavioural and neural levels (see [14] for review). Recent neuroimaging studies in depressed patients have focused on understanding the significance of this phenomenon for treatment response. One study [15] showed normalization of the amygdala response to fearful versus happy facial expressions, and another [16] showed normalization of medial prefrontal cortex (mPFC) and ACC activity during self-referential processing, in depressed patients receiving escitalopram for 1 week but not in patients receiving placebo.

A proof-of-concept study provided support for the main point of the hypothesis, namely that this early positive shift in emotional processing was predictive of future response to AD treatment in depressed patients [17•]. It showed that changes in neural response to fearful versus happy facial expressions across a number of structures, including ACC and amygdala, could differentiate between responders and non-responders to 6 weeks of escitalopram treatment. Crucially, for the model validation, the neural changes were seen before any significant effect on depressive symptoms could be measured. There is ongoing work aiming at translating this hypothesis into a tool that would allow its practical application, for example, to facilitate drug development through the identification of agents with an antidepressant profile or the elimination of those whose profile might suggest undesirable side effects (e.g. [18]).

Due to small sample sizes in individual studies and a high between-study variability – addressed in more detail below – it has been difficult to extract clear structural or functional patterns predictive of antidepressant response that could be consistently replicated in other populations. Meta-analyses (e.g. [19]) and systematic reviews [11, 20••, 21] have attempted to tackle this issue. They have identified the pregenual anterior cingulate cortex (pgACC) as the most reliable response biomarker, and the amygdala as the second best [11], although the amygdala’s consistency as a response predictor has also been questioned [20••].

Pregenual Anterior Cingulate Cortex (pgACC): A General Response Biomarker

At this point, probably the best supported candidate for a neuroimaging response biomarker is pgACC, with its increased activity being a predictor of good clinical response consistently shown over the past 20 years (e.g. [10, 22, 23]) and supported by meta-analyses and systematic reviews [19, 20••]. PgACC plays a crucial role in the development of depressive symptoms and antidepressant response due to its central position between the circuits responsible for a quick automatic response to emotionally salient stimuli and prefrontal regions exerting cognitive control over them. A recent review [12] emphasized its role at the network level and suggested that functional connectivity from pgACC showed a particular consistency in predicting antidepressant response.

This increased activity state may also be important for the clinical response to rapid-acting glutamatergic drugs [24]. Intriguingly, increases in activation of pgACC within the first few hours after administration of the NMDA receptor antagonists lanicemine and ketamine were shown to predict improvements in mood after 1 and 7 days [25]. It was hypothesized that this may represent pgACC’s switching into a treatment-responsive mode, necessary to restore the equilibrium between cognitive and emotional networks. Interestingly, conventional antidepressants seem to require this increased activity to be present from the beginning. This difference may be significant in the context of glutamatergic agents’ ability to elicit response in patients not responding to conventional ADs.

An association of increased pretreatment pgACC activity with positive treatment response has been seen across a number of treatments (pharmacological interventions and CBT), designs (e.g. task-based or task-free), scanning modalities (e.g. sMRI, fMRI, EEG, PET), and analytical approaches. Although this may indicate its robustness as a general response biomarker, it also suggests that it may be less useful in the context of the individualized treatment choice. On the other hand, normal or lowered pre-treatment activity of the pgACC could act as ‘the lack of response’ biomarker and help to identify people who might require more intense therapeutic input from the start.

From General to Specific Response Biomarkers: Direct Comparisons of AD Interventions

Although general markers of response might have some clinical use, the ‘holy grail’ is the identification of biomarkers that could clearly indicate which specific drug or intervention would best match an individual patient, i.e. would have the highest likelihood of producing a symptomatic response in the particular individual. Although this research journey has only just begun, some intriguing observations have already been made.

In this context, particularly valuable have been studies in which two or more interventions were used in large groups of patients, with patterns of brain structure and activity associated with response to each intervention being examined. Such a design offered decreased variability due to similar testing conditions for each treatment arm, allowed direct comparisons of treatments and tackled problems associated with small sample sizes (see below for a more detailed discussion of these issues).

A number of studies addressed a general but highly important clinical question: whether an individual patient with depression should be treated with pharmaco- or psychotherapy. Currently, talking therapies, and in particular cognitive-behavioural therapy (CBT), are recommended as the first-line treatment for mild to moderate depression and as such, in ideal conditions, should be implemented before pharmacological treatments [26]. In practice, access to such therapies is often limited; hence a tool supporting the decision of whom to allocate the available resources to would be of great value. Also, given that not all patients with mild depression will respond to CBT, it is important to identify likely non-responders early, as treating them with CBT first would cause an undue delay in response.

McGrath et al. [27•] examined this subject in a PET study, which showed that baseline hypometabolism in the right anterior insula was predictive of good response to CBT and poor response to an SSRI, escitalopram, while hypermetabolism in this region was associated with remission with escitalopram and poor response to CBT. This was an important finding; one question it was unable to answer was whether the patterns of brain metabolism were predictive of response to pharmaco- and psychotherapy as groups of treatments or whether they were specific to the interventions used (i.e. CBT rather than psychological therapies in general, and escitalopram or SSRIs rather than medications in general). Another study [28] showed a differential response to CBT and antidepressant treatments (an SSRI, escitalopram, and an SNRI, duloxetine) based on resting-state functional connectivity of the subgenual cingulate cortex. A similar pattern of brain activity for both drugs was observed, suggesting differences between CBT and pharmacotherapy in general, rather than between individual treatments. The latter study used data from the Predictors of Remission in Depression to Individual and Combined Treatments (PReDICT) study, a large multicenter initiative comparing CBT, duloxetine and escitalopram [29]. Other neuroimaging results from PReDICT have not yet been published.

Another trial, the International Study to Predict Optimized Treatment - in Depression (iSPOT-D), aimed at comparing response predictors to specific drugs, the SSRIs escitalopram and sertraline and an SNRI, venlafaxine, in 2,016 depressed patients [30]. Although neuroimaging data was restricted to fewer than a hundred patients, interesting differences between treatments emerged. Among other findings, pre-treatment amygdala hyporeactivity to subliminal happiness and threat was identified as a general predictor of treatment response, regardless of the medication type, and pre-treatment amygdala hyperreactivity to subliminal sadness as a differential moderator of non-response to venlafaxine [31•]. This highlights the possibility that various tasks may be needed to assess response likelihood to different types of medications. In a similar vein, healthy control-like activation of dlPFC during a cognitive ‘go/no go’ task was a general predictor of remission, while its hypoactivation, relative to controls, was predictive of poor treatment outcome in general [32]. By contrast, inferior parietal activation during the same task differentiated between SSRI and SNRI responders, with greater pretreatment activation associated with remission on SSRIs and no remission on SNRIs, while lower activation was related to remission on SNRIs and no remission on SSRIs.
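The distinction drawn above between a general predictor and a differential moderator can be made concrete with a small sketch. The counts below are invented purely for illustration and are not iSPOT-D data; the function names are hypothetical. The idea is that a general predictor shifts remission rates in the same direction in every treatment arm, whereas a moderator shifts them by different amounts, or in opposite directions, across arms.

```python
# Hypothetical remission counts (remitted, total) by biomarker level and arm.
# These numbers are made up for illustration; they are NOT trial data.
COUNTS = {
    ("high", "SSRI"): (14, 20),
    ("high", "SNRI"): (6, 20),
    ("low", "SSRI"): (7, 20),
    ("low", "SNRI"): (13, 20),
}


def remission_rate(level, arm, counts=COUNTS):
    """Proportion of patients in one biomarker-by-arm cell who remitted."""
    remitted, total = counts[(level, arm)]
    return remitted / total


def biomarker_effect(arm, counts=COUNTS):
    """Remission rate difference, high minus low biomarker, within one arm."""
    return remission_rate("high", arm, counts) - remission_rate("low", arm, counts)


def interaction(counts=COUNTS):
    """Difference of the within-arm biomarker effects (SSRI minus SNRI).

    A value near zero means the biomarker behaves like a general predictor;
    a value far from zero means it moderates the treatment effect.
    """
    return biomarker_effect("SSRI", counts) - biomarker_effect("SNRI", counts)


if __name__ == "__main__":
    for arm in ("SSRI", "SNRI"):
        print(arm, round(biomarker_effect(arm), 2))
    print("interaction", round(interaction(), 2))
```

With these invented counts the biomarker effect has opposite signs in the two arms, which is the crossover pattern described for inferior parietal activation. A real analysis would test the interaction term in a regression model with appropriate uncertainty estimates rather than compare raw proportions.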

Establishing Moderators and Biosignatures of Antidepressant Response for Clinical Care for Depression (EMBARC), another large multisite trial [33], provided interesting observations on sertraline and placebo response over 8 weeks of treatment, with about 300 neuroimaging datasets available for analysis. Responders to sertraline, as compared to placebo, were characterized by higher baseline connectivity within the default mode network (DMN), greater between-network connectivity of the DMN and executive control networks [34], an abnormal pattern of baseline ventro-striatal response to reward expectancy and prediction error [35], and abnormal perfusion across a number of structures [36]. Intriguingly, some brain patterns seemed to favour placebo over sertraline. Lower connectivity of the hippocampus with the limbic and sensorimotor networks predicted better response to placebo, while higher connectivity predicted good response to sertraline [34]. In line with findings suggesting pgACC as a general predictor of response, stronger baseline connectivity between pgACC and rostral anterior insula, a hub in the salience network, predicted good response to both sertraline and placebo [23, 37]. A calculator for predicting the likelihood of placebo response at the individual level, based on clinical and biological data, has been presented [38].

These are examples of studies that through their design allow a direct comparison of therapeutic interventions in large numbers of patients, tackling the most problematic issues described in more detail below. An important aspect of these studies is collection of multiple types of data, not only neuroimaging but other biological measures, such as genetic, proteomic, immune, or endocrine data, as well as symptomatic, behavioural, neurocognitive and sociodemographic profiles. This presents new possibilities of in-depth analyses, taking into account numerous factors potentially affecting response and allowing development of more complex and more precise response predictors.

The above studies are a good start in the process of moving towards precision psychiatry. At this point, the numbers of studies and directly compared treatments are still restricted but hopefully in the future will grow sufficiently to allow more sophisticated comparisons, similarly to studies on the efficacy and safety of drugs [2•].

An Important Issue: Reproducibility of Results

High reproducibility and generalizability of results are an obvious prerequisite for converting research findings into clinically useful tests. Unfortunately, poor reproducibility is a common problem not only in neuroimaging studies in psychiatry but also in biomedical research in general. One analysis estimated that about 85% of biomedical research is wasted through a combination of researcher and institutional factors [39]. Regarding neuroimaging, there are also important methodological reasons why findings may be difficult to replicate; these factors need to be taken into account if prediction biomarkers are to make their way into the clinic.

The key issue is variability in study design [40]. The elements that may impact significantly on the outcome include the size of the group studied; eligibility criteria and sample characteristics, such as age, gender, duration of illness, past treatments, severity and length of episodes; choice of treatment; definition and measurement of response, and time when it is assessed (typically 6 to 12 weeks). There is also high variability in testing conditions, scanning parameters and analytical tools and approaches.

The sample may suffer from both too low and too high variability. Small sample sizes may result not only in inadequate power to detect effects but also in a restricted representation of features, which will affect generalizability of the model to other populations. This can be aggravated by a tendency, common in research, to include selected groups of patients with as ‘clean’ a version of the particular condition as possible, for example, depressed but otherwise healthy volunteers; such samples may not be representative of ‘real life’ clinical populations. Conversely, another potential problem is too high variability, related to the nature of the diagnostic criteria. Psychiatry still uses traditional diagnostic labels, based on symptoms rather than causes. While this may be helpful for clinical management, given the lack of better alternatives, in research it can translate into test groups including patients with similar symptoms but different biology, further reducing statistical power. Recent attempts to escape rigid symptomatic definitions of mental health disorders resulted in the development of approaches, such as Research Domain Criteria (RDoC), that integrate different types of information, such as genetic, imaging, behavioural and self-reported data, into dimensions of functioning [41]. This indicates a shift in thinking; however, the approach is rather cumbersome, and most studies still embrace traditional diagnoses.

Social factors are an important but often neglected aspect; for example, adverse external circumstances, such as low income or unemployment, have been shown to be among the best predictors of treatment outcomes. Their presence can strongly affect individual response and hence findings across different studies [42].

In task-based neuroimaging studies, unlike in cases of structural MRI or resting state studies, an important aspect to consider is variability of the tasks used. For example, the commonly used task based on watching emotional facial expressions has many different versions, regarding, for example, the types of emotions presented or whether emotions can be clearly seen or masked by neutral emotional expressions. On the one hand, this increases variability between studies. However, it may also be what is actually needed for a personalized treatment approach. As highlighted by the aforementioned studies (e.g. [31]), different variations of the task may be required to assess the likelihood of response to different types of interventions. More data needs to be collected to explore this intriguing and potentially practically important issue.

The Future of Neuroimaging in Treatment Response Prediction

Multimodal Approaches

Despite initial hopes, it seems unlikely that there will be a single consistently replicated neuroimaging biomarker that can predict antidepressant treatment response with accuracy high enough to warrant its translation into the clinical setting. Accuracy of neuroimaging response biomarkers typically stays in the range of 60–80%. In order to increase it, some studies have attempted to combine two or more markers, with some encouraging outcomes. For example, one of the iSPOT-D papers [43] described two decision trees allowing identification of treatment non-responders, one based on volumetric measurements and another on structural connectivity measures. Separately, they yielded accuracy of around 85%; however, if criteria for non-response based on both decision trees were met, accuracy rose to 100%. Crane et al. [44], in a study of escitalopram and duloxetine, showed increased accuracy of 90% when performance markers on a cognitive task and fMRI data were considered together, compared to 74% accuracy derived from clinical data alone.
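One way to read the combined-tree result is as a conjunction rule: a patient is flagged as a likely non-responder only when both markers agree. The sketch below uses invented labels and flags (not iSPOT-D data) and hypothetical function names to show why such a rule tends to raise the positive predictive value of a flag while reducing the proportion of non-responders it captures.

```python
# 1 = non-responder, 0 = responder (toy ground truth, invented for illustration).
truth = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Flags raised by two hypothetical markers (1 = "predicted non-responder").
marker_a = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
marker_b = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]


def ppv(flags, labels):
    """Positive predictive value: share of flagged patients who truly did not respond."""
    flagged = [lab for f, lab in zip(flags, labels) if f == 1]
    return sum(flagged) / len(flagged) if flagged else None


def sensitivity(flags, labels):
    """Share of true non-responders that the flag captures."""
    hits = [f for f, lab in zip(flags, labels) if lab == 1]
    return sum(hits) / len(hits) if hits else None


def both(flags_a, flags_b):
    """Conjunction rule: flag a patient only if both markers flag them."""
    return [a & b for a, b in zip(flags_a, flags_b)]


if __name__ == "__main__":
    for name, flags in [("A", marker_a), ("B", marker_b), ("A and B", both(marker_a, marker_b))]:
        print(name, "PPV:", ppv(flags, truth), "sensitivity:", sensitivity(flags, truth))
```

In this toy case each marker alone has a positive predictive value of 0.8, while the conjunction reaches 1.0 but captures only half of the non-responders. Whether a similar trade-off underlies the figures reported in [43] cannot be determined from the summary given here.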

This idea has been embraced by the large multisite studies mentioned above. Although for now reports mostly refer to single imaging modalities, the studies were designed to explore complex interactions between individual factors necessary for treatment response. To achieve this, a wealth of data is being collected, including neuroimaging, clinical, behavioural, cognitive, genetic, endocrine and inflammatory measures. One of the challenges is the current lack of understanding of which measures are significant, or indeed crucial, for response prediction. Just how important such knowledge is has been highlighted by one of the iSPOT-D papers, which showed that higher pre-treatment dlPFC activity during a cognitive task was associated with clinical improvement, but only in people without a history of childhood abuse [45••]. It was hypothesized that this effect was due to diminished cognitive flexibility related to adverse childhood experiences.

Recently, inflammation has been suggested as a potentially important factor in treatment response and its prediction [46]. Baseline CRP levels have been shown to be associated with differential response to SSRIs and other AD medications, such as nortriptyline [47••] or an SSRI-bupropion combination [48]. These studies highlight the potential role of inflammation in treatment prediction modelling and emphasize that personalized treatment choice may necessitate going beyond standard AD strategies, such as an addition of anti-inflammatory agents in patients with high inflammation.

One of the ways to approach the vital questions about the role and weight of specific factors in response prediction is through employment of artificial intelligence, with machine learning as a tool.

Machine Learning

Machine learning is an approach in which patterns are extracted from existing datasets to predict outcomes in new datasets. This may suit psychiatric research particularly well, as large and complex datasets are still poorly understood and it is often unclear which ‘pieces of the puzzle’ are important. Machine learning offers a way around this problem, as it searches for regularities in the data either to fit a defined outcome, such as response status, or without a defined outcome, in a fully data-driven way [49].
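A minimal sketch of the first route, fitting a defined outcome, is given below. It assumes a nearest-centroid classifier and invented two-feature data: the model is fitted on a training set with known response status and then scored on patients it has not seen. The feature values, the choice of classifier, and all names are illustrative assumptions and are not taken from any study cited here.

```python
from statistics import mean

# Each row: (feature_1, feature_2, label); label 1 = responder, 0 = non-responder.
# All values are invented for illustration only.
TRAIN = [
    (0.9, 0.8, 1), (1.1, 1.0, 1), (1.0, 1.2, 1), (0.8, 1.1, 1),
    (-1.0, -0.9, 0), (-1.2, -1.1, 0), (-0.8, -1.0, 0), (-0.9, -1.2, 0),
]
# Held-out patients, never used for fitting.
TEST = [
    (1.0, 0.7, 1), (0.6, 1.0, 1), (-0.7, -0.8, 0), (-1.1, -0.6, 0), (-0.3, -0.2, 1),
]


def fit(rows):
    """Learn one centroid (mean feature vector) per outcome label."""
    by_label = {}
    for *features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*points))
        for label, points in by_label.items()
    }


def predict(model, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def accuracy(model, rows):
    """Proportion of rows whose predicted label matches the true label."""
    correct = sum(predict(model, row[:-1]) == row[-1] for row in rows)
    return correct / len(rows)


if __name__ == "__main__":
    model = fit(TRAIN)
    print("training accuracy:", accuracy(model, TRAIN))
    print("held-out accuracy:", accuracy(model, TEST))
```

The point of the sketch is the separation between fitting and evaluation: performance on the training rows says little about how the model will behave in a new population, so only the held-out figure is informative. The second route mentioned above, searching for structure without a defined outcome, needs no response labels at all.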

The data-driven approach was employed in a large multisite study [50••], which included 1188 patients with depression. Machine learning, based on distinctive patterns of resting state connectivity in limbic and frontostriatal networks, allowed for the identification of four diagnostic neurophysiological clusters, associated with differing clinical profiles; multisite validation and out-of-sample replication in populations similar in size showed high (82–93%) diagnostic sensitivity and specificity. Importantly, the clinical data alone was not able to distinguish between the groups. The study has attracted a lot of attention in the context of clinical applicability. Its unquestionable advantage has been the inclusion of high numbers of patients. An adequate sample size is particularly important in the context of machine learning, as training the models on small numbers of datasets leads to an exclusion of important pieces of information, strongly affecting generalizability of the findings and the application of the models in new populations. Pertinent to the topic of this paper, these clusters were associated with the level of response to transcranial magnetic stimulation (TMS) therapy in a group of 154 depressed patients. Unfortunately, another group, aiming to replicate the findings in a clinically more heterogeneous group of 187 patients with depression and anxiety and following the original procedures as closely as possible, failed to do so, i.e. to show relationships between brain connectivity and clinical symptoms or to identify distinct subtypes of depression [51]. This shows that even if machine learning could potentially be a powerful tool in precision psychiatry, at this point one should stay cautiously optimistic; it is still a new instrument and any results should be interpreted with caution.
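The mechanics of outcome-free clustering followed by out-of-sample assignment can be sketched as follows. Plain k-means is used here as a simple stand-in; it is not the clustering procedure of [50••], and the two-dimensional points, the sample names, and the fixed starting centroids are all invented assumptions.

```python
def assign(points, centroids):
    """Index of the nearest centroid for each point (squared Euclidean distance)."""
    def nearest(point):
        return min(
            range(len(centroids)),
            key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
        )
    return [nearest(point) for point in points]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members; keep it if empty."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [pt for pt, lab in zip(points, labels) if lab == k]
        if members:
            new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, init, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(init)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Invented "discovery" sample with two visually separated groups.
DISCOVERY = [
    (0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (0.15, 0.05),
    (1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.05, 1.0),
]
# Invented "replication" sample, assigned to clusters learned on DISCOVERY only.
REPLICATION = [(0.05, 0.1), (0.95, 1.05)]

if __name__ == "__main__":
    centroids, labels = kmeans(DISCOVERY, init=[DISCOVERY[0], DISCOVERY[4]])
    print("discovery labels:", labels)
    print("replication labels:", assign(REPLICATION, centroids))
```

A clustering algorithm will return clusters for any dataset, including one with no real subgroup structure, so finding clusters in a discovery sample is not in itself evidence that they exist. This is why independent replication attempts such as [51] matter.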

Treatment Response in Psychiatry: Precise and Personalized Treatment Choice

The concept of ‘precision psychiatry’ is gaining momentum, in line with increasing knowledge about different factors that can influence treatment response [52]. This approach focuses on the importance of unique profiles created for a given individual on the basis of their genetic, biological, behavioural and environmental features, and the use of such profiles to support choice of treatment that would best suit the specific patient. This is still work in progress, now supported by bioinformatics and artificial intelligence. For now, although a personalized approach to patients is common, the ‘precision’ has not yet been achieved.

Conclusions: Is There a Future for Neuroimaging Biomarkers in Treatment Choice for Depression?

Neuroimaging has allowed us to gain an unprecedented insight into the brain mechanisms of AD action in a relatively short time and has greatly increased our understanding of the neural factors that may play a role in response to antidepressant treatments. It has certainly proved itself as a valuable tool in research on treatment response biomarkers.

Whether neuroimaging will be used in clinical settings remains, for now, uncertain. At this point, the accuracy of prediction and replicability of findings are too low to support its conversion into a clinical tool. Another problem is the availability of this technology and the costs of scanning, given that depression is a common condition mostly treated in primary care. At the same time, if the accuracy of differentiation between treatment responders and non-responders is high enough, potential benefits may outweigh the costs of assessment. It remains to be seen whether the positive findings described here will be replicated in independent samples and whether they can be translated into ‘real-life’ clinical populations.

At this point it seems unlikely that there will turn out to be a single neuroimaging response biomarker of accuracy high enough to warrant its practical application. Neuroimaging findings may need to be considered together with other predictors in multimodal models, for example, by combining different imaging modalities or by combining neuroimaging with other types of data, whether biological, social or clinical. The development of artificial intelligence and the employment of machine learning give hope that important measures will be identified within complex sets of data, leading to the identification of predictors, moderators, and mediators of treatment response. The ultimate goal would be to establish easy-to-apply clinical calculators that incorporate multiple predictors and indicate which medication should be used in an individual patient, leading to a therapeutic process that is both personalized and clinically valuable.