1 Introduction

The global obesity epidemic is one of today’s major public health concerns. According to the World Health Organization, 650 million adults or 13% of the world’s adult population were obese in 2016 and the worldwide prevalence in adults nearly tripled between 1975 and 2016. In addition to these concerning epidemiological characteristics, obesity is associated with multiple adverse consequences, including increased risk for cardiovascular disease, diabetes, cancer, premature mortality as well as depression and anxiety [1,2,3,4,5,6,7]. Besides social [8], genetic [9], hormonal [10] and behavioral [11,12,13] factors, central nervous factors promote the development and maintenance of obesity. Neural mechanisms that (i) underlie impaired food reward [14, 15], (ii) link food cues to (anticipated) reward [16, 17] and (iii) underlie reduced psychobehavioral control [16, 18, 19] are considered the main central nervous contributors to obesity. In developed countries these effects are increased by 24-h advertisement and availability of low cost, calorie dense, and highly palatable food.

Compatible with this multi-factorial etiology, three groups of treatments exist: lifestyle interventions (LIs), bariatric surgery (BS) and pharmacological interventions (PIs). LIs include caloric restriction, physical exercise, eating behavior modification and dietary counselling. Balanced hypocaloric diets induce clinically meaningful weight loss [20]. Optimal weight loss and maintenance are achieved when caloric restriction and physical exercise are combined [21, 22]. However, long-term weight regain is relatively common [23].

BS is currently the most effective treatment with regard to weight loss, attenuation of comorbidity (e.g., type 2 diabetes) and mortality prevention [24, 25] and thus the treatment of choice for severe obesity [26]. Sleeve gastrectomy (stomach volume reduction to 80 to 120 mL) is the most frequently recommended BS technique. Nevertheless, there is considerable variability in weight loss. Weight regain occurs in 20–30% of patients [27,28,29,30,31,32,33,34]. Long-term weight regain has been associated with a reversal of surgery-induced hormonal variations (e.g., in ghrelin and GLP-1; [35]), post-bariatric hypoglycemia [36], dietary non-adherence [37], and physical inactivity [38]. Problematic behavioral patterns are likely further aggravated in psychiatric patients [39]. The safety of BS has improved drastically in the last two decades, with perioperative mortality rates ranging from 0.03 to 0.2% [40,41,42,43]. However, complications can still occur and include early complications such as bleeding, thromboembolism, bowel obstruction and wound infection as well as late complications such as stricture, reflux disease, cholelithiasis, hernia, nutritional and vitamin deficiencies, and dumping syndrome [44, 45]. In the long term, up to 22% of patients require reoperation [46,47,48,49]. Due to the differences in efficiency and risk between treatment options, a prognostic tool that predicts treatment success and could thus guide individual treatment choices in a personalized medicine framework is highly desirable.

Only a few drugs are used in clinical obesity management, including orlistat, a pancreatic lipase inhibitor; phentermine/topiramate, a sympathomimetic, appetite suppressant; lorcaserin, a 5-HT2C receptor activator; naltrexone/bupropion, a transmitter reuptake inhibitor; and liraglutide, a glucagon-like peptide 1 (GLP-1) analogue. Trials of GLP-1 analogues in particular have produced promising results, with liraglutide treatment usually resulting in a weight loss of 4 to 6 kg and semaglutide treatment demonstrating even greater weight loss [50, 51]. Side effects of liraglutide include gastrointestinal symptoms, such as nausea, diarrhea, constipation, and vomiting [52]. PI weight loss is partly or completely reversed after treatment [53]. PIs are only recommended as add-on to LIs [54]. See [54] for a more detailed overview on LI, [55] for BS, and [52] for PI.

Currently, a variety of studies exist that use computational approaches and neuroimaging signals to predict treatment outcome in obesity. In this review, we explain the key central nervous mechanisms assessed in these studies, present the different neuroimaging and computational prediction techniques, and give a detailed overview of existing studies. In the Discussion, we pay special attention to the questions of (i) whether the results obtained are sufficient to legitimate clinical real-world applications (which is presumably not yet the case), (ii) what could be done to meet this requirement and (iii) how statistical analyses could be improved to provide more detailed models for “treatment” and “treatment outcome”.

We included longitudinal LI, BS, and PI studies. We required a period of at least one month between treatment initiation and the latest follow-up. Compatible with the general meaning of “prediction” as a forecast of future events and of “prediction” as a statistical process modelling some factor based on other factors, we included studies that prognose future treatment outcome using neuroimaging biomarkers and studies that predict treatment-induced variations in outcome markers based on variations in neuroimaging parameters across treatment.

2 Central nervous mechanisms affecting body weight

At least partially motivated by the discovery of overlapping psychobehavioral symptoms in persons with obesity and substance dependence such as loss of control over consumption and craving [16, 56], neuroimaging research on central nervous parameters impacting body weight has focused on three major reward-related mechanisms: Reward system hyposensitivity to food consumption, reward system hyperresponsivity to stimuli predicting food consumption, and dysfunctional psychobehavioral or goal-directed control system respectively.

2.1 Reward system hyposensitivity to food consumption

A reduced sensitivity of the brain reward system to food consumption (including the actual pleasurable impact of consumption) is regarded as a factor that triggers excessive overeating as a method of compensation [14, 15, 57]. In accordance with addiction research, a reduced striatal dopamine (DA) release after food consumption and reduced availability of DA receptor subtype D2 (D2R) are discussed as causes for said hyposensitivity.

Specifically, Small et al. found a food-intake induced DA release after a 16 h fast that reflected the pleasantness of food consumption in normal weight persons [58]. This is compatible with findings showing that amphetamine-induced DA release correlates with the experienced pleasantness of amphetamine consumption in healthy subjects [59] and a reduced DA release in detoxified cocaine abusers [60]. The pleasurable impact of consumption is frequently referred to as “liking” [17]. Van de Giessen et al. found reduced DA release in obese persons in the sense that amphetamine induced a significant DA release in lean but not obese persons [57]. Based on these findings, it was concluded that blunted DA release after consumption is a mechanism underlying compensatory overeating in obesity (e.g., [57]). Moreover, in agreement with findings made for addiction [61], obesity research found that the availability of striatal D2R is significantly decreased in obese persons [15, 62] and is negatively related to their BMI [15]. Consistently, Johnson & Kenny found a progressive decrease in D2R-availability accompanied by a progressive increase in compulsion-like overeating in rats randomized into an overeating condition compared to control animals [14] and Geiger et al. found reduced extracellular dopamine in the striatum in a comparable experimental setting [63]. Johnson & Kenny concluded that the observed progressive dopaminergic hyposensitivity reflects the transition from normal to compulsive overconsumption [14].

However, some findings question the importance of (dopaminergic) reward system hyposensitivity as a risk factor for weight gain. Hardman et al. found that suppressing DA signals leads to a reduced food intake in humans [64]. Tellez and colleagues showed that down-regulation of DA release due to a prolonged high-fat diet reduces caloric intake in rodents [65]. Several studies on addiction research depleting/antagonizing DA functioning did not find an impact on drug or food-liking. Only a reduction in motivational properties (i.e., of “drug-wanting”, see Sect. 2.2) was found [66, 67]. Importantly, Tellez and colleagues also showed that application of oleoylethanolamine restores DA release [65]. Oleoylethanolamine is a lipid messenger whose synthesis is suppressed due to the high-fat diet. Their study provides a mechanism that explains reduced DA release as a consequence of a prolonged high-fat diet. Consistent with this description of reduced DA release as a consequence (and not a cause) of an excessive calorie intake and given the experimental design applied by Johnson & Kenny [14], one might assume that the progressive decrease in D2R-availability observed in their study reflects the end-point of rather than the transition to compulsive overconsumption. See [68,69,70] for an overview. Finally, the exact role of calorie-intake related DA receptor alterations appears unclear due to findings of Dobbs et al. showing that D2R downregulation can be associated with a D1R hyper-reactivity, suggesting more heterogeneous DA adaptations [71].

Therefore, a causative role of a DA-mediated reward system hyposensitivity for weight gain remains debatable. Only one of the two Positron-Emission-Tomography (PET) studies directly evaluating a link between DA functioning and future weight loss [72, 73] found such a link [72]. Authors elaborating on alternative neurotransmitters suggest endorphins or endocannabinoids as substrates of drug or food liking (see e.g., [17] for an overview). An important role of endorphins would be consistent with findings of the PET study not showing a link for DA [73] but showing negative associations between body weight variations and μ-opioid receptor availability in amygdala, insula, ventral striatum, and putamen. In conclusion, additional research on DA-based reward hyposensitivity appears necessary given contradictory findings.

2.2 Reward system hyperresponsivity to stimuli predicting food consumption

“Incentive salience” is a motivational mechanism considered to initiate compulsive food seeking and consumption after food cue exposure because these cues were coupled to reward (consumption) by Pavlovian conditioning in the learning history of an individual. Consequently, food cues are predictive of food intake and can acquire similar motivational properties as food reward after repeated couplings [74]. Incentive salience relies on a hypersensitivity of the DA reward system to these cues [17] and corresponds to a strong desire, a strong “I want to consume feeling” on a psychological level. This desire has consistently been termed as “wanting” [17] or “craving” [56].

Early work relating this cue-dependent motivational mechanism to striatal DA was done by Schultz et al. who showed in a pivotal animal study that the response of striatal DA neurons varies across different stages of food exposure. Animals respond to palatable food consumption during early stages of exposure, but only to cues predicting consumption after repeated exposure [75]. Hamid et al. linked striatal DA to cue-sensitivity by showing that it reflects the willingness to engage in effortful activities to obtain reward after cue exposure in rats [76]. Furthermore, the role of striatal DA for food wanting in humans was underlined by van de Giessen et al. who found that DA release after amphetamine intake in obese persons correlated with food craving on trait level [57].

Although incentive salience is primarily a motivational phenomenon, it also comprises attentional, affective, learning-related, and behavioral facets. Consistently, functional magnetic resonance imaging (fMRI) studies using cue reactivity (CR) tasks did not only find striatal hyperresponsivity to high-calorie food cues but also a hyperresponsivity in anterior cingulate cortex and visual areas, amygdala, orbitofrontal cortex, and hippocampus [77,78,79,80]. CR tasks represent the key functional paradigms for studying incentive salience which contrast neural signals emerging during perception of food cues to those during control conditions (see 3.1.1). In this framework, the anterior cingulate cortex / amygdala / visual areas are supposed to modulate the attentional [81] / emotional [82] / sensory salience of food cues (cf. [83]). The orbitofrontal cortex might underlie stimulus – outcome encoding in Pavlovian conditioning [84]. The hippocampus plays an inhibitory role in appetitive Pavlovian conditioning [82]. Consistent with the concept of incentive salience, a hyperresponsivity in these areas was found to predict unfavorable treatment outcome in a variety of reviewed CR studies.

However, some studies did not support a link between incentive salience and treatment outcome. Specifically, neither the BS fMRI study of Bach et al. nor the PI fMRI study of Ten Kulve et al. found significant associations between brain activity evoked by food-cue presentation before the treatment and treatment-induced weight loss [85, 86].

2.3 Dysfunctional goal-directed control system

A dysfunctional psychobehavioral or goal-directed control system and reduced modulation of incentive salience by this system is considered a further mechanism contributing to overeating [19]. This can be understood when viewing eating behavior from a decision-making perspective. The Pavlovian incentive salience mechanism primarily mediated by the striatal DA system can be seen as a decision-making mechanism favoring choices that have previously been associated with immediate and highly rewarding consequences. In line with its subcortical location, this striatal mechanism does not consider future consequences [19]. By contrast, the goal-directed decision-making system is driving (food) choices by comparing different options based on action plans encoding their present and future consequences [19, 87]. Thus, this system could inhibit the impulse to eat a tasty but unhealthy food (e.g., triggered by the striatal DA system) because it predicts that the negative consequences of future overweight outweigh (i.e., have a higher negative value) the positive consequences of immediate reward (i.e., their positive value) [19].

Hare et al. identified value-based goal-directed decision-making regions in the brain by having self-reported dieters choose between two food items: a constant reference item with average taste- and health related properties and another that varied in these aspects [88]. Ventromedial prefrontal cortex (vmPFC) activity predicted the food choice (i.e., its value) independent of the food’s tastiness or healthiness. Activity in the dorsolateral prefrontal cortex (dlPFC) reflected self-control (i.e., was higher when subjects chose healthy). VmPFC and dlPFC activity correlated only during successful self-control trials. The authors concluded that the vmPFC computes a value-signal which determines food-choice and relies on both factors, reward (taste) and control (health) only when it reflects control-related dlPFC activity. VmPFC activity alone only reflects reward (taste).

Another study employed a delay discounting (DD) paradigm [87]. In DD tasks, participants have to decide repeatedly between rapidly available smaller rewards or larger rewards available at a later time (see 3.1.2). A weaker preference for earlier smaller than for larger delayed rewards is considered as a behavioral marker for goal-directed control. This study highlighted the importance of the interplay between fronto-parietal control areas and striatal incentive salience areas for goal-directed control. Stronger goal-directed control depends on stronger lateral-prefrontal relative to striatal activity. Please see [89] for findings suggesting an inhibitory impact of prefrontal on incentive salience regions including striatal ones (modulated by the specific calorie-restriction type applied). A direct link between key regions of goal-directed and striatal Pavlovian control is consistent with the finding that DA depleted mice do not at all initiate goal-directed behaviors including feeding [90]. In addition, animal studies suggest that the insular cortex also contributes to goal-directed decision-making as lesions to this area impaired the ability of rats to devalue food after satiety and to adjust their food choice accordingly [91].

The clinical importance of this factor has been demonstrated on a behavioral level in DD studies showing reduced goal-directed control in obese persons [92, 93]. These studies controlled for nuisance factors (e.g., age and income). Studies not controlling for these variables failed to show these effects (e.g., [94, 95]). Neuroimaging studies in obese subjects revealed a link between reduced D2R availability in the striatum and a reduced resting-state (RS) glucose metabolism in regions involved in goal-directed decision-making such as vmPFC and dlPFC [96]. In [97] we could demonstrate the importance of behavioral and neural measures of goal-directed control and their interplay with striatal Pavlovian regions for the dietary success of obese persons in a 12-week LI. Higher behavioral goal-directed control was coupled to better weight loss. Functional connectivity (FC) between vmPFC and dlPFC was positively related to behavioral control and weight loss and FC between vmPFC and dorsal striatum was negatively linked with future weight loss. We evaluated the role of the interplay between Pavlovian and goal-directed neural systems in a LI study by testing whether future dietary weight loss and long-term maintenance after treatment across 39 months could be predicted based on activity assessed in a food CR paradigm, a food-specific DD paradigm, and the interaction of these activities [18]. This revealed a strong link between future long-term weight loss and interactions between visual Pavlovian and insular control areas.

3 Neuroimaging techniques and parameters used for prediction

Task- and RS-fMRI as well as structural MRI (sMRI) are the neuroimaging acquisition techniques predominantly employed in the reviewed studies. fMRI provides indirect markers of neural activity by measuring vascular responses to heightened metabolic demands of active neurons [98] while sMRI provides information on various brain tissue characteristics. Neuroimaging parameters derived for prediction from fMRI and sMRI can be subdivided into two major groups: parameters characterizing specific, localized processing of individual brain regions (“functional segregation”) and those reflecting the interplay, i.e., the FC, of activity among different regions (“functional integration”). All methods described in this section are illustrated in Fig. 1.

Fig. 1

Neuroimaging techniques and parameters utilized in the reviewed studies. (a)–(c) illustrate the basic layouts of the three fMRI tasks, i.e., CR (a), DD (b), and food CrvR (c). The panels (d)–(i) depict the different parameters derived from RS fMRI. In particular, (d) illustrates the ALFF method, (e) FCD mapping. (f) shows a component loading map for a RS-network extracted by independent component analysis. Moreover, (g) illustrates the seed-to-voxel FC approach. (h) shows a correlation (i.e., FC) matrix obtained for temporal and deep GM regions for RS fMRI data of a single subject and time point. FC depicted is thresholded at |r| = 0.5. Only temporal and deep GM regions were selected to improve the readability of the panel. The network depicted in (i) corresponds to the areas / FC depicted in the correlation matrix in (h). This network has a global efficiency of 0.84. (j) illustrates a PET scan using the [11C] raclopride radio-tracer. Finally, (k)–(m) depict the structural MRI measures. Specifically, (k) shows a brain voxel map of the GM (left) and WM (right) volume of a participant determined with VBM. (l) illustrates an approach to cortical thickness estimation that treats the distance between two closest vertices on the opposing WM/GM surface and the GM/pial surface as measure of cortical thickness for the corresponding cortex segment. (m) illustrates the fractional anisotropy determined with DTI for a single participant and time point on the left. In order to illustrate the directional information contained in DTI maps (and used for fiber tractography), the direction of the first tensor for a given voxel is depicted with a red–green–blue coding on the right. For further details, see text.

3.1 Task-fMRI

Three task-based fMRI paradigms outlined below are currently used for treatment outcome prediction in obesity: CR, DD, and food craving regulation (CrvR). The studies in [97, 99,100,101] used them to derive measures of FC; the remaining studies exclusively computed markers of localized activity at individual brain coordinates (i.e., voxels).

In these task-fMRI studies, markers of localized activity (referred to as “Voxel CR, DD, or CrvR activity” in the following) are computed in a two-step process. First, three-dimensional maps of neural activity reflecting the targeted mechanism in individual voxels are determined for each participant and time point. Second, these parameters are entered into a predictive group-level analysis utilizing methods described in Sect. 4.

Task-related FC markers are also computed in a two-step procedure (“Seed-to-voxel CR FC” or “Seed-to-voxel DD FC”) in the majority of studies evaluating task-related FC [97, 99, 100]. First, a seed coordinate sensitive to the evaluated factor is selected based on prior knowledge and the association between its time series (potentially modulated by the time course of a condition of interest [99, 100]; see [102]) and all other voxels is computed. Second, the voxel-wise correlation/regression coefficients are entered into a group-level analysis.
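
The first step of this seed-to-voxel procedure can be sketched as follows. This is a minimal illustration only, assuming Pearson correlation as the association measure and ignoring task modulation and preprocessing; the function names and the toy time series are hypothetical and not taken from any reviewed study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def seed_to_voxel_fc(seed_ts, voxel_ts):
    """Step 1: correlate the seed time series with every voxel time series."""
    return {voxel: pearson(seed_ts, ts) for voxel, ts in voxel_ts.items()}

# Hypothetical toy data: one seed and three voxels, six time points each.
seed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
voxels = {
    "v1": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # rises with the seed
    "v2": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],     # falls while the seed rises
    "v3": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # alternates, weakly related
}
fc_map = seed_to_voxel_fc(seed, voxels)
```

The resulting participant-specific map of coefficients (one value per voxel) is what would be passed on to the group-level analysis in the second step.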

3.1.1 Cue reactivity

The key experimental design to study incentive salience, which is applied in the majority of all reviewed studies (see Tables 1–3), is the food CR paradigm. Reflecting the notion that exposure to food-cues (e.g., pictures, taste, odor, or imagined food items) can trigger food-wanting/craving and subsequently food-intake [74], CR tasks typically present pictures of high-calorie food and control items such as pictures of neutral objects or low-calorie foods. Participant-specific voxel contrast maps reflecting activity related to incentive salience are then computed by subtracting activity during the control condition from activity during presentation of palatable foods to control for non-food related activation.
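
The subtraction underlying such a contrast map can be illustrated with a very small sketch. It assumes that condition-wise activity estimates per voxel are already available for one participant; the region labels and values are hypothetical.

```python
def contrast_map(food_activity, control_activity):
    """Voxel-wise difference: high-calorie food condition minus control condition."""
    return {v: food_activity[v] - control_activity[v] for v in food_activity}

# Hypothetical activity estimates for three voxels of one participant.
food = {"striatum": 1.8, "amygdala": 1.1, "motor": 0.4}
control = {"striatum": 0.6, "amygdala": 0.7, "motor": 0.4}
cr_map = contrast_map(food, control)
```

In this toy example, the voxel labelled "motor" responds equally in both conditions and therefore drops out of the contrast, which is the purpose of the control condition.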

Table 1 Overview on existing lifestyle intervention studies. Studies are subdivided by neuroimaging technique. Studies may be listed more than once if more than one neuroimaging technique is applied. In order to ease comprehensibility of this and the two other tables, we occasionally simplified the presented study characteristics. In particular, if a study comprised both the prognostic and the associative modelling approach, only one was mentioned (in this case the study would have been classified as “Prognostic” in the column “Modelling: Outcome” as we considered this modelling approach more meaningful). Similarly, if a study comprised several outcome markers, only one was mentioned in the column “Modelling: Outcome”. In case changes in bodyweight or weight loss were modelled in addition to other outcome markers, “bodyweight” was mentioned in the column “Modelling: Outcome” as we considered this outcome most relevant. Consistently, “Predictor” and “Significant prediction results” only list those predictors and results that relate to the parameters reported in column “Modelling: Outcome”. The codes for modelling parameters and outcomes in “Significant prediction results” (such as “CR_fMRI_T0 → OUT_T3” or “PET_T0 → OUT_T3”) report the predictor and the time point(s) it was derived from on the left side of the arrow and the time points of the outcome marker on the right side. Despite slight potential inaccuracies resulting from this procedure, a period of four weeks was converted to one month in these time point codes. In column “Experimental design”, these time points are reported with (higher) week-level accuracy. Consequently, for example, the code “CR_fMRI_T0 → OUT_T3” would refer to modelling an outcome marker measured after three months of treatment based on a food CR fMRI parameter measured immediately before treatment onset.
Finally, the number of time points listed after each “T” in these codes is important: if only a single time point is listed after a “T” (such as in “OUT_T0” or “OUT_T0_T3”) then parameter raw values measured at the specific time point(s) were modelled with one model – in this case for the baseline time point alone (“OUT_T0”) or the baseline time point and a time point after 3 months (“OUT_T0_T3”). If, however, a “T” is followed by two time points (such as in “OUT_T0_3_T3_9”) then temporal difference markers for the respective parameter were modelled with one model – in this case differences between baseline and after 3 months, and after 3 vs. 9 months. Thus, for example, the code “CR_fMRI_T0_T3_T15_T27 → OUT_T0_3_T3_15_T15_27_T27_39” refers to one model where temporal differences in an outcome marker between time points T0 & T3, T3 & T15, T15 & T27, and finally T27 & T39 were modelled on CR fMRI parameters sampled at T0, T3, T15, T27. Finally, in addition to the studies presented in the table, we want to mention an LI study of Hege et al. [195] who used magnetoencephalographic data to prognose future WL in 33 overweight or obese participants. They found that higher activity in superior temporal gyrus, fusiform gyrus, hippocampus, inferior temporal gyrus, insula, and Heschl gyrus as well as lower activity in middle occipital and inferior frontal gyrus went along with successful weight loss. We mention this study separately, as the measurement technique (magnetoencephalography) and the task applied (1-back memory task) deviate strongly from the other studies presented in this work.

3.1.2 Delay discounting

DD tasks are well established experimental designs for the study of goal-directed control in obesity (e.g., [18, 97, 103,104,105,106]) in which participants have to decide multiple times between immediately available smaller rewards and larger delayed ones. Several methods exist for computing participant-specific voxel contrast maps reflecting goal-directed control in DD tasks such as contrasting more immediate options to more delayed ones [87], or contrasting difficult (similar attractiveness of immediate and delayed choices) vs. easy trials (dissimilar attractiveness; e.g., [106]). Another method is to first determine a behavioral measure of goal-directed control that allows modelling the subject-specific value of options (rewards) based on their reward magnitude and delay and to then compute the voxel-wise association between this model function and local activity (e.g., [97, 107]).
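
To make the last approach more concrete, the following sketch models the subjective value of an option from its magnitude and delay and estimates a participant-specific discounting parameter from choices. The hyperbolic form V = A / (1 + kD) is an assumption made here for illustration (it is a common choice in the DD literature, but the text above does not specify the model), and the grid-search fit, function names, and trial data are hypothetical.

```python
def discounted_value(amount, delay, k):
    """Hyperbolic subjective value V = amount / (1 + k * delay) (assumed model)."""
    return amount / (1.0 + k * delay)

def prefers_delayed(immediate, delayed, delay, k):
    """True if the delayed reward has the higher subjective value."""
    return discounted_value(delayed, delay, k) > immediate

def fit_k(trials, grid):
    """Pick the k from `grid` that reproduces the most observed choices."""
    def n_correct(k):
        return sum(prefers_delayed(i, d, t, k) == chose_delayed
                   for i, d, t, chose_delayed in trials)
    return max(grid, key=n_correct)

# Hypothetical trials: (immediate amount, delayed amount, delay in days, chose delayed?)
trials = [
    (10, 11, 30, False),
    (10, 15, 30, False),
    (10, 20, 30, True),
    (10, 30, 60, True),
]
k_hat = fit_k(trials, grid=[0.001, 0.01, 0.02, 0.05, 0.1])
```

Under this assumed model, a smaller k corresponds to a weaker preference for immediate rewards. The trial-wise values returned by `discounted_value` for the fitted k would then serve as the model function whose voxel-wise association with local activity is computed.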

3.1.3 Food craving regulation

Paradigms requiring their participants to actively regulate affective states induced by generic emotional stimuli have either been applied without changes in obesity research or were slightly varied to study craving regulation induced by food stimuli. For example, a study investigating emotion regulation during presentation of generic affective stimuli found that obese persons have more emotion regulation difficulties assessed via questionnaires than controls. In addition, higher vmPFC activity during regulation was associated with fewer regulation difficulties [108]. Food CrvR paradigms (used for treatment outcome prediction in [68]) evaluate the effect of regulation strategies on food-cue elicited craving. Trials typically start by presenting a strategy word (e.g., “permit” or “regulate” [69]), which is followed by a high- or low-calorie food picture that should either be perceived in a permissive fashion (i.e., allowing oneself to perceive the potentially induced craving) or during application of a regulation strategy. Finally, participants rate their desire to consume the depicted food. Contrasting signals emerging during high-calorie food & permit (high-calorie food & regulate) vs. low-calorie food & permit (high-calorie food & permit) enables computing voxel activity maps for incentive salience (goal-directed control). Thus, this paradigm might be seen as a mixture of an incentive salience and goal-directed control task.

3.2 Resting-state fMRI

RS fMRI measures spontaneous low-frequency brain activity under task-free conditions and has revealed fundamental aspects of how the brain is organized and works, e.g., its intrinsic organization into separate networks (i.e., the RS networks [109]) and the finding that RS network activity impacts task-related activity of RS networks [110].

3.2.1 Amplitude of low frequency fluctuations

The Amplitude of low frequency fluctuations (ALFF) method is applied in three reviewed BS studies [111,112,113] and characterizes spontaneous low-frequency brain activity by estimating the magnitude of these fluctuations in a small frequency band (e.g., from 0.01 to 0.08 Hertz [114]) for each voxel coordinate. Initially, the average square root of the power in this frequency band of a given voxel’s time series divided (i.e., standardized) by the average of this parameter across all voxels was used as voxel ALFF measure [115]. The improved fractional ALFF method uses the square root averaged across the full power spectrum for a given voxel as a standardization method [116]. It was suggested that frequency sub-bands within 0.01 to 0.08 Hertz reflect spontaneous low-frequency activity of different neural tissue types and that several diseases other than obesity induce alterations in ALFF (see [117] for an overview).
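
Both measures can be sketched for toy data as follows. This is an illustration under simplifying assumptions, not the implementation used in the reviewed studies: it uses a plain discrete Fourier transform without filtering or detrending, expresses fractional ALFF as the in-band share of the summed amplitudes across the full spectrum, and the repetition time, number of scans, and signals are hypothetical.

```python
import cmath
import math

def amplitude_spectrum(ts, tr):
    """Return (frequencies in Hz, amplitudes) via a plain discrete Fourier transform."""
    n = len(ts)
    mean = sum(ts) / n
    centred = [x - mean for x in ts]          # remove the constant offset
    freqs, amps = [], []
    for k in range(1, n // 2 + 1):            # positive frequencies only
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(centred))
        freqs.append(k / (n * tr))
        amps.append(abs(coef) / n)            # amplitude, i.e., square root of power
    return freqs, amps

def alff_and_falff(ts, tr, low=0.01, high=0.08):
    """Mean in-band amplitude (raw ALFF) and in-band share of all amplitudes (fALFF)."""
    freqs, amps = amplitude_spectrum(ts, tr)
    in_band = [a for f, a in zip(freqs, amps) if low <= f <= high]
    alff = sum(in_band) / len(in_band)
    falff = sum(in_band) / sum(amps)
    return alff, falff

tr = 2.0        # hypothetical repetition time in seconds
n_scans = 100   # hypothetical number of volumes
voxels = {
    "slow": [math.sin(2 * math.pi * 0.05 * t * tr) for t in range(n_scans)],
    "fast": [math.sin(2 * math.pi * 0.20 * t * tr) for t in range(n_scans)],
}
results = {name: alff_and_falff(ts, tr) for name, ts in voxels.items()}

# Standardize raw ALFF by its average across all voxels.
mean_alff = sum(a for a, _ in results.values()) / len(results)
standardized_alff = {name: a / mean_alff for name, (a, _) in results.items()}
```

In this toy example, the voxel oscillating at 0.05 Hertz falls inside the band and obtains high ALFF and fractional ALFF values, whereas the voxel oscillating at 0.20 Hertz obtains values close to zero.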

3.2.2 Functional connectivity density mapping

Functional connectivity density (FCD) mapping, employed in one reviewed study [118], estimates the degree of FC for each voxel and primarily aims to reveal areas of dense local FC (so-called “hubs”; [119]). Specifically, “local FCD” reflects the number of voxels in a cluster surrounding a center voxel having at least a predefined FC. “Global FCD” corresponds to the number of voxels having a suprathreshold FC with the center voxel irrespective of neighborhood minus its local FCD. The clinical relevance of FCD was supported, for example, by findings of an altered local FCD in schizophrenia [120] and a relation of FCD and severity of subclinical depressive symptoms in healthy elderly [121].
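
The two definitions can be illustrated with a small sketch. It assumes that the FC values between the center voxel and all other voxels are given, that spatial neighborhood is provided as an adjacency list, and that the center voxel itself is not counted; these choices, the function names, and the toy data (five voxels arranged on a line) are hypothetical simplifications.

```python
def local_fcd(centre, fc, neighbours, threshold):
    """Size of the spatially contiguous cluster around `centre` whose voxels
    all have FC with the centre above `threshold` (centre not counted)."""
    cluster, frontier = {centre}, [centre]
    while frontier:
        voxel = frontier.pop()
        for nb in neighbours[voxel]:
            if nb not in cluster and fc[centre][nb] > threshold:
                cluster.add(nb)
                frontier.append(nb)
    return len(cluster) - 1

def global_fcd(centre, fc, neighbours, threshold):
    """Suprathreshold voxels anywhere in the volume minus the local FCD."""
    total = sum(1 for v, r in enumerate(fc[centre])
                if v != centre and r > threshold)
    return total - local_fcd(centre, fc, neighbours, threshold)

# Hypothetical toy volume: voxels 0..4 on a line, each adjacent to its direct neighbours.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
# Only the FC row of the centre voxel (voxel 0) is needed here.
fc = {0: [1.0, 0.8, 0.7, 0.2, 0.9]}

local = local_fcd(0, fc, neighbours, threshold=0.6)
distant = global_fcd(0, fc, neighbours, threshold=0.6)
```

In this toy example, voxels 1 and 2 form the contiguous suprathreshold cluster around voxel 0, while voxel 4 is strongly connected but spatially separated by voxel 3 and therefore only contributes to the global measure.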

3.2.3 Independent component analysis

Independent component analysis is a statistical method that identifies RS networks by computing so-called independent components (ICs). These are transformations of the multivariate voxel input data which are stochastically independent (and not only mutually uncorrelated as in principal component analysis) and can be understood as characteristic RS voxel time series (e.g., [122]). One IC reflects the activity time course of one RS network. After the ICs are identified, they can either be related to treatment outcome directly as in [123] or relations between ICs/RS networks of interest and voxel-wise RS fMRI time courses are determined using participant-specific voxel-wise regression analysis. The resulting correlation/regression coefficients are finally handed over for a group analysis [124].

3.2.4 Seed-to-voxel functional connectivity

The seed-to-voxel FC for RS fMRI (“Seed-to-voxel RS FC”) applied by [125,126,127] is technically identical to its task-related counterpart described above.

3.2.5 Functional connectivity between anatomical atlas regions

FC for prediction has also been computed based on time series averaged across voxels located in anatomical atlas regions [111, 128,129,130,131,132]. Using averaged regional time series requires less a priori knowledge as one can simply compute the FC between all atlas regions. Another advantage might be the method’s relative robustness to outlier voxels through spatial averaging. However, the method does not make full use of the spatial resolution fMRI is offering.

3.2.6 Functional connectivity network-analysis

This (group of) technique(s) aims at characterizing the structure of connected units interacting in complex social, economic, genetic, or neural networks [133]. One major aim is to assess the efficiency of network information flow [134]. Independent of the domain (e.g., social or biological) these techniques have shown that networks with a “small-world” structure (i.e., having a dense local and a sparse long-range connectivity) are both globally and locally efficient with regard to information flow because the average distance between any pair of units (here: brain regions) in such a network is small [135]. One of the reviewed studies [101] utilized a network technique for treatment outcome prediction. Specifically, these authors computed FC using the technique described in 3.2.5 for individual participants first and then determined the global efficiency (see [134]) for each participant-specific FC pattern for prediction.
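
As a rough sketch of the global efficiency measure mentioned above, the example below computes the average inverse shortest path length over all pairs of regions in a binary graph. Binarizing the FC matrix at a fixed threshold is an assumption made here for simplicity (weighted variants exist), and the function names and the small FC matrix are hypothetical.

```python
from collections import deque

def binarize(fc, threshold):
    """Binary adjacency matrix: an edge wherever FC >= threshold (assumed rule)."""
    n = len(fc)
    return [[1 if i != j and fc[i][j] >= threshold else 0 for j in range(n)]
            for i in range(n)]

def global_efficiency(adjacency):
    """Mean of 1/d(i, j) over all ordered pairs i != j of a binary graph.

    Unreachable pairs contribute 0, i.e. an infinite path length.
    """
    n = len(adjacency)
    total = 0.0
    for source in range(n):
        # Breadth-first search yields shortest path lengths from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in range(n):
                if adjacency[node][neighbour] and neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(1.0 / d for target, d in dist.items() if target != source)
    return total / (n * (n - 1))

# Hypothetical FC matrix for three regions.
fc = [[1.0, 0.6, 0.1],
      [0.6, 1.0, 0.5],
      [0.1, 0.5, 1.0]]
chain = global_efficiency(binarize(fc, 0.3))   # regions 0-1-2 form a chain
dense = global_efficiency(binarize(fc, 0.05))  # all regions directly connected
```

In the chain, the two outer regions are two steps apart, so the efficiency drops below the value of 1 obtained for the fully connected graph.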

3.3 Neurotransmission assessed with Positron-Emission-Tomography

Positron-Emission-Tomography (PET), employed in two reviewed studies [72, 73], is a technique that measures biochemical and physiological activity in biological tissues at the voxel level by applying radiotracers (e.g., [11C] raclopride and [11C] carfentanil). This method can be used in a task-related or RS fashion and has been applied extensively in obesity research to measure transmission of DA and other neurotransmitters (e.g., [15, 57–60]). Steele et al. related Roux-en-Y gastric bypass (RYGB)-induced changes in D2R availability to weight loss in a 6-week period after surgery and reported a positive association [72]. Karlsson et al. related presurgical μ-opioid receptor and D2R availability to post-BS weight [73]. While no associations were found for D2R, μ-opioid receptor availability, especially in the amygdala, was negatively associated with future body weight.

3.4 Structural neuroimaging

3.4.1 Brain tissue volume

One of the most frequently evaluated tissue properties in structural neuroimaging in general and in the reviewed structural studies specifically [136–143] is voxel-wise tissue volume. Except for Best et al. [138], Voxel-Based Morphometry (VBM; [144]) was used in these studies for computation. VBM is implemented in SPM12 (Wellcome Trust Centre for Neuroimaging, Institute of Neurology, UCL, London, UK; http://www.fil.ion.ucl.ac.uk/spm). In VBM, anatomical brain images are spatially registered to an anatomical reference space and segmented into the three tissue types grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF). By additionally considering the amount of local deformation applied during spatial registration, the method produces markers of the voxel-wise volume for each of the three tissue types (referred to as “Voxel GM or WM volume” in the following). Contrary to VBM, the method used by Best et al. [138] computes volumes for larger regions included in an anatomical atlas and is implemented in FreeSurfer [145, 146]. For a method comparison, see Guo et al. [147].

3.4.2 Cortical thickness

Another structural brain property evaluated frequently in structural neuroimaging in general and in one reviewed study [148] is cortical thickness (implemented, e.g., in FreeSurfer [146]). In short, this parameter is computed by first determining the WM/GM transition surface and the GM/pial transition surface and then determining the distance between these two surfaces for small spatial units (“vertices”).

3.4.3 Brain diffusion

Diffusion MRI enables the evaluation of brain fiber characteristics by assessing the directions of water molecule diffusion and was applied in two of the reviewed studies [99, 139]. Specifically, water can diffuse equally in any direction in unstructured spaces such as CSF, but in neural tissue it diffuses only in directions predetermined by biological structures and their integrity. Measuring and modelling water molecule diffusion therefore allows evaluating axon bundle orientation and integrity [149]. A method frequently used for this purpose is Diffusion Tensor Imaging (DTI). In DTI, water molecule diffusion is measured for a predefined number of diffusion orientations. Subsequently, participant- and time point-specific voxel maps reflecting different diffusion properties are determined by fitting an ellipsoid to the three-dimensional diffusion information. The two most important fiber characteristics derived from this fit are Fractional Anisotropy, which can be understood as the degree of diffusion directedness, and Mean Diffusivity, a measure of overall diffusivity. Finally, a method complementing DTI is fiber tractography, which aims at tracing WM tracts based on the directional information provided by diffusion-weighted MRI. One of the reviewed studies applied tractography [99]. Soares et al. provide an overview of DTI and tractography including a list of available software packages [150]; Maier-Hein et al. highlight pitfalls in tractography [151].
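
To make the two DTI-derived measures more concrete, the sketch below computes Mean Diffusivity and Fractional Anisotropy from the three eigenvalues of a fitted diffusion tensor (the axis lengths of the ellipsoid) using their standard definitions. The function names and the example eigenvalues are hypothetical and serve only as an illustration.

```python
import math

def mean_diffusivity(eigenvalues):
    """Average of the three tensor eigenvalues (overall diffusivity)."""
    return sum(eigenvalues) / 3.0

def fractional_anisotropy(eigenvalues):
    """Normalized spread of the eigenvalues around their mean.

    0 means diffusion is equal in all directions (e.g., CSF),
    1 means diffusion occurs along a single axis only.
    """
    md = mean_diffusivity(eigenvalues)
    numerator = sum((lam - md) ** 2 for lam in eigenvalues)
    denominator = sum(lam ** 2 for lam in eigenvalues)
    if denominator == 0:
        return 0.0  # no diffusion measured at all
    return math.sqrt(1.5 * numerator / denominator)

# Illustrative eigenvalues in arbitrary units (not measured data).
isotropic = (1.0, 1.0, 1.0)      # unstructured space
single_axis = (1.0, 0.0, 0.0)    # idealized, perfectly directed diffusion
fa_isotropic = fractional_anisotropy(isotropic)
fa_single_axis = fractional_anisotropy(single_axis)
md_example = mean_diffusivity((3.0, 2.0, 1.0))
```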

4 Computational prediction approaches

This section describes the methods used for assessing treatment outcome on the group level across the reviewed studies. Except for ordinary least squares (OLS) regression, these methods are illustrated in Fig. 2.

Fig. 2 illustrates computational approaches used for treatment outcome prediction on the group level. In particular, (a) illustrates the LMM regression approach and is taken from [18]. (b) depicts an application of support vector classification for a hypothetical classification task in which a classifier has to learn the differences between voxel GM patterns belonging to very successful dieters and less successful dieters in the training stage. In the next step, the classification boundary estimated from the training data is used to predict the class of an unknown test person based on their GM pattern. (b) is derived from Weygandt et al. [172]. Finally, (c) shows a hypothetical structural equation model (in part derived from [169]).

4.1 Linear regression

4.1.1 Ordinary least squares regression

The technique used for treatment outcome prediction on the group level in the majority of reviewed studies is voxel-wise OLS regression. This technique is implemented in a variety of software packages such as SPM, FMRIB Software Library (FSL) [152], Analysis of Functional NeuroImages (AFNI; http://afni.nimh.nih.gov/afni) [153], or BrainVoyager (http://www.brainvoyager.com/) [154]. OLS regression identifies an optimal set of regression coefficients by minimizing the sum of squared differences between observed and predicted values of the dependent variable (e.g., weight loss obtained in a certain interval or brain activity in a certain voxel). Given that multiple voxels are tested, methods correcting for family-wise error (e.g., Random Field Theory, Bonferroni, or the maximum statistic method [155,156,157]) have to be applied to evaluate the significance of individual voxels’ tests. A drawback in a longitudinal framework is the method’s sensitivity to drop-out, as participants have to be excluded completely once a single time point is missing (for further points, see e.g. [158, 159]).
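
The following sketch illustrates voxel-wise OLS regression combined with a permutation-based maximum statistic threshold for family-wise error control. It is a simplified illustration under several assumptions: a single predictor plus intercept, simulated data in which only the first voxel carries a true effect, and hypothetical function names. It does not reproduce the implementation of SPM, FSL, AFNI, or BrainVoyager.

```python
import numpy as np

def voxelwise_t(outcome, voxels):
    """t statistic of the OLS slope when each voxel is regressed on the outcome.

    outcome: (n_participants,), e.g. weight loss.
    voxels:  (n_participants, n_voxels), one brain measure per voxel.
    """
    n = outcome.shape[0]
    x = outcome - outcome.mean()       # centering absorbs the intercept
    y = voxels - voxels.mean(axis=0)
    slope = x @ y / (x @ x)            # least-squares slope per voxel
    residuals = y - np.outer(x, slope)
    sigma2 = (residuals ** 2).sum(axis=0) / (n - 2)
    return slope / np.sqrt(sigma2 / (x @ x))

def max_stat_threshold(outcome, voxels, n_perm=500, alpha=0.05, seed=0):
    """Critical |t| from the permutation distribution of the maximum statistic."""
    rng = np.random.default_rng(seed)
    max_t = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling the outcome breaks any true brain-outcome relation.
        max_t[i] = np.abs(voxelwise_t(rng.permutation(outcome), voxels)).max()
    return np.quantile(max_t, 1 - alpha)

# Simulated data (assumption): 30 participants, 200 voxels, effect in voxel 0 only.
rng = np.random.default_rng(1)
weight_loss = rng.standard_normal(30)
voxels = rng.standard_normal((30, 200))
voxels[:, 0] += 3.0 * weight_loss
t_values = voxelwise_t(weight_loss, voxels)
threshold = max_stat_threshold(weight_loss, voxels)
significant = np.abs(t_values) > threshold
```

Because the threshold is derived from the largest statistic across all voxels in each permutation, a voxel exceeding it is significant at the family-wise level rather than only at the single-test level.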

4.1.2 Linear mixed model regression

Linear mixed models (LMMs; implemented in FreeSurfer [160, 161]) are a newer regression method [162] that has been applied in four reviewed studies [18, 136, 138, 163]. LMM regression models the variation in the criterion (e.g., weight loss) as a linear combination of fixed and random effects. The former correspond to parameters that can be defined freely by the researcher (e.g., group membership), the latter to parameters for which this is not possible because they vary in an endogenous, participant-specific fashion (e.g., participants’ signal means or trends). LMM regression has several properties that make it a suitable candidate for the analysis of longitudinal study designs. First, it can model a signal sampled across an arbitrary number of time points simultaneously. Second, it can handle designs which are unbalanced due to participant drop-out [162] and thus has higher statistical power than alternative methods (e.g., repeated measures analysis of variance based on OLS regression) because participants with partially missing data need not be excluded [161]. It has been shown that even the inclusion of participants with only a single data point can improve the accuracy of LMM regression [161]. Finally, unlike repeated measures analysis of variance, LMM regression is able to cope flexibly with varying data covariation across time.
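
In a common textbook notation, the decomposition into fixed and random effects can be written as shown below. All symbols are assumptions introduced here for illustration and are not taken from the reviewed studies: y_{ij} is the criterion of participant i at time point j, x_{ij} and z_{ij} are the covariate vectors for fixed and random effects, β contains the fixed effects, and b_i the participant-specific random effects (e.g., a random intercept and slope).

```latex
\begin{align}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
        + \mathbf{z}_{ij}^{\top}\mathbf{b}_{i}
        + \varepsilon_{ij},
        \qquad j = 1, \dots, n_{i}, \\
\mathbf{b}_{i} &\sim \mathcal{N}(\mathbf{0}, \mathbf{D}),
        \qquad
        \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{align}
```

Since the number of time points n_i may differ between participants, a participant with missing sessions still contributes all available observations to the estimation of β, which is the property underlying the robustness to drop-out described above.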

4.2 Support vector classification

Support vector classification (SVC) is a supervised classification approach used in the majority of reviewed studies employing machine learning (i.e., [132, 141]). A third study employed a combination of Twin networks and k-nearest neighbor clustering [130]. Given the rare use of this approach in neuroimaging, we would like to point the reader directly to this study for details. In supervised classification, a machine learning algorithm tries to learn characteristic properties from (e.g., brain activity) patterns representative of different classes (e.g., successful vs. non-successful dieters) in a training stage. In its basic form, the SVC algorithm does so by identifying a linear class boundary (“classification model”) that separates the training patterns of two classes and which optimizes the trade-off between the number of non-separable patterns and classifier complexity [164]. In the test or model validation stage, the model is evaluated by computing a classification accuracy measure for unseen test data. Model validation techniques range from leave-one-out cross validation (LOO-CV) to out-of-sample validation. One important aspect arising from this variety and other factors such as a putative sensitivity of supervised classification to sample size [165] is the use of resampling techniques for inference. These consider the conditions under which the empirical classification accuracy is obtained. Parametric procedures do not possess this property (e.g., [166, 167]).
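
The interplay of leave-one-out cross validation and resampling-based inference can be sketched as follows. To keep the example runnable with NumPy alone, a simple nearest-centroid classifier is used as a stand-in for SVC, so this is explicitly not a support vector classifier. The function names, the simulated patterns, and the number of permutations are likewise assumptions.

```python
import numpy as np

def loo_accuracy(patterns, classes):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for SVC)."""
    n = len(classes)
    correct = 0
    for test in range(n):
        train = np.arange(n) != test
        # "Training": one mean pattern (centroid) per class, test person excluded.
        centroids = {c: patterns[train & (classes == c)].mean(axis=0)
                     for c in np.unique(classes[train])}
        # "Testing": assign the held-out person to the closest centroid.
        predicted = min(centroids,
                        key=lambda c: np.linalg.norm(patterns[test] - centroids[c]))
        correct += int(predicted == classes[test])
    return correct / n

def permutation_p_value(patterns, classes, n_perm=200, seed=0):
    """p value of the observed accuracy under shuffled class labels."""
    rng = np.random.default_rng(seed)
    observed = loo_accuracy(patterns, classes)
    # The full cross validation is repeated for every permutation, so the null
    # distribution reflects the sample size and validation scheme actually used.
    null = np.array([loo_accuracy(patterns, rng.permutation(classes))
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Simulated data (assumption): 20 persons per class, 50 features,
# with a class difference in the first 10 features.
rng = np.random.default_rng(0)
classes = np.repeat([0, 1], 20)
patterns = rng.standard_normal((40, 50))
patterns[classes == 1, :10] += 1.5
accuracy, p_value = permutation_p_value(patterns, classes)
```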

4.3 Structural equation modelling

Structural equation modelling [168] is a multivariate analysis technique applied in one reviewed study [169]. It evaluates whether and how well the relations among hypothesized constructs fit the relations among (“latent” or unobservable) mathematical factors representing these constructs, which are extracted from a set of (“manifest” or observable) empirical data. In structural equation modelling, relations between manifest variables and latent constructs (i.e., the “measurement model”) and among latent constructs (the “latent variable model”) have to be specified first. It is possible to specify directed effects in the latent variable model and thus to assume causal relations among variables (see below). Although structural equation modelling is a very flexible and powerful tool, several aspects have to be taken into consideration. Structural equation modelling cannot be used to test causal relations among constructs. Instead, it tests whether associations in a specific empirical data set fit the causal assumptions held by the researcher. Poor model fits will strongly question the validity of these assumptions. Good model fits, in contrast, increase their plausibility but do not prove them, and they require replication on independent data sets [170]. The sensitivity of structural equation modelling to variations in sample size and to violations of distributional assumptions remains a limiting factor. Methods to deal with these problems have been presented, for example, by Kock & Hadaya [171] and Hox et al. [172]. See [172] for an overview.

5 Study overview

Here we provide a tabular overview of the reviewed studies. Table 1 gives an overview on LI-based studies, Table 2 on BS-based studies and Table 3 on PI-based studies.

Table 2 Overview on existing bariatric surgery studies. Studies are subdivided by neuroimaging technique. Studies may be listed more than once if more than one neuroimaging technique is applied. For the interpretation of parameters reported in columns “Modelling: Outcome”, “Predictor”, and “Significant prediction results”, please see Table 1
Table 3 Overview on existing pharmacological intervention studies. For the interpretation of parameters reported in columns “Modelling: Outcome”, “Predictor”, and “Significant prediction results”, please see Table 1

6 Discussion

In this study, we review current work on computational approaches to predicting treatment response in obesity using neuroimaging. We started by outlining key CNS mechanisms thought to affect treatment outcome and then described the neuroimaging techniques and parameters as well as computational approaches used for prediction. Lastly, we gave an overview on existing studies.

This overview provided a consistent picture of the role of CNS mechanisms for treatment outcome in obesity. For example, the importance of dopaminergic reward areas was underlined by CR studies showing that cue-evoked activity in these areas is negatively related to treatment outcome [173–177]. The relevance of goal-directed control regions comprising frontal and parietal areas as well as insula was demonstrated in DD tasks directly designed to study goal-directed control [18, 97, 104, 106] and in CR tasks by showing that cue-related activity of these areas has a positive effect on treatment outcome [86, 169, 178–180]. Task-derived FC and RS FC studies showed that higher FC between fronto-parietal and insular goal-directed control areas on the one hand and incentive salience areas on the other is accompanied by better treatment outcomes [18, 97, 99, 100, 126]. Obesity-related regions involved in incentive salience and goal-directed control as identified by task fMRI strongly overlap with obesity-related regions as identified via RS FC [181]. These results were complemented by structural neuroimaging studies [99, 137–140, 142, 143] showing that higher GM volume of goal-directed areas was associated with better treatment outcome and higher GM volume of incentive salience regions with worse outcome.

Some studies did not support a link between incentive salience or goal-directed control and treatment outcome. The fMRI studies of Bach et al. and Ten Kulve et al. did not show such a link [85, 86]. In addition, associations between cortical thickness of the superior frontal gyrus and weight loss reported by Liu et al. did not reach a multiple comparison corrected significance level [148]. Similarly, a couple of behavioral studies (not properly controlling for well-known nuisance factors) failed to identify reduced goal-directed control in obesity (e.g., [94, 95]). Thus, due to these null results and the possibility of a publication bias [182], the many consistent findings mentioned above have to be viewed critically. Nevertheless, given that the concepts of incentive salience and goal-directed control are derived from basic neuroscience on reward and motivation (e.g., [75, 76]), it can be assumed that these two CNS mechanisms play an important role for treatment outcome in obesity.

Do the presented findings also allow us to conclude that the evaluated neuroimaging predictors yield suitable biomarkers for obesity treatment outcome in a real-world precision medicine approach? Such a conclusion might be premature. An important counterargument is that the majority of studies applied correlational techniques evaluating all available data in one step (instead of applying model validation techniques) to analyze many predictors for one criterion assessed in small to moderately sized samples. This approach is sensitive to over- (and under-) fitting [183]. If the number of predictors is high (e.g., as in voxel-wise analyses) and the number of participants small, a statistical model for a criterion can occasionally fit this criterion well (poorly) because the predictor is not very reliable or varies substantially across different measurements. Thus, although such correlational analyses yield valid statistical inference on the group level if the analyses adequately controlled for multiple comparisons and were not circular [184, 185], their results might not be generalizable to unseen data and would not yield suitable biomarkers. Consequently, suitable biomarkers must have high retest-reliability.
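
The overfitting risk described above can be illustrated with a small simulation in which predictors and criterion are pure noise. The sample size, the number of simulated voxels, and all names are assumptions chosen for illustration, and no real data are involved.

```python
import numpy as np

def correlations(outcome, voxels):
    """Pearson correlation of the outcome with every voxel."""
    x = (outcome - outcome.mean()) / outcome.std()
    y = (voxels - voxels.mean(axis=0)) / voxels.std(axis=0)
    return x @ y / len(x)

rng = np.random.default_rng(0)
n_participants, n_voxels = 20, 5000

# Discovery sample: outcome and voxels are unrelated random numbers.
outcome = rng.standard_normal(n_participants)
voxels = rng.standard_normal((n_participants, n_voxels))
in_sample = correlations(outcome, voxels)
best = int(np.argmax(np.abs(in_sample)))  # "best" voxel picked on all data at once

# Independent sample: the same voxel is evaluated on unseen noise data.
new_outcome = rng.standard_normal(n_participants)
new_voxels = rng.standard_normal((n_participants, n_voxels))
out_of_sample = correlations(new_outcome, new_voxels)[best]

print(f"in-sample r = {in_sample[best]:.2f}, out-of-sample r = {out_of_sample:.2f}")
```

Although no true relation exists, selecting the strongest of thousands of predictors in a small sample yields a sizeable in-sample correlation, whereas the same predictor evaluated on independent data has no systematic relation to the outcome.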

How reliable are the different neuroimaging parameters? What can be done to improve data reliability? How can a high generalizability of a prognostic model to unseen data be ensured? Meta-analyses provide answers to the first question. Reliability is typically quantified via the intraclass correlation (ICC; [186]), for which poor reliability was defined as ICC < 0.4, fair as 0.4 ≤ ICC < 0.6, good as 0.6 ≤ ICC < 0.75, and excellent as ICC ≥ 0.75 [187]. Studies assessing the reliability of task fMRI reported an average ICC of 0.5 ([188]; N = 15) or 0.397 ([189]; N = 56). A meta-analysis assessing retest-reliabilities for RS FC found an average ICC of 0.29 ([190]; N = 25). Consistent with Han et al. [191], retest-reliabilities computed for volume- and surface-based structural neuroimaging parameters by Elliott et al. showed primarily excellent ICCs [189]. Thus, together with the fact that structural neuroimaging parameters were significantly related to treatment outcome, these findings suggest that structural neuroimaging parameters might provide the most suitable biomarkers.
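
As an illustration of how a retest-reliability estimate and its verbal category can be obtained, the sketch below computes one common ICC variant (the two-way, consistency, single-measurement form) from a participants-by-sessions matrix. Which ICC variant the cited meta-analyses used is not specified here, so this choice is an assumption, as are the function names and the toy scores. The fair category is treated as 0.4 ≤ ICC < 0.6 so that the categories cover the full range.

```python
import numpy as np

def icc_consistency(scores):
    """Two-way consistency ICC for single measurements (assumed variant).

    scores: array (n_participants, k_sessions) of one neuroimaging marker.
    """
    n, k = scores.shape
    grand = scores.mean()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()  # between participants
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()  # between sessions
    ss_total = ((scores - grand) ** 2).sum()
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

def reliability_label(icc):
    """Verbal category for an ICC value following the cut-offs given above."""
    if icc < 0.4:
        return "poor"
    if icc < 0.6:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"

# Toy scores (not real data): session 2 equals session 1 plus a constant shift,
# which leaves the ranking of participants unchanged.
scores = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
icc = icc_consistency(scores)
labels = [reliability_label(v) for v in (0.5, 0.397, 0.29)]
```

Applied to the average ICCs reported above, the cut-offs place task fMRI at fair (0.5) or poor (0.397) reliability and RS FC at poor reliability (0.29).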

However, it is unclear to which degree the reliabilities reported for task fMRI and RS FC can be generalized. First, clinical studies applying specific tasks for disease-related theoretical or empirical reasons were severely underrepresented in these meta-analyses. Second, especially for Elliott et al., the average retest interval was quite long (four months), given that Bennett & Miller found that studies with retest intervals of three or more months had reduced retest-reliability [188, 189]. The latter finding is consistent with the fact that retest-reliability is affected by a biomarker’s noise as well as by noise-independent physiological alterations occurring over time [192].

Irrespective of whether or not these meta-analyses provide highly accurate estimates of retest-reliability for fMRI-derived markers, there is room for improvement. What, then, can be done to improve reliability? Circumstances enabling an accurate reliability estimation are an important prerequisite. Given that retest-reliability is negatively associated with the duration of the retest interval [192], an unbiased estimate requires a short time interval. A cross-sectional estimation procedure with zero interval length, which computes reliability based on several data subsets of a single scanning session [193], might be optimal in this regard. Given that fMRI data reflect a highly complex process and are sensitive to a broad range of confounding factors, improvements should aim at adequately reducing the influence of factors such as head motion, breathing, heart rate, hydration, satiation, and neuromodulators including caffeine or nicotine (e.g., [183]). Specific improvements for FC comprise increasing the number of acquired fMRI scans [189, 190] and combining RS FC data acquired across extended scan sessions with complementary task-fMRI data [189]. Additionally, FC computed from natural viewing tasks yielded fair to excellent reliability in a study by Wang et al., and this reliability was significantly higher than that derived from RS [194].

Another step to optimize neuroimaging-based treatment outcome prediction for clinical application would aim at maximizing the generalizability of a prognostic model to unseen data via application of model validation techniques. In this approach, different models (e.g., derived from neural networks, support vector classifiers, etc.) would be trained on a set of training data. Selection of features yielding a high prognostic accuracy could then be performed for each model separately using independent evaluation data. The prognostic performance is then tested once for each model based on independent test data and the best is selected for application (e.g., [183]).

Consequently, access to highly reliable biomarkers derived from adequately powered studies using model validation techniques is an important prerequisite for using neuroimaging-derived biomarkers for applied treatment outcome prediction in clinical practice, a prerequisite that might not be met today. Besides these application-related aspects, further improvements could entail a more elaborate modelling of “treatment-outcome”. This would take multiple neuroimaging, hormonal, and outcome markers (see e.g., [18, 163, 179]) into consideration. In this regard, the study of Szabo-Reed and colleagues might be pioneering, as complex associations between brain activity, caloric restrictions, program attendance, physical activity and weight loss were modeled within a single structural equation model in this work [169]. This approach promises not only to reveal a more fine-grained picture of contributing factors but also to facilitate a comparison of the prediction accuracy obtained with different biomarker compilations.

In conclusion, the reviewed studies provide consistent support for the importance of incentive salience and goal-directed control as central nervous mechanisms mediating treatment outcome in obesity. Despite these findings, larger studies using statistical methods optimized with regard to real-world outcome prediction are needed to determine whether the approach is sufficiently accurate for application in a personalized medicine framework.