Introduction

The worldwide prevalence of Alzheimer’s disease (AD) continues to grow. The Alzheimer’s Association estimates that by 2030 almost 8 million people in the United States aged 65 years and older will have this neurodegenerative disease [1]. The challenge of distinguishing preclinical AD from changes associated with normal aging has led to several attempts at clinical classification [2]. Once clinical symptoms are reported, physicians are estimated to diagnose the disease correctly more than 90 % of the time. They do so on the basis of a detailed medical history, mental status testing, physical and neurological examinations, blood tests and brain imaging. However, neuropathological studies have demonstrated that cerebral pathological changes in AD begin decades before the onset of clinical symptoms [3–6]. AD is associated with progressive accumulation of abnormal proteins [beta amyloid (Aβ) and hyperphosphorylated tau] in the brain, which leads to progressive synaptic, neuronal and axonal damage [7]. Accordingly, a biomarker model paralleling the hypothetical pathophysiological sequence of AD was recently proposed [15]. It includes biomarkers of (i) brain Aβ amyloidosis, (ii) AD-related synaptic dysfunction and (iii) AD-related neurodegeneration. These are, respectively: (i) reduced cerebrospinal fluid (CSF) Aβ42, elevated CSF tau and increased amyloid tracer retention on positron emission tomography (PET); (ii) decreased 18F-fluorodeoxyglucose (FDG) uptake on PET with a temporoparietal pattern of hypometabolism; and (iii) medial temporal lobe atrophy as assessed by structural magnetic resonance imaging (MRI) [7]. New biomarker-based diagnostic criteria have also been proposed to enhance the clinical detection of AD even in its early prodromal stages [8, 9]. 
The importance of using biomarkers in diagnosis is that these biological tests may enable clinicians to detect the underlying disease and to determine whether mild cognitive impairment (MCI) symptoms are due to AD and may therefore represent prodromal AD [8, 10, 11]. Such an etiological classification is invaluable for clinical trials of disease-modifying drugs currently being developed with a view to preventing or slowing down the clinical manifestation of AD.

There is a growing body of literature investigating the use of these biomarkers, alone or in combination, to predict conversion from MCI to dementia. Quantitative analytical techniques are still under development [12], but they have already achieved a considerable degree of standardization [13].

Temporoparietal hypometabolism on 18F-FDG PET is one of the core biomarkers of neurodegeneration for use in the biomarker-based diagnosis of AD [14, 15]. Functional alterations precede structural changes in AD [16]. For this reason, functional imaging with 18F-FDG PET could play an increasingly important role in providing in vivo pathophysiological information on the distribution of neuronal death and synaptic dysfunction, which cannot yet be detected by structural imaging [17]. It was recently shown, in patients with pathologically verified disease, that progressive metabolic reductions occur years before clinical AD symptoms [18], suggesting that FDG PET could be particularly powerful for early AD diagnosis. Moreover, impaired cerebral glucose metabolism in the left temporoparietal area of the brain, measured with FDG PET and combined with visuospatial functioning, predicted future deterioration with 90 % accuracy in MCI patients followed up for an average of 36.5 months [19]. In addition, 18F-FDG PET has been shown to be one of the strongest individual positive predictive biomarkers of short-term incident AD dementia in MCI patients [20]. Finally, Shaffer et al. [21] recently evaluated whether combining MRI, FDG PET and CSF data with routine clinical tests significantly increases the accuracy of predicting conversion from MCI to AD compared with clinical testing alone. They demonstrated that, while imaging and CSF biomarkers both significantly improved prediction of conversion compared with baseline clinical testing, FDG PET appeared to add the greatest prognostic information.

In recent years, PET has become widely used because of its high resolution, sensitivity and accuracy [22, 23]. Since FDG is a labeled glucose analog, FDG PET measures brain glucose uptake. Uptake reflects cortical metabolism and may be useful, for instance, in distinguishing frontotemporal dementia (FTD), with its anterior functional defects, from Alzheimer’s dementia, with its temporoparietal cortex defects [10, 24]. Mosconi et al. [25] conducted a multicenter study of FDG PET scans from 110 healthy elderly individuals and from 114 MCI, 199 AD, 98 FTD and 27 dementia with Lewy bodies (DLB) patients. They measured the power of FDG PET hypometabolism to support differential diagnosis and explored the relationship of disease-specific hypometabolic patterns to MCI. Disease-specific PET patterns correctly classified 95 % of the AD, 92 % of the DLB and 94 % of the FTD patients, as well as 94 % of the normal subjects. An AD PET pattern was observed in 79 % of the MCI patients with deficits in multiple cognitive domains and in 31 % of those with prominent memory deficits. In MCI patients with non-memory deficits, FDG PET hypometabolic patterns were heterogeneous, ranging from absent hypometabolism to patterns typical of FTD and DLB. Moreover, Chételat et al. [26] used FDG PET to predict which patients with MCI would convert to dementia at 18 months and found that converters had lower FDG uptake in the right temporoparietal cortex. Soon afterwards, Anchisi et al. 
[27] used FDG PET and memory test scores to identify which members of a sample of 67 amnestic MCI patients would convert to AD after 1 year. Clinical stability was associated with a pattern of hypometabolism in the dorsolateral frontal cortex and a score of 7 or higher on the California Verbal Learning Test-Long Delay Free Recall. Conversion, by contrast, was significantly associated with scores below 7 on this test and with bilateral hypometabolism in the inferior parietal, posterior cingulate and medial temporal cortex. Drzezga et al. [28] found that 11/13 MCI patients with baseline FDG PET suggestive of AD converted to dementia by 16 months, as opposed to only 1/17 with baseline FDG PET not suggestive of AD. The remaining 16/17 FDG PET-negative patients were still classified as MCI at the end of the study. Moreover, Schroeter et al. [29] conducted a large meta-analysis of neural correlates of AD and MCI, covering 40 studies (1,351 patients and 1,097 healthy control subjects) that reported either atrophy or decreases in glucose utilization and perfusion. They found that early AD functionally affected the inferior parietal lobules and the precuneus. As regards future conversion of MCI to AD, atrophy in the transentorhinal hippocampal area and hypometabolism in the inferior parietal lobules were the most reliable predictors of progression. Similar results were reported in Herholz’s 2010 review [30]. There, substantial impairment of FDG uptake in the temporoparietal association cortex emerged as a reliable predictor of rapid progression to dementia in MCI patients, and frontal and temporoparietal metabolic impairment was closely related to disease progression in many longitudinal studies. 
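The Drzezga et al. counts can be turned into positive and negative predictive values for conversion. The short Python sketch below does that arithmetic. `predictive_values` is a hypothetical helper, and treating conversion by 16 months as the reference outcome is our reading of the reported counts rather than a statistic stated in the text.

```python
def predictive_values(true_pos, pos_total, true_neg, neg_total):
    """Return (PPV, NPV) from counts of test-positive and test-negative patients."""
    ppv = true_pos / pos_total  # converters among FDG PET-positive patients
    npv = true_neg / neg_total  # non-converters among FDG PET-negative patients
    return ppv, npv

# Drzezga et al. [28]: 11/13 PET-positive patients converted; 16/17 PET-negative patients did not.
ppv, npv = predictive_values(11, 13, 16, 17)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 84.6%, NPV = 94.1%
```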
This role of FDG PET as a sensitive marker for monitoring the progression of early AD was further confirmed by Fouquet et al. [31] in a longitudinal study of brain metabolic changes from amnestic MCI to AD. The authors measured annual percentage metabolic changes in MCI patients followed for 18 months or until their conversion to AD. Converters, compared with non-converters, typically showed a greater metabolic decrease in ventral medial prefrontal areas, and metabolic changes in these cortical areas discriminated completely between the two groups.
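The text does not give the exact formula used by Fouquet et al. A common definition of an annual percentage change, assumed here, divides the difference between follow-up and baseline values by the baseline value and by the interval in years. A minimal Python sketch follows. The function name and the example values are illustrative only.

```python
def annual_percent_change(baseline, follow_up, years):
    """Annual percentage change relative to baseline (hypothetical helper).

    Works on scalars or NumPy arrays (e.g. regional or voxel-wise uptake values).
    """
    if years <= 0:
        raise ValueError("follow-up interval must be positive")
    return 100.0 * (follow_up - baseline) / (baseline * years)

# Illustrative numbers only: a regional uptake ratio falling from 1.00 to 0.97 over 18 months.
print(annual_percent_change(1.00, 0.97, 1.5))  # about -2.0 (% per year)
```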

In recent years, a number of imaging tools of varying technological sophistication have been developed to rate functional changes in the brains of patients with AD. These range from simple subjective visual rating scales to sophisticated computerized algorithms. Efforts are under way to standardize readouts and make them as operator-independent as possible [32].

A set of automated tools for computer-assisted diagnosis based on PET images has been validated and has already achieved a considerable degree of standardization [13, 33, 34]. These tools fall into two classes: statistical maps and summary metrics of temporoparietal hypometabolism [35]. The strengths and weaknesses of visual rating and of both classes of tools should be carefully analyzed, and the relevant cautionary warnings noted. The aim of this paper is to review the principal approaches to FDG PET reading, highlighting their strengths and limitations (summarized in Table 1).

Table 1 Strengths and weaknesses of qualitative and quantitative tools for FDG PET reading

Visual rating and ROI-based methods

Positron emission tomography has been used to investigate functional brain metabolism for decades, and the technique has become increasingly important for AD diagnosis and treatment since about 1990 [36, 37]. Early PET studies had a considerable impact, clearly showing that the earliest sites of functional impairment in AD were in the temporal lobes [38]. Accordingly, PET became an important supporting element in the differential diagnosis of AD and other dementias [39]. However, visual inspection of PET images (black and white or colored) lacks defined thresholds for distinguishing between normal and pathological brain metabolism (Fig. 1a). Judgments therefore depended entirely on the rating expertise of nuclear medicine physicians (Table 1). To overcome this limitation, region of interest (ROI)-based methods were developed. These involved tracing ROIs on MRI images and co-registering each subject’s PET and MRI scans [40].

Fig. 1

Processing of an FDG PET scan of an AD patient using visual rating and different automated tools. a PET visual rating, relying solely on the nuclear medicine physician’s rating expertise. The panel shows visual inspection of glucose uptake: the white arrow denotes an area of mild-to-moderate hypometabolism; b ROI-based semi-quantitative analysis: this method can be used to investigate single-subject metabolism in a priori defined ROIs and compare it with that of normal controls. Computerized brain atlases have been developed to automate the identification and segmentation of ROIs [45–48]. An a priori left mesial temporal ROI (in red) was generated with the PickAtlas tool [48], which is based on the Talairach Daemon database [76]. The ROI was then overlaid on the PET images of the AD patient and of 148 normal elders [20], and glucose uptake in this ROI was computed. In this ROI, the AD patient’s metabolism is 2.74 z-scores lower than the mean for the normal group; c voxel-based analysis: this voxel-by-voxel approach analyzes PET data without requiring an a priori hypothesis. The SPM analysis compares the patient with a group of 21 scans of normal elders [61]. On the right, it shows regions of significantly decreased glucose uptake (yellow to red) overlaid on a standard brain atlas. On the left, it shows a “glass brain” image of the patient’s hypometabolic regions in neurological orientation (i.e., the patient’s right is on the reader’s right-hand side); d summary metrics are voxel-by-voxel summary measures of AD-related hypometabolism. They are based on the comparison of individual images with a dataset of normal scans in the brain regions specifically involved in AD. Their output is a single figure summarizing the metabolic information from all the analyzed brain regions, and a threshold is given to dichotomize values into normal/abnormal. 
1 The PALZ score is computed as a voxel-by-voxel sum of t-scores within a predefined AD-pattern mask overlaid on the AD subject’s PET image (in red) [57]. The algorithm age-corrects the measured glucose uptake. It then compares the corrected uptake in each PET image voxel with the uptake predicted from a group of 110 scans of normal elders [57]. The resulting deviations are expressed as t-values. All abnormal t-values within the AD-pattern mask are summed, yielding the AD t-sum, whose cutoff value lies between 11,089 [57] and 13,481 [20]; 2 HCI considers the whole brain without requiring a predefined region of interest [71]. The figure shows the HCI hypometabolic mask covering the AD subject’s whole brain (in light orange). HCI provides a single measurement of the extent to which the pattern and magnitude of cerebral hypometabolism in an individual’s FDG PET image correspond to those of probable AD patients, and it is thresholded at 1,055 [20]; 3 MetaROI is a global index computed as the average of the mean glucose uptake in five predefined ROIs (three of which, overlaid on the AD patient’s PET image, are depicted in red), developed on the basis of literature findings on AD hypometabolism [72]. The MetaROI output reports the AD patient’s uptake as an age-corrected z-score (w-score) relative to normal controls, thresholded for abnormality at −2.60 [20]

ROIs identified on PET scans can be used in two ways. They can serve to visually inspect metabolism in specific regions and calculate its mean (ROI-based qualitative evaluation). They can also serve to extract regional metabolic values from ROIs identified ad hoc (ROI-based semiautomatic quantitative method) (Fig. 1b) [41]. These ROI-based approaches have high anatomical resolution and specificity. Because they rely on manual outlining of specific structures and areas on each MRI slice, they combine the superior anatomical resolution of MRI with the physiological information provided by PET [42]. They have been considered the gold standard for extracting data from PET images for research purposes [43]. However, ROI-based approaches are operator-dependent, since each region must be drawn manually by a trained expert with specific neuroanatomical knowledge. Drawing ROIs demands considerable skill and expertise. Tracer uptake may be absent or decreased in some brain regions, so defining ROIs on PET images according to predefined gyral and sulcal landmarks can be subject to bias and uncertainty [44]. ROI analysis is also limited by the need for prior assumptions about which regions to consider and retain for further analyses (see Table 1) [44]. ROI-based methods are labor intensive and time consuming, because they require manual demarcation and segmentation of each patient’s data. For this reason, several computerized methods have been developed to partially or fully automate the process. Adjustable computerized brain atlases, into which individual PET images can be transformed, partly automate the identification of ROIs [45]. Later, Collins et al. [46] presented a fully automated method for segmenting and identifying gross neuroanatomical structures. 
More recently, a number of studies have used a template-based (or model-based) ROI method. Here, ROIs are defined on a template to which all subjects have been normalized, or on a template that has been normalized to each individual subject [47, 48]. This approach makes ROI definition less time consuming and less subjective than manually drawing subject-specific ROIs, while retaining the possibility of hypothesis-driven analyses of specific locations [44]. Because spatial normalization is a critical step in these procedures, the output depends on the accuracy and precision of the normalization process [49]. Alongside computerized brain atlases, probabilistic structural brain atlases have been developed for spatial transformation purposes. These provide the framework for new analytical methods that combine anatomical information with statistical mapping of functional brain data derived from large populations [50]. Such methods yield information on anatomical, geometrical and functional aspects of the brain systems under study [51]. These atlases have been used in FDG PET studies in combination with ROI-based methods [52, 53].
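Once scans are in template space, a template-based ROI analysis reduces to averaging uptake inside an atlas mask and comparing the patient with controls, as in the z-score reported in Fig. 1b. Below is a minimal NumPy sketch on synthetic volumes. The mask, the data and the helper names are hypothetical, and spatial and intensity normalization are assumed to have been done already.

```python
import numpy as np

def roi_mean(pet, roi_mask):
    """Mean uptake inside an atlas ROI of a spatially and intensity-normalized scan."""
    return float(np.asarray(pet, dtype=float)[roi_mask].mean())

def roi_z_score(patient_pet, control_pets, roi_mask):
    """z-score of the patient's ROI mean relative to the distribution of control ROI means."""
    control_means = np.array([roi_mean(c, roi_mask) for c in control_pets])
    return (roi_mean(patient_pet, roi_mask) - control_means.mean()) / control_means.std(ddof=1)

# Toy 3D scans already in template space (synthetic data).
rng = np.random.default_rng(0)
roi = np.zeros((8, 8, 8), dtype=bool)
roi[2:5, 2:5, 2:5] = True                       # stand-in for an atlas ROI (e.g. mesial temporal)
controls = [rng.normal(1.0, 0.05, roi.shape) for _ in range(20)]
patient = rng.normal(1.0, 0.05, roi.shape)
patient[roi] -= 0.15                            # simulated focal hypometabolism
print(round(roi_z_score(patient, controls, roi), 2))  # strongly negative for this toy patient
```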

A further issue is the high cost of performing MRI in addition to PET. To address this problem, hybrid PET/MRI scanners have recently been developed. They acquire both PET and MRI scans in a single session, minimizing patient discomfort while maximizing clinical information and optimizing registration of the two modalities [54]. Future clinical applications of hybrid PET/MRI scanners in neuroimaging should be carefully evaluated.

Statistical maps

To overcome the limitations of ROI-based methods, fully automated voxel-based analysis (VBA) techniques have been developed. The voxel-by-voxel approach requires all images to be spatially transformed to a template space. This process, known as stereotaxic normalization, assumes that each voxel corresponds to the same anatomical region across subjects. Statistical analyses can therefore be performed for every voxel across all subjects (Table 1). Software tools currently used for VBA include statistical parametric mapping (SPM; Wellcome Department of Neurology, London, UK; based on the Matlab package, http://www.mathworks.it/products/matlab/) [33] and NEUROSTAT (University of Michigan, Ann Arbor, MI, USA) [34]. Both are available free of charge (http://128.95.65.28/~Download/) and through commercial vendors.
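Once images share a template space, a group comparison is a test repeated at every voxel. Below is a minimal NumPy sketch of a voxel-wise pooled-variance two-sample t statistic on synthetic, already normalized arrays. It illustrates the principle only and is not SPM's general linear model implementation. The function name and data are hypothetical.

```python
import numpy as np

def voxelwise_two_sample_t(group_a, group_b):
    """Pooled-variance two-sample t statistic computed independently at every voxel.

    group_a, group_b: arrays of shape (n_subjects, *volume_shape), already in template space.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    na, nb = a.shape[0], b.shape[0]
    pooled_var = ((na - 1) * a.var(axis=0, ddof=1) + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    se = np.sqrt(pooled_var * (1 / na + 1 / nb))
    return (a.mean(axis=0) - b.mean(axis=0)) / se  # negative where group_a is hypometabolic

# Synthetic example: patients have reduced uptake in one corner of a small volume.
rng = np.random.default_rng(0)
controls = rng.normal(1.0, 0.05, (15, 10, 10, 10))
patients = rng.normal(1.0, 0.05, (15, 10, 10, 10))
patients[:, :3, :3, :3] -= 0.10               # simulated hypometabolic region
t_map = voxelwise_two_sample_t(patients, controls)
affected = np.zeros((10, 10, 10), dtype=bool)
affected[:3, :3, :3] = True
print(t_map[affected].mean(), t_map[~affected].mean())
```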

Voxel-based analysis with SPM (Fig. 1c) was initially developed for group statistical comparisons and only later adapted to single-case analyses. It relies on two preprocessing steps [55]. The first is spatial normalization, an anatomical reshaping of the subject’s brain onto a standard brain atlas (template) that preserves voxel activity throughout the registration process. The second is smoothing, which reduces inter-subject variability in brain shape and activity. Statistical comparisons are then performed on a voxel-by-voxel basis, producing statistical parametric maps of significant effects. Recently, Chen et al. used SPM to introduce an empirically predefined statistical ROI for assessing AD-slowing treatment effects with improved statistical power [56]. The ROI is defined as the set of voxels associated with twelve-month declines in the regional-to-whole-brain cerebral metabolic rate for glucose. The technique is currently being validated [56].
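The statistical-ROI idea can be sketched under our own assumptions, since Chen et al. [56] are only summarized here. The sketch uses whole-brain intensity normalization, a one-sample t test on the 12-month change and an illustrative threshold of t < −3. It is not the published procedure, and all names and data are hypothetical. In practice the ROI would be defined in one cohort and applied to an independent one.

```python
import numpy as np

def whole_brain_normalize(scans, brain_mask):
    """Divide each scan (row) by its mean over the brain mask (regional-to-whole-brain ratio)."""
    scans = np.asarray(scans, dtype=float)
    return scans / scans[:, brain_mask].mean(axis=1, keepdims=True)

def statistical_roi(baseline, follow_up, brain_mask, t_threshold=-3.0):
    """Voxels whose normalized uptake declines consistently between two visits.

    baseline, follow_up: arrays of shape (n_subjects, n_voxels) from a training cohort.
    t_threshold is an illustrative choice, not a published value.
    """
    change = whole_brain_normalize(follow_up, brain_mask) - whole_brain_normalize(baseline, brain_mask)
    n = change.shape[0]
    t = change.mean(axis=0) / (change.std(axis=0, ddof=1) / np.sqrt(n))
    return (t < t_threshold) & brain_mask

def mean_change_in_sroi(baseline, follow_up, brain_mask, sroi):
    """Average normalized change inside a previously defined statistical ROI."""
    change = whole_brain_normalize(follow_up, brain_mask) - whole_brain_normalize(baseline, brain_mask)
    return float(change[:, sroi].mean())

# Synthetic training cohort: 20 subjects, 100 voxels, 10 % decline in the first 20 voxels.
rng = np.random.default_rng(1)
brain = np.ones(100, dtype=bool)
base = rng.normal(1.0, 0.01, (20, 100))
follow = base * rng.normal(1.0, 0.01, (20, 100))
follow[:, :20] *= 0.90
sroi = statistical_roi(base, follow, brain)
print(np.flatnonzero(sroi))                        # indices selected as the statistical ROI
print(mean_change_in_sroi(base, follow, brain, sroi))
```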

The other tool, NEUROSTAT, was developed with a clear clinical intent and has offered single-case analysis since its first release. It includes a three-dimensional stereotactic surface projection routine (3D-SSP) that became a standard for PET image analysis in aging and dementia [34]. In 3D-SSP, surface projections are predefined for the brain surface. The maximal gray matter activity is projected onto the surface, where it can be analyzed statistically or interpreted visually. The algorithm relies on a priori knowledge of the directions of major neuronal fiber bundles in the brain, along which brain shape is nonlinearly warped.
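The surface-projection step can be illustrated as a search for the maximum value along each surface point's inward normal. The simplified NumPy sketch below assumes that surface points and inward normals are already available in voxel coordinates. It uses nearest-voxel sampling and a search depth of six voxels as illustrative defaults. NEUROSTAT's actual surface definition, sampling and nonlinear warping are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def surface_max_projection(volume, surface_points, inward_normals, depth_voxels=6):
    """Assign to each surface point the maximum value found along its inward normal.

    surface_points, inward_normals: arrays of shape (n_points, 3) in voxel coordinates.
    depth_voxels is an assumed search depth, expressed in voxels.
    """
    vol = np.asarray(volume, dtype=float)
    upper = np.array(vol.shape)
    values = np.full(len(surface_points), np.nan)
    for i, (point, normal) in enumerate(zip(surface_points, inward_normals)):
        direction = np.asarray(normal, dtype=float) / np.linalg.norm(normal)
        samples = []
        for step in range(depth_voxels + 1):
            # Nearest voxel at this depth along the inward normal.
            idx = np.rint(np.asarray(point, dtype=float) + step * direction).astype(int)
            if np.all(idx >= 0) and np.all(idx < upper):
                samples.append(vol[tuple(idx)])
        if samples:
            values[i] = max(samples)
    return values

volume = np.zeros((10, 10, 10))
volume[3, 5, 5] = 5.0                                   # a "hot spot" three voxels below the surface
points = np.array([[0, 5, 5], [0, 0, 0]])
normals = np.array([[1, 0, 0], [1, 0, 0]])
print(surface_max_projection(volume, points, normals))  # [5. 0.]
```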

The strengths of statistical mapping techniques are well known. VBA can detect even slight changes in the functional signal and is less biased than qualitative visual inspection of images [57]. Otte et al. [32] gave an overview of brain imaging tools in neuroscience. They pointed out that observer- and experience-dependent visual inspection remains the physician’s first diagnostic step, but should be supplemented by an observer-independent SPM method or, at least, by an ROI-based technique in cases of uncertainty. Furthermore, a post-mortem confirmation study showed that visual interpretation of statistical maps may be more reliable and accurate in distinguishing different dementias than clinical methods alone or traditional visual inspection of images [24]. This approach to PET data analysis is also more exploratory than the ROI approach, because it requires no prior hypothesis about the expected location of the effect [44].

Some limitations of VBA techniques should nevertheless be considered. Spatial normalization is also a critical step in these procedures [31], but a further issue arises with single subjects. First, automatic detection of abnormal brain metabolism in individual PET studies has been achieved by adapting VBA to single-subject statistics [58]. However, this so-called single-case procedure (the comparison of a single case with a reference group) has so far been used only for research purposes [59]. It is still being validated for clinical use in both AD and MCI patients [60]. A partially unresolved issue in single-case analyses is how to retain sufficient power to detect true metabolic abnormalities in a PET scan while controlling for false detections. For this reason, visual inspection of PET images is strongly recommended. Readers should look for specific patterns of regional abnormality rather than scattered abnormal metabolic foci, which VBA could falsely present as hypometabolic patterns. A GridSPM service, specifically designed for remote VBA of PET brain images, has recently been developed to perform SPM single-case analysis automatically in clinical practice [61, 62]. GridSPM consists of a grid and an optimized version of SPM. It was developed during the Diagnostic Enhancement of Confidence by an International Distributed Environment (DECIDE) project (www.eu-decide.eu). The DECIDE project has made the GridSPM single-case procedure available to the global neuroscience and medical community. Medical experts throughout the world can thus run single-case analyses and visualize brain hypometabolism on their patients’ PET scans without any further technical or software knowledge. 
GridSPM has been evaluated for usability and quality by two physicians and one physicist with SPM expertise. It has been validated by comparing the results of the original SPM and of GridSPM in ten neurological patients. All evaluators gave GridSPM the maximum scores for “ease of use” and quality of results. Moreover, statistical comparisons between original SPM and GridSPM maps showed excellent agreement for all patients, and visual inspection of all SPM results confirmed this agreement [63]. Della Rosa et al. [60] found that the GridSPM single-case procedure had greater than 90 % specificity and sensitivity for detecting typical AD hypometabolic patterns. However, the validation of GridSPM is still in progress. It should be borne in mind that these algorithms were originally developed to compare brain activation between two or more groups of scans. Adapting them to single-case analysis in a clinical rather than a research setting could therefore be risky.
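The trade-off between power and false detection described above can be made concrete by thresholding a z-map that contains no true abnormality. Below is a minimal sketch using NumPy and Python's `statistics.NormalDist`. Bonferroni correction is used only as the simplest illustration, since SPM itself relies on random field theory for family-wise error control. The voxel count, alpha and function name are arbitrary assumptions.

```python
import numpy as np
from statistics import NormalDist

def hypometabolic_voxels(z_map, alpha=0.001, bonferroni=False):
    """Boolean map of voxels whose z falls below a one-sided (hypometabolic) threshold."""
    z = np.asarray(z_map, dtype=float)
    level = alpha / z.size if bonferroni else alpha   # Bonferroni: divide alpha by the number of tests
    z_crit = NormalDist().inv_cdf(level)              # negative critical value
    return z < z_crit

rng = np.random.default_rng(2)
null_z = rng.standard_normal(200_000)                 # a "scan" with no true abnormality
print(hypometabolic_voxels(null_z).sum())             # about alpha * n_voxels, i.e. roughly 200 false foci
print(hypometabolic_voxels(null_z, bonferroni=True).sum())  # almost always 0
```

The uncorrected map scatters about 200 spurious foci, which is why pattern-level visual reading matters. The corrected map removes them, but at the cost of sensitivity to real, mild hypometabolism.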

Second, VBA estimates the topographical departure of metabolism from a dataset of normal scans on a voxel-by-voxel basis [64]. The resulting maps localize dysfunctional brain regions in anatomical detail. However, they require expert interpretation or, at least, comparison with a neuroanatomical atlas [65]. Moreover, the use of VBA is limited by the lack of a substantial number of control PET studies suitable for statistical comparison with patients [61, 63]. Such studies would need to represent several conditions of normality against which a patient can be compared on a single-subject basis. This shortage probably stems partly from ethical issues related to the risks of the PET acquisition procedure (radiation exposure and catheterization), although acquiring proper controls is itself an ethical requirement for accurate scientific research and diagnostic procedures. It also stems from the costs of storing and sharing a large set of in vivo neuroimaging studies [63]. Indeed, no reference standard dataset of normal FDG PET images has yet been created [66]. So far, studies using single-case analysis have relied on different normal control groups [60]. Their results depend on the accuracy and precision of the normalization process, as well as on whether the controls have been confirmed longitudinally [67]. Mosconi et al. demonstrated that using a PET database of longitudinally stable controls reduced the number of false negatives without increasing the number of false positives in diagnosing MCI and AD [67]. With this longitudinal database, diagnostic sensitivity was 100 % for both AD and MCI. With a standard database of normal subjects not confirmed longitudinally, sensitivity was 95 and 68 % for AD and MCI, respectively. This highlights the critical importance of longitudinal follow-up of controls for improving sensitivity in detecting MCI.
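The effect of unrecognized preclinical cases in a control database can be seen with a single regional value. Such cases lower the reference mean and inflate its spread, pulling a patient's z-score toward zero and producing false negatives. The minimal Python sketch below uses invented numbers to illustrate this mechanism only. It does not reproduce the data of Mosconi et al.

```python
import statistics

def z_score(value, reference):
    """z-score of a patient's regional value against a reference sample of controls."""
    return (value - statistics.mean(reference)) / statistics.stdev(reference)

clean_controls = [1.00, 1.02, 0.98, 1.01, 0.99]   # hypothetical regional uptake ratios
contaminated = clean_controls + [0.86, 0.88]      # two unrecognized preclinical cases
patient = 0.93

print(round(z_score(patient, clean_controls), 2))  # -4.43
print(round(z_score(patient, contaminated), 2))    # much closer to zero
```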

As a final cautionary warning, the objective VBA techniques should be regarded purely as tools specifically designed to improve detection capacity and thus to enable the nuclear medicine physician to aid the referring physician in early diagnosis and subsequent treatment decisions [59].

Summary metrics

In the last few years, several global indices of AD-related hypometabolism, based on different image processing procedures of varying degrees of complexity, robustness and automation, have been developed and many scientific articles have evaluated the improvement of diagnostic accuracy obtained by adding fully automated tools to visual interpretation of PET scans [68, 69]. In this review, we focus on the so-called PALZ algorithm [57, 70], an AD-related hypometabolic convergence index (HCI) [71] and a meta-ROI average [72].

The PALZ score, originally developed by K. Herholz et al. [57], combines the virtues of voxel-based parametric mapping with diagnostic information on the brain regions typically affected in AD. It can be computed automatically with the commercially available PMOD software (PMOD Technologies, www.pmod.com). Each FDG PET image is compared with a fixed database of scans from normal elderly subjects through a voxel-wise t test that includes age as a confounding variable. The PALZ score is then computed as the voxel-by-voxel sum of t-scores within a predefined AD-pattern mask (Fig. 1d). PALZ was developed to distinguish AD from healthy elders, with the threshold for abnormality set at 11,089 (a t-sum above 11,089 indicates abnormal metabolism on FDG PET). Its validation study included 110 normal controls and 395 patients with probable AD recruited in eight participating centers. There, PALZ identified mild-to-moderate AD with 93 % sensitivity and specificity, and very mild AD with 84 % sensitivity and 93 % specificity [57]. Being accurate, fully automatic, fast and designed for single-patient analysis, PALZ is a powerful aid to FDG PET-based AD diagnosis (an example of PALZ output is shown in Fig. 1). However, PALZ computation requires the commercial PMOD software, which may limit its widespread use in clinical practice.
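The steps described above can be sketched in NumPy: an age-adjusted voxel-wise comparison with normal scans, followed by summation of t-values inside an AD mask. This is a hypothetical re-implementation for illustration, not PMOD's PALZ code. The linear age model, the prediction-interval t statistic and the voxel-level cutoff `t_threshold=1.6` are our assumptions. A toy t-sum is not comparable with the published 11,089 threshold, which depends on the original mask and voxel grid.

```python
import numpy as np

def age_adjusted_tmap(controls, control_ages, patient, patient_age):
    """Voxel-wise t-values of a patient scan against controls, with a linear age model per voxel."""
    controls = np.asarray(controls, dtype=float)     # shape (n_controls, n_voxels)
    ages = np.asarray(control_ages, dtype=float)     # shape (n_controls,)
    n = ages.size
    X = np.column_stack([np.ones(n), ages])          # intercept + age design matrix
    beta, *_ = np.linalg.lstsq(X, controls, rcond=None)  # shape (2, n_voxels)
    resid = controls - X @ beta
    sigma = np.sqrt((resid ** 2).sum(axis=0) / (n - 2))  # residual SD per voxel
    sxx = ((ages - ages.mean()) ** 2).sum()
    # Standard error for a new individual at the patient's age (prediction interval).
    se = sigma * np.sqrt(1 + 1 / n + (patient_age - ages.mean()) ** 2 / sxx)
    predicted = beta[0] + beta[1] * patient_age
    return (np.asarray(patient, dtype=float) - predicted) / se

def t_sum(tmap, ad_mask, t_threshold=1.6):
    """Sum of hypometabolic t-values (sign flipped) exceeding an assumed cutoff inside the AD mask."""
    hypo = -tmap[ad_mask]
    return float(hypo[hypo > t_threshold].sum())

rng = np.random.default_rng(3)
ages = rng.uniform(60, 80, size=30)
controls = rng.normal(1.0, 0.05, size=(30, 200))   # 30 normal scans, 200 voxels (toy)
ad_mask = np.zeros(200, dtype=bool)
ad_mask[:100] = True                               # hypothetical AD-pattern mask
normal_scan = np.ones(200)
ad_scan = np.ones(200)
ad_scan[ad_mask] = 0.8                             # simulated hypometabolism inside the mask
normal_tsum = t_sum(age_adjusted_tmap(controls, ages, normal_scan, 70), ad_mask)
ad_tsum = t_sum(age_adjusted_tmap(controls, ages, ad_scan, 70), ad_mask)
print(normal_tsum, ad_tsum)
```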

The AD-related HCI was developed by Chen et al. [71]. It provides a single measurement of the extent to which the pattern and magnitude of cerebral hypometabolism in an individual’s FDG PET image correspond to those of probable AD patients. It is generated with fully automated voxel-based image analysis algorithms based on SPM (http://www.fil.ion.ucl.ac.uk/spm/) (Fig. 1d). Each individual PET scan is compared with the scans of healthy subjects from a predefined normative database through a voxel-wise t test. The resulting individual t-score map is converted to a z-score map. The HCI is then calculated as the inner product of this map and a predefined AD z-score map, both first converted into vectors. The HCI has been assessed and validated in two respects. The first is discrimination among AD patients, MCI patients who converted to AD, stable MCI patients and normal controls. The second is rates of progression from MCI to probable AD. It was found to be potentially helpful in characterizing AD and in predicting subsequent rates of clinical decline [71]. However, the HCI has heavy requirements: a specific software package (Matlab, http://www.mathworks.it/products/matlab/) for its computation and a predefined normative dataset. Furthermore, age correction of the final HCI score is not yet fully implemented.
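The final HCI step, an inner product of two z-score maps flattened into vectors, can be sketched in a few lines of NumPy. The sketch assumes that both maps are already z-scores on the same voxel grid, with hypometabolism coded as negative values. Any additional voxel selection or scaling used in the published HCI is not reproduced, and the function name and toy maps are hypothetical.

```python
import numpy as np

def hypometabolic_convergence_index(subject_z, ad_pattern_z, mask=None):
    """Inner product of a subject's z-score map and an AD z-score map, both flattened to vectors."""
    s = np.asarray(subject_z, dtype=float).ravel()
    a = np.asarray(ad_pattern_z, dtype=float).ravel()
    if mask is not None:
        keep = np.asarray(mask, dtype=bool).ravel()
        s, a = s[keep], a[keep]
    return float(s @ a)

ad_pattern = np.array([-3.0, -2.5, -0.5, 0.0])   # toy AD z-map (negative = hypometabolic)
ad_like = np.array([-2.0, -2.0, 0.0, 0.0])        # subject matching the AD regions
frontal_like = np.array([0.0, 0.0, -2.0, -2.0])   # subject with a different pattern
print(hypometabolic_convergence_index(ad_like, ad_pattern))       # 11.0
print(hypometabolic_convergence_index(frontal_like, ad_pattern))  # 1.0
```

Because both maps are signed, the index is large only when a subject's deficits fall where the AD pattern has them. A comparably severe but differently located deficit scores low.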

Meta-ROI volumes were created by W.J. Jagust and S.M. Landau on the basis of a meta-analysis of studies. These studies performed direct whole-brain comparisons of FDG PET data and reported z-scores or t-values in voxels showing significantly different FDG uptake between patients (AD or MCI) and controls. The z-scores were mapped as intensity values to the space of the MNI template brain. The resulting images were smoothed, normalized to a 0–1 intensity range and binarized at a threshold of 0.5. This yielded five binary masks in MNI space: left angular, right angular, left inferior temporal, right inferior temporal and bilateral posterior cingulate meta-ROIs. The global index (meta-ROI average) is computed on spatially and intensity-normalized PET scans as the average of the mean counts in these meta-ROI volumes [72] (Fig. 1d). Its performance has been assessed in terms of sensitivity to longitudinal change in both cognitive and functional measures in AD and MCI patients. The results provided strong evidence that lower baseline FDG PET metabolism consistently predicted subsequent cognitive decline. They also showed that longitudinal change in FDG PET metabolism was associated with concurrent cognitive decline. Together, these findings make the meta-ROI a reliable tool whose power could exceed that of standard clinical outcome measures [72]. However, the meta-ROI requires Matlab and SPM software and a set of publicly available predefined ROIs to run properly. So far, its processing procedure does not account for age.
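The mask-building and averaging steps described above can be sketched with NumPy alone. The following are our assumptions rather than details from the source: a separable Gaussian kernel with sigma in voxels, min-max rescaling to 0–1, a strict > 0.5 binarization, and intensity normalization by the mean over a user-supplied reference mask. The function names are hypothetical. In practice the published meta-ROI files, not masks rebuilt this way, would be used.

```python
import numpy as np

def gaussian_kernel_1d(sigma_vox):
    """Normalized 1D Gaussian kernel truncated at three standard deviations."""
    radius = max(1, int(round(3 * sigma_vox)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma_vox) ** 2)
    return kernel / kernel.sum()

def smooth(volume, sigma_vox):
    """Separable Gaussian smoothing; each axis must be at least as long as the kernel."""
    out = np.asarray(volume, dtype=float)
    kernel = gaussian_kernel_1d(sigma_vox)
    for axis in range(out.ndim):
        out = np.apply_along_axis(np.convolve, axis, out, kernel, mode="same")
    return out

def binary_meta_roi(z_intensity, sigma_vox=1.0, threshold=0.5):
    """Smooth a map of reported z-scores, rescale it to [0, 1] and binarize it."""
    smoothed = smooth(z_intensity, sigma_vox)
    rescaled = (smoothed - smoothed.min()) / (smoothed.max() - smoothed.min())
    return rescaled > threshold

def meta_roi_average(pet, roi_masks, reference_mask):
    """Average of the mean intensity-normalized uptake within each meta-ROI."""
    pet = np.asarray(pet, dtype=float)
    normalized = pet / pet[reference_mask].mean()
    return float(np.mean([normalized[mask].mean() for mask in roi_masks]))

z_map = np.zeros((15, 15, 15))
z_map[7, 7, 7] = 10.0                        # a single reported peak (toy example)
roi = binary_meta_roi(z_map)
pet = np.full((15, 15, 15), 2.0)
print(roi[7, 7, 7], roi[0, 0, 0])                        # True False
print(meta_roi_average(pet, [roi], np.ones_like(roi)))   # 1.0
```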

The main strength of these summary metrics is that they generally require neither expert interpretation nor time-consuming user training. For instance, once FDG PET images have been imported into PMOD, PALZ computation is completely automated and takes about 2 min per subject. HCI computation, by contrast, is performed automatically by the HCI package after normalization to the default PET template and takes about 5 min per subject. Meta-ROI computation is the most time-consuming (about 15 min per subject) and the least automated procedure, because it requires several SPM subroutines to be applied.

Second, at the end of their computation, these indices of global hypometabolism measure the depth and extent of metabolic abnormality in the brain regions typically affected in AD. They reflect the severity of functional impairment, which is significantly related to dementia severity, and discriminate between normal subjects and patients with very high accuracy [13]. Moreover, each of the summary metrics described comes with thresholds that dichotomize values into normal/abnormal. By condensing regionally spread abnormalities on PET images into a single figure, these metrics act, in effect, like a laboratory value indicating the degree of metabolic abnormality in brain regions relevant to AD. They are highly eloquent [70], but informative only in the context of a dichotomous diagnostic question (e.g., Alzheimer’s versus normal condition). Technically, summary metrics are scalar numbers whose value reflects the degree of temporoparietal hypometabolism. Voxel-based estimates of the departure from a dataset of normal scans, similar to those of voxel-based mapping tools, are summed or averaged over a volume of interest in the temporoparietal region. However, summary metrics provide no information beyond this volume of interest [13]. All these metrics were built to distinguish between normal and Alzheimer’s conditions, or to follow and predict conversion from MCI to full-blown AD dementia. At the present state of the art, they therefore do not allow discrimination between different types of dementia, such as FTD, DLB or vascular dementia. All of these are associated with some degree of metabolic impairment in the brain areas on which the metrics are based [70, 71]. 

The risk of false negatives should also be considered. One possible solution is to couple summary metrics with multivariate techniques, such as principal component analysis, which can yield multidimensional discriminant functions [73].
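
As a rough illustration of the multivariate alternative, the sketch below uses principal component analysis, computed via NumPy's singular value decomposition, to describe each subject by several component scores instead of a single scalar. The function name `pca_scores` and the input layout (one row per subject, one column per brain region, filled here with synthetic values) are assumptions for illustration; the sketch is not presented as the approach of [73], and any discriminant function built on these scores would still need to be trained and validated.

```python
import numpy as np


def pca_scores(regional_values, n_components=2):
    """Project subjects onto leading principal components (illustrative sketch).

    regional_values : 2D array (n_subjects, n_regions), e.g. regional uptake ratios
    Returns (scores, components) with shapes (n_subjects, k) and (k, n_regions).
    """
    x = np.asarray(regional_values, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are orthonormal principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    return scores, components


rng = np.random.default_rng(0)
data = rng.normal(1.0, 0.1, size=(20, 6))  # synthetic: 20 subjects, 6 regions
scores, components = pca_scores(data, n_components=3)
print(scores.shape)  # (20, 3)
```

Each subject is then described by a short vector of scores, on which a multidimensional discriminant (for example, linear discriminant analysis) could be fitted.
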

Another problem with summary metrics is the uncertainty of their diagnostic value near the provided cutoff, because each metric depends heavily on the control group on which it was built [13]. As a result, variations in the normative database can shift thresholds up or down; furthermore, there is no universally recognized normative database [67, 74]. Fully automated summary metrics give a binary normal/abnormal answer without considering the gray zone of values around the cutoffs. For instance, the AD t-sum cutoff ranges from 11,089 [57] to 13,481 [20] depending on the reference control group, so a subject whose value falls between these cutoffs could be considered, at the same time, both normal and abnormal. Coupling visual inspection of the maps by expert raters with fully automated summary metrics could help to reduce this problem.
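
The gray-zone problem can be made explicit with a three-way rule. The minimal Python sketch below treats the span between the lowest and highest published cutoffs (the AD t-sum values of 11,089 [57] and 13,481 [20] cited above) as an indeterminate band. The function name, the labels and the convention that "abnormal" means strictly above a cutoff are assumptions for illustration, not a validated clinical rule.

```python
from typing import Sequence

# AD t-sum cutoffs reported with different reference control groups ([57], [20])
AD_TSUM_CUTOFFS = (11089.0, 13481.0)


def classify_with_gray_zone(t_sum: float,
                            cutoffs: Sequence[float] = AD_TSUM_CUTOFFS) -> str:
    """Return 'normal', 'abnormal' or 'indeterminate' for a t-sum value.

    Assumed convention: a value is abnormal under a given cutoff if it is
    strictly greater than that cutoff.
    """
    low, high = min(cutoffs), max(cutoffs)
    if t_sum <= low:
        return "normal"         # normal under every reference group
    if t_sum > high:
        return "abnormal"       # abnormal under every reference group
    return "indeterminate"      # normal under some reference groups, abnormal under others


for value in (9000.0, 12000.0, 15000.0):
    print(value, classify_with_gray_zone(value))
```

An "indeterminate" result marks exactly the cases where visual inspection of the maps by an expert rater would be most useful.
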

Several cautions apply when using fully automated summary metrics. First, clinicians could feel compelled to follow the binary normal/abnormal result provided by the tool, disregarding their traditional visual experience. To avoid this, it is strongly recommended always to inspect visually the hypometabolic map supplied by the summary metric and to break the automatic process into steps that can easily be checked by raters, because automatic segmentation can sometimes fail and yield a false final answer [57].

Second, raters who do not visually inspect the hypometabolic maps supplied by summary metrics, or the PET images themselves, and who rely only on the final normal/abnormal output, run a very high risk of overlooking non-AD hypometabolic patterns (see Table 1 for strengths and weaknesses of all the aforementioned approaches to FDG PET reading) [75].

Third, and very importantly, none of the fully automated metrics discussed in this review has been approved by regulatory authorities for diagnostic use in clinical settings. Indeed, for commercially available tools (PALZ, for example), the manufacturer provides a disclaimer that strongly recommends their use for scientific and research purposes only (http://www.pmod.com/technologies/pdf/brochures/palz.pdf), leaving legal responsibility for any diagnostic classification to the operator. These tools should therefore be used only with extreme caution in clinical settings, because they were generally developed as scientific rather than clinical tools.

Conclusions

Temporoparietal hypometabolism on 18F-FDG PET is one of the core biomarkers in the biomarker-based diagnostic criteria for AD; its translation into clinical practice is therefore of crucial interest for improving early diagnosis and treatment. The present review shows that all the automated tools developed to overcome the limits of traditional visual rating of PET hypometabolism have the potential to help detect AD, even though none has yet been approved by regulatory bodies for diagnostic use in clinical settings. As different tools have different technical requirements and levels of automation, the choice among them should be driven by the available resources. Additional efforts are needed to clarify the ability of computer-aided diagnostic reporting of FDG PET to address particular scientific and clinical questions (e.g., differential diagnosis of dementia, predicting subsequent decline over different time points, reducing the number of patients needed for a clinical trial using clinical or biomarker endpoints). The incremental diagnostic value of these tools over other imaging and biological markers (e.g., hippocampal atrophy on MRI, amyloid-β 1–42, total tau and phosphorylated tau in the CSF, or amyloid load) should be carefully evaluated. Finally, crucial requirements, such as proper software release, documentation and correction of software bugs, coherence of components and modules, and a complete description of features, should always be met by any automated tool developed for use in clinical or research settings.