Introduction

Quantitative measures in medicine are mostly used to determine whether a physiological parameter is within normal range, or to provide an estimate of the severity of an abnormality. They are also useful for longitudinal assessment (monitoring) of disease progression or therapeutic effects. For all this to be possible, comparability between individuals is needed. Measurement error and bias, as well as differences due to variations in the actual implementation of measurement methods should be small as compared to the spread of the biological signal that is being measured.

Positron emission tomography (PET) is an imaging technique that can provide a large number of quantitative measures simultaneously as voxels within a three-dimensional image data set. Primarily, the signal reflects the local activity of a given radioactive tracer at each image voxel. To qualify as a quantitative biomedical technique, these values need to be converted into a physiologically relevant parameter in a way that is comparable between individuals. Ideally, they should provide a measure of an established physiological parameter in units that allow direct comparison with measures of the same parameter obtained by other techniques. Such measurements are often referred to as “fully quantitative”.

The following overview describes the state of the art and the limitations of the main quantitative PET techniques that are clinically available for assessment of dementia.

Quantitative measures provided by PET

Due to the physical characteristics of the paired 511 keV annihilation photons produced by positrons, PET provides superior sensitivity and accuracy as compared to most other imaging methods, including magnetic resonance imaging (MRI) and single-photon emission computed tomography. With proper calibration and corrections for attenuation, scatter and random coincidences, PET images provide quantitative data of local tracer activity in absolute units, i.e., kBq/ml.

With PET/CT scanners and most dedicated brain PET scanners, corrections for attenuation are based on measured attenuation maps and are highly accurate. Various techniques have been implemented by manufacturers to correct for scatter and random events, and their accuracy tends to depend on how well the actual scanning circumstances match the underlying assumptions. At its current stage of development, MRI/PET, while providing attractive features for direct data co-registration and even simultaneous acquisition, has yet to reach the same level of accuracy. This is because routine MRI techniques do not provide a bone signal that can be used for attenuation correction [1, 2]. It is hoped that ongoing technical developments will eventually overcome this limitation [3].

Initially, filtered back projection was the standard technique for the reconstruction of PET images from measured coincidence events. With the improved calculation capacity of modern computers, iterative reconstruction algorithms, which can accommodate highly accurate corrections and various detector geometries, have generally become the preferred option [4].

The spatial resolution of PET is intrinsically limited by some physical factors, including the path length of positrons in tissue before annihilation, the minor deviation of the annihilation pair angle from precisely 180°, and the detector size. Dedicated research brain scanners can reach approximately 2 mm resolution [assessed as the full width of measured activity at half maximum (FWHM) from a point source], while most current scanners in clinical use provide a spatial resolution in the order of 5 mm. Because of the potentially large differences in tracer activity between brain regions and, for most tracers, generally between grey and white matter, there may be substantial spill-over of signal between adjacent brain regions. This is known as the partial volume effect (PVE), and it can impair accurate assessment of regional activity in areas that are smaller than twice FWHM resolution [5]. Even in normal subjects, cortical thickness is in the order of 3–4 mm or below, while in patients with atrophic brains (as frequently encountered in neurodegenerative disease) it can be reduced to 1 or 2 mm in some areas [6]. For tracers with high cortical activity, this can lead to substantial underestimation of cortical activity, while for tracers with high accumulation in white matter and low cortical activity, overestimation of cortical activity is to be expected.
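The direction of this bias can be illustrated with a one-dimensional toy model. The sketch below rests on simplifying assumptions that are not taken from the text: a Gaussian point-spread function, a flat cortical ribbon bordered by CSF (activity 0) on one side and white matter on the other, and arbitrary activity values. All function names are hypothetical.

```python
import math

def gaussian_kernel(fwhm_mm, step_mm):
    """Normalised, sampled Gaussian point-spread function (assumed PSF shape)."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    half = int(math.ceil(4.0 * sigma / step_mm))
    weights = [math.exp(-0.5 * (i * step_mm / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(profile, kernel):
    """Convolve a 1-D profile with the kernel, replicating the edge values."""
    half = len(kernel) // 2
    last = len(profile) - 1
    blurred = []
    for i in range(len(profile)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - half, 0), last)
            acc += w * profile[idx]
        blurred.append(acc)
    return blurred

def mean_cortical_signal(thickness_mm, cortex, white, fwhm_mm, step_mm=0.25, margin_mm=20.0):
    """Mean measured activity inside the ribbon for the layout CSF | cortex | white matter."""
    n_margin = int(round(margin_mm / step_mm))
    n_cortex = int(round(thickness_mm / step_mm))
    profile = [0.0] * n_margin + [cortex] * n_cortex + [white] * n_margin
    measured = blur(profile, gaussian_kernel(fwhm_mm, step_mm))
    ribbon = measured[n_margin:n_margin + n_cortex]
    return sum(ribbon) / len(ribbon)

# Illustrative values only: true cortical activity 4 vs white matter 1, and the reverse.
high_cortex_normal = mean_cortical_signal(3.0, cortex=4.0, white=1.0, fwhm_mm=5.0)
high_cortex_atrophic = mean_cortical_signal(1.5, cortex=4.0, white=1.0, fwhm_mm=5.0)
low_cortex_normal = mean_cortical_signal(3.0, cortex=1.0, white=4.0, fwhm_mm=5.0)
```

In this toy setting the measured cortical mean falls below the true value when cortical activity is high, falls further when the ribbon is thinner, and rises above the true value when white matter activity dominates.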

Multiple algorithms have been proposed for partial volume correction (PVC) [7]. For a given set of brain regions, the extent of mutual spill-over can be estimated quite accurately from their geometry and the width of the point-spread function, if one assumes homogeneous distribution of activity within these regions [8]. Other algorithms aim to correct the estimated cortical activity [9] and may involve iterative deconvolution [10]. They typically require accurate segmentation of cortex from white matter by co-registered MRI and often assume homogeneous tracer activity within white matter. Unfortunately, these algorithms are typically quite sensitive to small deviations from accurate co-registration and segmentation [11]. Segmentation poses a considerable challenge for MRI, especially in patients with cortical atrophy, in whom correction would be most needed: the spatial resolution and signal strength even of 3 T MRI scanners do not usually reach the required accuracy of 1 mm or better in individual subjects under clinical conditions. Correction algorithms are likely to reduce the PVE-induced bias in regional activity in group studies, but there is insufficient evidence that they improve quantification accuracy in individual scans.
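The region-based idea (spill-over estimated from geometry and point-spread function under a homogeneity assumption) can be sketched in one dimension as a small matrix of spill-over weights that is then inverted. This is only an illustration, not the implementation of any cited algorithm: the Gaussian point-spread function, the interval geometry, the zero-activity surroundings and all names are assumptions.

```python
import math

def _phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def spill_weight(target, source, fwhm_mm, n_samples=400):
    """Mean signal seen inside `target` per unit of homogeneous activity in `source`.

    Regions are 1-D intervals (start_mm, end_mm); the PSF is assumed Gaussian.
    """
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    (ta, tb), (sa, sb) = target, source
    step = (tb - ta) / n_samples
    total = 0.0
    for i in range(n_samples):
        x = ta + (i + 0.5) * step  # midpoint sampling across the target region
        total += _phi((sb - x) / sigma) - _phi((sa - x) / sigma)
    return total / n_samples

def transfer_matrix(regions, fwhm_mm):
    """Row i, column j: contribution of region j to the observed mean of region i."""
    return [[spill_weight(t, s, fwhm_mm) for s in regions] for t in regions]

def apply_matrix(matrix, true_activity):
    return [sum(w * a for w, a in zip(row, true_activity)) for row in matrix]

def solve_2x2(matrix, observed):
    """Recover the homogeneous regional activities by Cramer's rule."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("spill-over matrix is singular")
    return [(d * observed[0] - b * observed[1]) / det,
            (a * observed[1] - c * observed[0]) / det]

# Hypothetical geometry: 3 mm cortex next to 20 mm of white matter, 5 mm FWHM.
regions = [(0.0, 3.0), (3.0, 23.0)]
weights = transfer_matrix(regions, fwhm_mm=5.0)
observed = apply_matrix(weights, [4.0, 1.0])   # simulated regional means
recovered = solve_2x2(weights, observed)
```

The recovery is exact here only because the simulated data satisfy the homogeneity and geometry assumptions perfectly; any segmentation or co-registration error would change the weights and propagate into the corrected values.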

Iterative reconstruction also opens up the possibility of recovering spatial resolution and thus of providing local values that are closer to actual activity even in very small regions or thin cortex [12]. Like most PVC algorithms, such approaches also increase measurement noise, unless they are combined with particular noise-reduction filters. Although it is difficult to assess the actual improvement of quantitative accuracy obtained through resolution recovery, simulation studies suggest that resolution recovery may be as efficient as post-reconstruction PVC, without depending on accurate segmentation and co-registration.

Quantification of cerebral glucose metabolism by FDG PET

The interest in cerebral glucose metabolism is based mainly on its relationship with synaptic activity. Oxidation of glucose involves interaction between neurons and astrocytes [13] and it provides the energy that is required to maintain the ion gradients needed for neuronal activity. It is also linked, via the citrate cycle, to the recycling of glutamate, which is the main excitatory neurotransmitter. Because of these mechanisms, it is correlated with synaptic function, but there is no firm basis to support the translation of quantitative measures of glucose metabolism into quantitative measures of synaptic function.

FDG is transported across the blood–brain barrier (BBB) and into brain cells by glucose transporters and then phosphorylated by hexokinase in the same way as glucose. Because phosphorylation is essentially irreversible and phosphorylated FDG becomes trapped in brain tissue, the cerebral metabolic rate of glucose (CMRglc) can be calculated by the following equation:

$$ \text{CMR}_{\text{glc}} = (\text{CPL}_{\text{glc}} / \text{LC}) \times K_{i} \tag{1} $$

where CPLglc is the plasma concentration of unlabelled glucose, LC is the “lumped constant” that adjusts for the differences in the affinities of glucose and FDG for the glucose transporters and hexokinase, and K_i is the rate of unidirectional transfer of FDG from blood to brain.
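As a minimal numerical illustration of Eq. (1), the sketch below multiplies the three terms directly. The function name, the units and the input values are assumptions chosen for illustration only; in particular, the lumped constant used here is a placeholder, not a recommended value.

```python
def cmr_glc(plasma_glucose, lumped_constant, k_i):
    """Eq. (1): CMRglc = (CPLglc / LC) * K_i.

    Assumed units: plasma_glucose in umol/ml, k_i in ml/g/min,
    giving CMRglc in umol/g/min.
    """
    if lumped_constant <= 0:
        raise ValueError("lumped constant must be positive")
    return plasma_glucose / lumped_constant * k_i

# Placeholder inputs (not taken from the text)
example_cmr_glc = cmr_glc(plasma_glucose=5.0, lumped_constant=0.5, k_i=0.03)
```

Because LC enters as a divisor, any uncertainty in its value translates proportionally into the absolute CMRglc estimate.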

Acquisition of an input function, which describes tracer delivery by blood plasma to the brain, is a general requirement for accurate measurement of CMRglc. It can be achieved by taking multiple arterial blood samples (typically approximately 20 samples) over the entire tracer uptake period (typically approximately 60 min). As this is an invasive procedure associated with considerable discomfort to the patient and a small risk of serious complications, it is usually avoided unless absolutely necessary, and surrogate techniques to obtain an approximate input function are used. Options include sampling from a dorsal hand vein that was first “arterialised” by heating of the hand, use of population-derived standard input functions scaled to a small number of venous blood samples, or use of an image-derived input function. However, the accuracy of these techniques is difficult to verify and likely depends on details of the technical implementation.

By fitting rate constants K_1 for tracer transport from blood to brain, k_2 for the reverse transport, and k_3 for the phosphorylation step, using kinetic tissue activity measured by PET and the input function according to the standard compartmental model of brain glucose metabolism [14], it is possible to obtain a direct calculation of K_i = K_1 × k_3/(k_2 + k_3). Alternatively, the Patlak plot approach can be used to measure K_i without making assumptions about metabolic compartments, although kinetic PET data and an input function are still required [15].
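Both routes to K_i can be sketched in a few lines. The example below computes K_i from assumed rate constants and then recovers it as the slope of a Patlak plot, that is, tissue/plasma activity plotted against the time integral of plasma activity divided by plasma activity. The data are synthetic and generated from the Patlak relationship itself, so the check is circular by design and only verifies the arithmetic; the rate constants, the input curve, the intercept and the start time of the fit are all assumptions.

```python
import math

def ki_from_rate_constants(K1, k2, k3):
    """Net influx rate K_i = K_1 * k_3 / (k_2 + k_3)."""
    return K1 * k3 / (k2 + k3)

def running_integral(times, values):
    """Trapezoidal integral from the first sample up to each time point."""
    out = [0.0]
    for i in range(1, len(times)):
        out.append(out[-1] + 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1]))
    return out

def patlak_fit(times, plasma, tissue, t_star):
    """Least-squares line through the Patlak points with t >= t_star.

    Returns (slope, intercept); the slope is the K_i estimate.
    """
    integral = running_integral(times, plasma)
    points = [(area / cp, ct / cp)
              for t, cp, ct, area in zip(times, plasma, tissue, integral)
              if t >= t_star and cp > 0]
    if len(points) < 2:
        raise ValueError("need at least two points after t_star")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic data (all numbers are illustrative assumptions)
true_ki = ki_from_rate_constants(K1=0.10, k2=0.15, k3=0.05)
true_v0 = 0.4
times = [float(t) for t in range(0, 61)]                      # minutes
plasma = [100.0 * math.exp(-0.1 * t) + 10.0 for t in times]   # arbitrary input curve
area = running_integral(times, plasma)
tissue = [true_ki * a + true_v0 * cp for a, cp in zip(area, plasma)]
estimated_ki, estimated_v0 = patlak_fit(times, plasma, tissue, t_star=20.0)
```

With measured data the early points deviate from the line, which is why the fit is restricted to times after an equilibration period (here the hypothetical `t_star`).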

Under most circumstances, the transfer rates of glucose (and FDG) across the BBB and the phosphorylation rates are correlated. Thus, at high metabolic rates both are increased, while both are reduced if glucose metabolism is reduced. Under these conditions, brain uptake of FDG at 20–60 min after intravenous injection is approximately proportional to its unidirectional transfer rate K_i. Thus, K_i can be approximated by a single measure of FDG activity in tissue at a specific time interval after injection (CTFDG) and dynamic measurement of uptake over the entire accumulation period is not required. However, we still need an estimate of the cumulative input (CINP) calculated from an actually measured input function and including a correction based on estimated transport rates as specified by Hutchins et al. [16]:

$$ \text{CMR}_{\text{glc}} = (\text{CPL}_{\text{glc}} / \text{LC}) \times (\text{CT}_{\text{FDG}} / \text{CINP}) \tag{2} $$

The input function term in the Patlak plot and in Eq. (2) is based on the integral of the arterial input function, which is less sensitive to errors in timing and height of the peak immediately after injection than compartmental fitting of individual rate constants. Thus, reasonably good approximations of the input function terms can be achieved by population-based or image-derived input function estimates [17, 18]. Still, some blood sampling is usually required for proper scaling.

Because of the resources needed for the handling of blood samples and the uncertainties associated with surrogate techniques, there has always been a strong motivation to dispense with blood sampling entirely in clinical practice. One alternative approach is based on the close relationship between CINP and the injected dose (ID), normalised by body weight (BW), which is used to calculate standardised uptake values (SUVs) [19]. In subjects without severe renal impairment or other severe metabolic abnormalities, CINP is approximately proportional to ID relative to BW. With LC also being a constant, the SUV corrected for blood glucose levels, SUVglc, is thus approximately proportional to CMRglc:

$$ \text{SUV}_{\text{glc}} = \text{CPL}_{\text{glc}} \times \text{SUV} = \text{CPL}_{\text{glc}} \times \text{CT}_{\text{FDG}} / (\text{ID} / \text{BW}) \tag{3} $$
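Eq. (3) can be written out as a short calculation. The sketch below is illustrative only: the function names, the choice of units (kBq/ml for tissue activity, kBq for dose, g for body weight, mmol/l for plasma glucose) and the example values are assumptions, and decay correction of the injected dose is taken as already done.

```python
def suv(tissue_kbq_per_ml, injected_dose_kbq, body_weight_g):
    """SUV = CT_FDG / (ID / BW); with these units the result is in g/ml."""
    if injected_dose_kbq <= 0 or body_weight_g <= 0:
        raise ValueError("dose and body weight must be positive")
    return tissue_kbq_per_ml / (injected_dose_kbq / body_weight_g)

def suv_glc(tissue_kbq_per_ml, injected_dose_kbq, body_weight_g, plasma_glucose):
    """Eq. (3): SUVglc = CPLglc * SUV."""
    return plasma_glucose * suv(tissue_kbq_per_ml, injected_dose_kbq, body_weight_g)

# Illustrative example: 20 kBq/ml in tissue, 200 MBq injected, 70 kg, glucose 5 mmol/l
example_suv = suv(20.0, 200_000.0, 70_000.0)
example_suv_glc = suv_glc(20.0, 200_000.0, 70_000.0, plasma_glucose=5.0)
```

The glucose-corrected value simply scales SUV by the plasma glucose level, so it inherits the assumption that CINP is proportional to ID/BW.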

Most commonly, it is assumed that some brain region or some measure derived from global FDG brain uptake can be used as a reference and that we are only interested in regional metabolism relative to that reference. If this is the case, all systemic parameters, such as CPLglc, CINP, ID and BW, cancel out because they are constants related to subjects rather than regions. Assuming that the LC is also constant within each individual across the entire brain, it too cancels out and we are left with CTFDG as a direct measure of relative CMRglc. Thus, a static image of FDG is actually a good approximation of the regional distribution of CMRglc relative to a reference region in each individual.
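The cancellation can be checked numerically. In the sketch below, the same regional tissue activities are converted to CMRglc by Eq. (2) under two different sets of systemic parameters, and the values relative to a reference region are compared with the plain activity ratios. Region names, activities and parameter values are hypothetical.

```python
def cmr_glc_eq2(cpl_glc, lumped_constant, ct_fdg, cinp):
    """Eq. (2): CMRglc = (CPLglc / LC) * (CT_FDG / CINP)."""
    return (cpl_glc / lumped_constant) * (ct_fdg / cinp)

def relative_to_reference(regional_values, reference_region):
    """Divide every regional value by the value in the reference region."""
    reference = regional_values[reference_region]
    return {region: value / reference for region, value in regional_values.items()}

# Hypothetical regional tissue activities (arbitrary units)
ct_fdg = {"parietal": 18.0, "frontal": 21.0, "cerebellum": 24.0}

# Two different, arbitrary sets of systemic parameters applied to the same image
setting_a = {r: cmr_glc_eq2(5.0, 0.5, ct, 600.0) for r, ct in ct_fdg.items()}
setting_b = {r: cmr_glc_eq2(6.5, 0.5, ct, 450.0) for r, ct in ct_fdg.items()}

relative_a = relative_to_reference(setting_a, "cerebellum")
relative_b = relative_to_reference(setting_b, "cerebellum")
relative_ct = relative_to_reference(ct_fdg, "cerebellum")
```

The absolute values differ between the two settings, while the relative values agree with each other and with the activity ratios, which is the sense in which a static image approximates relative CMRglc.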

Quantification of FDG PET studies in dementia

Ideally, it would be preferable to obtain fully quantitative values of CMRglc (according to Eqs. 1, 2) for dementia studies. However, even with the most advanced data acquisition methods it has so far not been possible to eliminate uncertainties associated with the LC [20]. In addition, the coefficient of variation (among individuals) associated with regional CMRglc is typically twice as large as the coefficient of variation of regional CMRglc relative to a reference region. Thus, in most studies, even when full quantification has been done, the final statistical analysis is often based on relative CMRglc only in order to maximise the power of the study. This raises the question of whether the effort and resources required to obtain fully quantitative values are justified.

The main problem with relative values is the fact that a reference is required, either a brain region or some other value extracted from the brain data. Some regions in Alzheimer’s disease (AD) brains, including the cerebellum, pons, putamen, visual cortex and motor cortex, are clearly less affected than the association areas and have therefore been used, either as single regions or as a composite [21], as reference. The pons has been identified in one study [22] as the region that is least affected even at late stages of AD, but it is relatively small and located deep in the skull and is thus subject to potentially high measurement variation. Metabolism in the visual cortex depends very much on the state of the subject (e.g., eyes open or closed) during tracer uptake, which is difficult to control precisely and therefore the occipital lobe is one of the brain areas with the highest interindividual variation. The cerebellum has often been used, but in moderate AD cases it is possible to detect cerebellar diaschisis even visually, when the observation of substantial cortical asymmetry points to the presence of functional cerebellar impairment [23]. The motor cortex is often spared at early stages, but not at later stages. The putamen appears to be unimpaired functionally, but it is affected pathologically by early deposition of amyloid.

Global cortical metabolism, the default option in some statistical parametric mapping programs, would be a suitable reference only if the region affected by metabolic decline was small enough to have a negligible effect on global metabolism, which clearly is not the case in AD. Alternatively, one could extract some upper quantile from the distribution of cortical metabolism under the assumption that the only pathological change to be expected is a decline and that a small proportion of brain areas (wherever they are located) will be preserved in AD. These areas could then be identified by histogramming or by selection of brain areas that appear to be relatively increased when using global intensity normalisation. This approach is reasonable and has been shown to provide superior sensitivity for detection of abnormal areas as compared to global intensity normalisation in AD [24].
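The difference between the two normalisation strategies can be shown with a toy example. The sketch below is one possible reading of the upper-quantile idea, not the procedure of the cited study: the reference is taken as the mean of the highest fraction of regional values, and the fraction, the function names and the data are assumptions.

```python
def global_mean_reference(values):
    """Reference value used by global intensity normalisation."""
    return sum(values) / len(values)

def upper_quantile_reference(values, top_fraction=0.2):
    """Mean of the highest `top_fraction` of values (assumed to be preserved tissue)."""
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    ordered = sorted(values, reverse=True)
    n_top = max(1, int(round(top_fraction * len(ordered))))
    top = ordered[:n_top]
    return sum(top) / len(top)

def normalise(values, reference):
    return [v / reference for v in values]

# Toy patient: six regions with reduced uptake (7.0), four preserved regions (10.0)
patient = [7.0] * 6 + [10.0] * 4

by_global = normalise(patient, global_mean_reference(patient))
by_quantile = normalise(patient, upper_quantile_reference(patient, top_fraction=0.2))
```

With global normalisation the preserved regions appear relatively increased and the reduction in affected regions is understated; with the upper-quantile reference the preserved regions stay at 1.0 and the reduced regions at 0.7, provided the assumption of a preserved subset holds.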

Despite their limitations, reference tissue techniques have worked generally well for analysis of regional metabolic changes in AD, demonstrating a close relationship with cognitive symptoms and dementia severity, and even allowing robust multicentre analyses [25]. However, identification of sufficiently robust reference regions is more difficult or impossible in certain other diseases that can cause dementia, most notoriously in microcerebrovascular disease, which affects all parts of the brain and causes a global decline of metabolism [26]. Thus, full quantification of CMRglc appears to be required for proper assessment of the effects of cerebrovascular disease [27]. Also, pervasive diseases, such as HIV encephalopathy or multiple system atrophy, or impairment secondary to metabolic disorders or chemotherapy do not allow relative quantification of CMRglc because of the lack of sufficiently robust reference regions. The lack of full quantification has also caused a controversy with regard to the interpretation of FDG PET scans in Parkinson’s disease [28], as putaminal metabolism appears to be relatively increased (which could potentially be secondary to a lack of inhibitory dopamine effects in the putamen), and it is therefore difficult to assess the significance of relative cortical metabolic reduction.

The regional pattern of metabolic reductions is very similar to the regional distribution of cortical atrophy in AD. Most studies suggest that metabolic reduction precedes atrophy, while in patients with manifest dementia both are typically present, raising the question of the degree to which apparent metabolic reduction is secondary to PVEs. The issue has been addressed by studies using PVC, and some have come to the conclusion that metabolic reduction in neocortical association areas exceeds the amount that can be explained by atrophy, while the reverse may be true in the hippocampus, where atrophy is typically more prominent than metabolic reduction [29]. The issue has still not been completely settled because most PVC techniques are very sensitive to the accuracy of MRI segmentation, which is difficult in atrophic cortex, and to underlying assumptions about tissue homogeneity. Data acquired on a high-resolution scanner suggest that metabolic reduction in the hippocampus, too, may exceed the effects explained by atrophy [30]. Fortunately, these uncertainties are of little relevance for the practical use of FDG PET in AD because metabolic impairment and atrophy are both indicators of neurodegeneration and their combination increases the sensitivity of FDG PET. Thus, PVC would be of interest only in studies aiming at obtaining accurate quantification of metabolism in remaining tissue and separation of pathophysiological events (e.g., for accurate distinction between true metabolic deficits and age-related atrophy), while it is usually not required (and may even be counterproductive) when using FDG PET as a diagnostic tool and indicator of disease severity.

There are typical regional patterns of metabolic abnormalities associated with AD and other neurodegenerative diseases. Pattern recognition techniques can therefore be used to extract these patterns and to quantify their salience. This approach can be based on summing or averaging indices that measure deviation from normal (e.g., z or t scores) in brain regions that are typically affected in AD [31, 32], or by multivariate techniques employing principal component analysis, independent component analysis, or machine learning [33–36]. Some of these measures have already been validated for use in multicentre studies (e.g., NEST-DD, ADNI) [37, 38]. They can also provide measures of clinical disease progression (Fig. 1). However, a recent comparative analysis of such numerical disease indicators derived from FDG PET also demonstrated some variation among these indicators with respect to sensitivity and specificity, especially at the clinical stage of mild cognitive impairment (MCI) [39]. Quantitative scores derived from FDG PET are potential imaging biomarkers for clinical trials [40], although they have not yet been accepted by regulatory authorities as criteria for inclusion of patients or as primary outcome markers in clinical trials.
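The simplest of these approaches, averaging z scores over typically affected regions, can be sketched as follows. The region list, the normative means and standard deviations, and the patient values are hypothetical placeholders, and the score is not any of the validated indices cited above.

```python
def z_scores(patient, normal_mean, normal_sd):
    """Regional deviation from a normative sample, in standard deviations."""
    return {region: (patient[region] - normal_mean[region]) / normal_sd[region]
            for region in patient}

def mean_z_in_regions(z, regions):
    """Average z score across a predefined set of regions."""
    return sum(z[region] for region in regions) / len(regions)

# Hypothetical relative uptake values (normalised to a reference region)
normal_mean = {"posterior_cingulate": 1.0, "temporoparietal": 1.0,
               "prefrontal": 1.0, "motor": 1.0}
normal_sd = {region: 0.1 for region in normal_mean}
patient = {"posterior_cingulate": 0.80, "temporoparietal": 0.75,
           "prefrontal": 0.90, "motor": 1.00}

# Illustrative list of regions assumed to be typically affected
typical_regions = ["posterior_cingulate", "temporoparietal", "prefrontal"]

z = z_scores(patient, normal_mean, normal_sd)
pattern_score = mean_z_in_regions(z, typical_regions)
```

A more negative score indicates a stronger expression of the predefined pattern; in practice the normative values would need to be matched for scanner, reconstruction and reference region.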

Fig. 1 Regression plot demonstrating the association (r = 0.47) of changes in cognitive impairment (ADAS-cog score) and metabolic impairment (FDG PET score) over 24 months in patients with MCI, as described by Herholz et al. [76]

Amyloid deposition

Fibrillary plaques containing amyloid beta are an essential feature for a definitive pathological diagnosis of AD [41]. Amyloid PET provides in vivo confirmation of amyloid beta deposition with the aim of enhancing the accuracy of clinical diagnosis [42]. In contrast to FDG PET, amyloid PET is not closely related to clinical features of the disease but provides independent information on molecular pathology. Deposition of beta-amyloid plaques in the human brain is related to age and genetic factors, and may begin several decades before the manifestation of dementia [43]. Thus, in old age a significant proportion of individuals who are cognitively healthy or who may be suffering from non-AD brain disorders may have amyloid beta plaques in their brains and thus exhibit a positive amyloid scan. This raises the issue of the clinical utility of positive scans [44], and also the question of whether quantification of amyloid PET could provide information about the disease stage and distinguish between individuals with high and low amyloid load, presuming that the former would be associated with manifest AD, while the latter may be associated with an early, asymptomatic stage. So far, this assumption does not seem to hold (Fig. 2).

Fig. 2 Progression to AD in patients with MCI, as described by Nordberg et al. [77]. While none of the patients with a negative scan progressed to AD in this cohort, the probability of progression did not differ between those with high cortical 11C-PIB binding and those with low abnormal binding

Histological quantification of amyloid load is usually based on the relative area of plaques stained histochemically within cortical tissue sections. Within individuals, the regional distribution of histological measures of amyloid load corresponds well with amyloid tracer binding [45]. However, across individuals amyloid load does not translate directly into the intensity of tracer uptake with currently available PET amyloid ligands. For instance, Clark et al. [46] demonstrated an excellent association between postmortem and amyloid PET main diagnostic categories, but only a moderate overall correlation between florbetapir SUV ratios (SUVRs) and plaque density on immunohistochemistry (r = 0.64; Fig. 3). This correlation was mainly dependent on the comparison between subjects with probable or definite AD and the rest, while it was very low (ns) within each of these subgroups (r = 0.28 in subjects with a pathological diagnosis of no or possible AD, r = 0.22 in probable or definite AD).

Fig. 3 Scatter plot of cortical 18F-florbetapir SUVR versus cortical amyloid plaque load determined by immunohistochemistry in the post-mortem validation study by Clark et al. [46]. Most data points are located in the left lower or right upper quadrant, demonstrating excellent qualitative correspondence and absence of quantitative correlation within the quadrants

Attempts to interpret quantitative amyloid tracer binding in physiological terms are also compromised by the fact that binding is due to the physicochemical properties of the amyloid beta-sheet formation [47] and, in contrast to neuroreceptor studies, does not relate to a single receptor. A group at Karolinska [48] described at least three binding sites with different properties, and recent data suggest that binding involves a ganglioside-amyloid beta complex [49]. Tracer binding to enzymes, such as oestrogen sulfotransferase [50], possibly also contributes to a non-specific signal background.

Non-specific binding to white matter, which may partially be due to beta-sheet formation of myelin basic protein [51], also exhibits some variation, which may be related to blood flow and other non-specific factors [52]. The variation may occasionally cause problems with visual image reading, because loss of contrast between the non-specific white matter signal and cortex is the main feature indicating cortical amyloid deposition. High white matter uptake makes quantification difficult due to PVE with substantial spill-over of white matter signal to grey matter and resulting overestimation of grey matter binding. Studies have shown that this may cause substantial bias [53], and that accurate correction is difficult. Thus, it might be expected that next-generation amyloid tracers, such as 18F-NAV4694 (alias AZD4694) [54], which suffer much less from non-specific white matter binding, will provide more accurate quantification. Whether they may actually overcome the current lack of a quantitative relationship between in vivo tracer binding and actual amyloid load, as discussed above, remains to be determined.

Tracer binding is usually measured as SUVRs or as distribution volume ratios (DVRs) determined by compartmental modelling or Logan plot [55]. When compared with compartmental modelling and metabolite-corrected arterial input function, simplified reference tissue models provided superior test–retest accuracy while avoiding bias [56, 57]. Static uptake ratios (SUVRs) compared well with respect to test–retest accuracy, but are subject to a systematic bias that depends on measurement time. Correlations between cortical SUVRs (static scans) and DVRs (dynamic scans) for amyloid tracers are generally high in cross-sectional studies [56, 58, 59], and assessment of SUVRs has therefore generally been accepted as a practical standard for the use of PET as a diagnostic biomarker that does not require dynamic scanning [60]. However, this notion has recently been challenged, especially when using amyloid PET for monitoring of disease progression, because SUVRs are likely to be biased by changes in regional blood flow [61].
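The distinction between a DVR and a static uptake ratio can be illustrated with simulated curves. The sketch below is a toy model under strong assumptions that are not from the text: a mono-exponential plasma input, a one-tissue compartment model for both target and reference region, Euler integration, and arbitrary rate constants. Distribution volumes are estimated as slopes of a Logan plot with plasma input, and the DVR is their ratio.

```python
import math

def simulate_one_tissue(K1, k2, plasma, dt):
    """Euler integration of dC_T/dt = K1 * C_P - k2 * C_T, starting from zero."""
    tissue = [0.0]
    for cp in plasma[:-1]:
        ct = tissue[-1]
        tissue.append(ct + dt * (K1 * cp - k2 * ct))
    return tissue

def running_integral(values, dt):
    """Trapezoidal running integral on a regular time grid."""
    out = [0.0]
    for i in range(1, len(values)):
        out.append(out[-1] + 0.5 * dt * (values[i] + values[i - 1]))
    return out

def logan_slope(times, plasma, tissue, dt, t_star):
    """Slope of int(C_T)/C_T versus int(C_P)/C_T for t >= t_star (distribution volume)."""
    int_p = running_integral(plasma, dt)
    int_t = running_integral(tissue, dt)
    points = [(ip / ct, it / ct)
              for t, ct, ip, it in zip(times, tissue, int_p, int_t)
              if t >= t_star and ct > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Arbitrary toy parameters: target K1/k2 = 2.0, reference K1/k2 = 1.0
dt = 0.01                                        # minutes
times = [i * dt for i in range(9001)]            # 0 to 90 min
plasma = [100.0 * math.exp(-0.03 * t) for t in times]
target = simulate_one_tissue(0.2, 0.1, plasma, dt)
reference = simulate_one_tissue(0.2, 0.2, plasma, dt)

dvr = (logan_slope(times, plasma, target, dt, t_star=30.0)
       / logan_slope(times, plasma, reference, dt, t_star=30.0))
suvr_40 = target[4000] / reference[4000]         # static ratio at 40 min
suvr_90 = target[9000] / reference[9000]         # static ratio at 90 min
```

In this toy model the static ratio differs from the DVR and changes between the two time points, which illustrates why SUVR thresholds are tied to a specific acquisition window; the size and direction of the effect for a real tracer depend on its kinetics.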

The cerebellum is often used as a reference region, both for static scans (SUVRs) and for calculation of DVRs from dynamic studies without blood sampling. This is based on the observation that, though there may be amyloid deposition in AD, such deposition does not generate fibrillary plaques and therefore does not bind amyloid tracers specifically [62]. Ideally, the cerebellar cortex is used also in order to avoid the non-specific tracer binding in cerebellar white matter, but this does require accurate placement of cortical cerebellar ROIs, which is difficult without coregistered MR scans. The pons or whole cerebellum, both of which include some non-specific white matter signal, have therefore been used instead in some studies. Actual SUVR values, and thus thresholds for discrimination between positive and negative scans, depend on the choice of the reference region while the correlation between measures derived from any of these reference regions is usually very good [63]. Accuracy of analysis also depends upon whether template-based cortical regions are defined by matching them directly to amyloid PET, or whether they are adjusted by individually coregistered and segmented MR scans. The latter is preferred for quantitative research [64], but novel adaptive template methods used with PET only may also provide accurate results [65].

The literature comparing in vivo binding among various tracers with high affinity for amyloid is still limited, but publications generally demonstrate a very close relation with correlation coefficients above 0.8 of relative uptake values [66, 67]. As is to be expected, tracers having similar affinity for amyloid and tau deposits, such as 18F-FDDNP, produce different results with respect to regional distribution and uptake values [68]. Developing specific tracers for tau imaging remains a considerable challenge because of the physicochemical similarities of beta sheets and the much higher abundance of amyloid [69].

Amyloid tracers are also being used to assess the effect of therapeutic interventions [70]. However, interpretation of such studies is difficult given the lack of a reliable quantitative relationship between tracer binding and amyloid load. In addition to the variation in physicochemical properties of amyloid among individuals and during progression of the disease, uptake values may potentially be biased by limitations of blood flow and other factors affecting tracer delivery to tissue. Therefore use of kinetic modelling with calculation of DVRs is regarded as the most accurate method to assess tracer binding [61], but it remains to be demonstrated through comparison with an independent method whether DVRs would provide more reliable measures of amyloid plaque load than SUVRs. A small published series comparing the use of the PIB DVR with post-mortem assessment also found considerable variation [71].

Perspectives

PET is the most accurate in vivo technique for measuring local tracer activity in tissue. For clinical and research use in dementia, measured tracer activity is most often transformed into measures of regional glucose metabolism or amyloid deposition. This overview examined the main factors that currently limit the quantification accuracy of these parameters. In spite of these limitations, these measures already improve clinical diagnosis and can predict whether patients with MCI are likely to progress to AD. Reliance on a reference region, which cannot actually be guaranteed to be unaffected in neurodegenerative disease, is one of the main obstacles to substantial progress. It is hoped that further improvement and standardisation of techniques for accurate non-invasive recording of input functions will overcome that hurdle. New tracers are being developed that may provide better quantification of deposition of pathological proteins than currently available tracers do. Further improvements in statistical methods and multimodal image processing techniques are expected to provide more accurate prediction of disease progression [72, 73] and a better understanding of the multifactorial pathophysiology of dementia [74]. Ultimately, these techniques should contribute to an emerging new concept of neurodegenerative diseases that is no longer restricted to categorical diagnostic classification but identifies a cascade of interacting molecular processes [75], which might be treated by personalised and targeted intervention at an early stage before onset of dementia.