Introduction

Positron emission tomography (PET) quantitation of cancer response to therapy is the focus of this review. Functional and molecular imaging modalities such as PET are increasingly being studied as biomarkers for cancer response to treatment, with the goal of providing end points for clinical trials that quantitate response to treatment and, importantly, might allow early discontinuation of ineffective and expensive treatments with potentially harmful side effects. If selective PET radiotracers become accepted as biomarkers of patient outcome for clinical trials, then these same biomarkers could also prove useful for image-guided therapy, where early response assessment could be used to guide selection of the most effective therapy and, in the process, reduce harmful side effects and expense by halting cancer treatment that is unlikely to be efficacious. The concentrations of radioactive PET tracers are measured on PET-only scanners, PET computed tomography (PET/CT) hybrid scanners, or PET magnetic resonance (PET/MR) hybrid scanners. The aims of this review are to provide suggestions for selecting the method of measuring PET tracer uptake and to define the role of PET radiotracers such as 18F-fluorodeoxyglucose (FDG) and 18F-fluorothymidine (FLT) in monitoring cancer response to treatment.

Biochemistry and kinetics of uptake of the PET tracers FDG and FLT

An understanding of the biochemistry and kinetics of individual PET tracers is required to be able to select appropriate methods for quantitation of PET tracer uptake. PET imaging of the uptake of the 18F-radiolabeled glucose analog, FDG, is an established method for cancer diagnosis, staging, and monitoring of response to treatment. FDG is an effective metabolic radiotracer for targeting glycolysis because FDG is trapped inside cells after undergoing the same phosphorylation by the enzyme hexokinase that is the first, practically irreversible, step in glycolysis. Selective concentration of FDG often occurs in cancer cells due to the Warburg effect, a term used to refer to the observation that cancer cells often have an altered energy metabolism that favors less efficient glycolysis, even when oxygen is available, over the more efficient aerobic respiration used by normal cells [1]. A second PET tracer increasingly being used in clinical trials is FLT, whose target is cellular proliferation, another key biological process that is often upregulated in cancer cells [2]. FLT is considered a measure of proliferation because FLT is trapped in cells after undergoing phosphorylation by the enzyme thymidine kinase-1, which is effectively the first irreversible metabolic step in the salvage pathway for incorporating exogenous thymidine into DNA [3–5]. The interpretation of FLT uptake as a measure of cellular proliferation is made complex by its reliance on the thymidine salvage pathway (and not the competing de novo pathway of thymidine synthesis into DNA [2, 4]), as recently confirmed by animal studies [5]. Nevertheless, within a given patient and tumor, any changes in FLT uptake appear to reflect changes in tumor proliferation [2, 6–9]. Both FDG and FLT have been used in new cancer drug pharmacokinetic and efficacy evaluation, with FDG used as a biomarker for altered energy metabolism and FLT as a biomarker for cellular proliferation.

Compartmental models describing the kinetics of uptake of FDG [10] and FLT [4] have been well described elsewhere. A generic two-compartment model with a driving blood input function is shown in Fig. 1. In the two-compartment FDG model, the first compartment represents the changing concentration of FDG without a phosphate group within a cell, and the second compartment represents the changing concentration of FDG with an attached phosphate group that prevents FDG from leaving the cell. Five parameters can be estimated from this two-compartment model: the rate of transport of PET tracer from blood into a cell, K1; the rate of the reversible return of FDG without a phosphate group from the cell to the blood, k2; the rate of phosphorylation of FDG that leaves FDG trapped within the cell, k3; the limited dephosphorylation rate of phosphorylated FDG within a cell, k4; and, finally, the blood volume fraction, which helps to quantitate the contribution of activity from FDG within capillaries to tissue time–activity curves. The K1 rate of transport of PET tracer into a cell is capitalized while the other internal compartmental rates (k2, k3, and k4) are indicated in lower case to highlight that K1 transport is considered a macroparameter that can often be estimated independently of the other internal compartmental rates. The FLT compartmental model is similar to the FDG model with one important difference, namely the presence of an additional compartment within blood that represents the changing concentration of the FLT metabolite, FLT-glucuronide, which is the only labeled FLT metabolite observed in human plasma [4]. FLT is glucuronidated primarily in the liver and its metabolite remains in the blood until clearance through the kidneys [4].
In practice, kinetic analysis of FLT uptake can account for the additional blood compartment by measuring the increasing concentration of FLT metabolites in the blood by chromatographic analysis and using this information to correct the blood time–activity curve to only include the activity concentration of the parent FLT [4]. Kinetic analyses of the compartmental models of FDG and FLT cellular uptake provide an opportunity to quantitate FDG and FLT uptake using a method that is relatively insensitive to the variable background activity present during the study [11], which can confound static measures of PET tracer uptake as discussed in more detail in the following section.
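The two-compartment model of Fig. 1 can be illustrated with a short simulation. The following is a minimal sketch, not the method of any cited study: the plasma input curve, the rate constants used in the example, the function names, and the simple Euler integration are all assumptions made for illustration, and whole-blood activity is assumed equal to plasma activity when applying the blood volume fraction.

```python
import math


def input_function(t_min):
    """Hypothetical plasma input curve (kBq/mL): bolus peak plus slow clearance."""
    return 100.0 * t_min * math.exp(-t_min / 0.5) + 10.0 * math.exp(-t_min / 60.0)


def simulate_tissue_curve(K1, k2, k3, k4, vb, duration_min=60.0, dt=0.01):
    """Euler integration of the two-compartment model.

    c1: exchangeable (e.g., unphosphorylated) tracer in tissue
    c2: trapped (e.g., phosphorylated) tracer in tissue
    vb: blood volume fraction contributing to the measured signal
    Returns a list of (time, c1, c2, measured_activity) tuples.
    """
    c1 = 0.0
    c2 = 0.0
    curve = []
    steps = int(round(duration_min / dt))
    for i in range(steps + 1):
        t = i * dt
        cp = input_function(t)
        # Measured tissue activity mixes tissue compartments with blood activity
        # (assumption: whole-blood activity equals plasma activity).
        measured = (1.0 - vb) * (c1 + c2) + vb * cp
        curve.append((t, c1, c2, measured))
        dc1 = K1 * cp - (k2 + k3) * c1 + k4 * c2
        dc2 = k3 * c1 - k4 * c2
        c1 += dc1 * dt
        c2 += dc2 * dt
    return curve


def net_influx_constant(K1, k2, k3):
    """Net uptake Ki = K1*k3/(k2+k3), valid only for irreversible trapping (k4 = 0)."""
    return K1 * k3 / (k2 + k3)


if __name__ == "__main__":
    # Illustrative rate constants (1/min), not taken from any patient data.
    curve = simulate_tissue_curve(K1=0.1, k2=0.2, k3=0.05, k4=0.0, vb=0.05)
    t, c1, c2, measured = curve[-1]
    print(f"t={t:.0f} min: free={c1:.2f}, trapped={c2:.2f}, measured={measured:.2f}")
    print(f"Ki={net_influx_constant(0.1, 0.2, 0.05):.3f} mL/min/mL")
```

With k4 set to zero the trapped compartment can only grow, which is the behavior that makes the net uptake rate a useful summary of irreversible tracers such as FDG. For FLT, the input curve would first need the metabolite correction described above.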

Fig. 1

Two-compartment model with a driving blood input function with rates of transport of tracer into the cells (K1) and out of the cells (k2) and with rates of tracer conversion into a form that is unable to leave cells (k3), and conversion from the trapped form back into an exchangeable variant of the tracer (k4)

Qualitative PET assessment of cancer response to treatment

Lymphoma is an example of a cancer whose response to treatment is typically qualitatively assessed using PET images of FDG uptake as shown in Fig. 2. In the setting of a highly responsive tumor, such as lymphoma, post-therapy qualitative assessment of tracer uptake provides a highly predictive method of assessing response that may be adequate for both clinical trials and clinical practice [12]. Quantitative PET assessment can be more important for less responsive cancers where classifying partial metabolic response (versus stable disease or progression) is informative, or where a continuous measure of response provides more power to predict downstream outcomes such as survival.

Fig. 2

Baseline FDG PET scans of two non-Hodgkin’s lymphoma patients (on the left) and subsequent post-treatment FDG scans (on the right) with tumor locations indicated by arrows, showing a complete metabolic response in the top patient and a partial metabolic response in the bottom patient

Methods for quantitating uptake of PET tracer

In clinical practice, FDG PET scans are normally interpreted visually, with quantitative maximum standardized uptake values (SUVs) [13, 14] used after detection, as needed, for lesion characterization [15]. This approach is illustrated in Fig. 3 in which before- and after-treatment images demonstrate response to therapy with moderate reduction in both the extent and FDG avidity of osseous breast cancer metastases. Near confluent involvement of the right ilium (thick horizontal arrows) before treatment, with a max SUV of 14 g/mL, improved to heterogeneous involvement after therapy, i.e., to a max SUV of 7 g/mL, while focal increased FDG avidity in the left proximal humerus (thin vertical arrows), with a max SUV of 15 g/mL, decreased to a max SUV of 4 g/mL after therapy. A moderate decrease in FDG avidity in visualized spine and ribs can also be noted. Decreases of the two highest max SUVs following therapy facilitate characterization of metastases as responding to therapy. Intuitively, quantitative evaluation might seem superior to visual assessment in the categorization of neoadjuvant therapy response. Unfortunately, almost no studies have compared the accuracy of visual assessment and semi-quantitative methods for response prediction. One such study in patients with lymphoma found that SUV measurements only mildly improved the predictive accuracy of PET over visual assessment, from 65 to 76 % [16]. The European Organization for Research and Treatment of Cancer (EORTC) PET study was the first to classify specific quantitative changes in FDG uptake into four response categories (progressive or stable metabolic disease or partial or complete metabolic response) that could be compared to traditional clinical trial end points such as overall survival. 
The PET response criteria in solid tumors (PERCIST) subsequently provided more comprehensive guidelines for using changes in quantitation of FDG uptake to categorize solid tumor response to treatment [17] and are currently the most widely accepted PET guidelines for quantitating response.
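The arithmetic behind a quantitative response assessment can be sketched with the max SUVs quoted for Fig. 3. This is an illustrative sketch only: the function names are hypothetical, and the threshold passed to `classify_change` is a caller-supplied assumption rather than a value taken from the EORTC or PERCIST criteria, which include further conditions not modeled here. A complete metabolic response is also not modeled, since it cannot be decided from a percentage change alone.

```python
def percent_change(baseline_suv, followup_suv):
    """Percentage change in SUV from baseline to follow-up."""
    if baseline_suv <= 0:
        raise ValueError("baseline SUV must be positive")
    return 100.0 * (followup_suv - baseline_suv) / baseline_suv


def classify_change(change_pct, threshold_pct):
    """Three-way classification using a hypothetical symmetric threshold.

    threshold_pct is an assumption supplied by the caller; it is not a
    value defined by the published response criteria.
    """
    if change_pct <= -threshold_pct:
        return "partial metabolic response"
    if change_pct >= threshold_pct:
        return "progressive metabolic disease"
    return "stable metabolic disease"


if __name__ == "__main__":
    # Max SUVs (g/mL) quoted for the right ilium and left proximal humerus.
    for site, before, after in [("right ilium", 14.0, 7.0),
                                ("left proximal humerus", 15.0, 4.0)]:
        change = percent_change(before, after)
        print(f"{site}: {change:.1f} % -> {classify_change(change, 30.0)}")
```

The ilium lesion halves (a change of -50 %) while the humerus lesion falls by roughly 73 %, so both would be labeled as responding under any symmetric threshold below 50 %.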

Fig. 3

Coronal plane FDG PET and fusion PET/CT images in a patient at baseline (a) and after 12 weeks of fulvestrant therapy (b), showing max SUVs decreasing from 14 to 7 g/mL in the right ilium (thick horizontal arrows) and from 15 to 4 g/mL in the left proximal humerus (thin vertical arrows)

Static quantitative techniques

Static quantitative measures of response, such as SUV, are important for consistency in multicenter trials evaluating FDG and new radiotracers, especially since the serial images are often sent to a single processing center for additional analysis. Quantitative measures of static PET uptake are often reported as a target-to-background ratio of the activity in the region of interest to background activity in corresponding normal tissue, to provide a background-normalized quantity that facilitates comparisons with similar measurements made in different patients or on different days in the same patient. In cancer patients, however, it is not always possible to identify background regions in which PET tracer uptake is known to be normal and unchanged between serial scans of the same patient. The solution to this potential difficulty in identifying an appropriate background region is the SUV, where the activity concentration in the cancer is normalized by the amount of tracer injected in the patient and some measure of the patient’s body habitus. SUVs are calculated by dividing the PET activity concentration, expressed in units of activity per unit volume, by the ratio of the activity of injected PET tracer to some measure of body habitus, such as weight, lean body mass [18], or body surface area [19]. For example, if the weight-based SUV is measured inside a region that takes up five times more PET tracer than the rest of the available 70-L volume for PET tracer, the weight-based SUV will be 5 g/mL regardless of the amount of injected tracer, while a simple activity concentration measure of the same region will be either 13.2 kBq/mL or 26.4 kBq/mL, depending on whether the activity of the injected PET tracer is 185 or 370 MBq. The resulting SUVs are expressed in units of g/mL when using patient weight or lean body mass or in units of cm²/mL when using body surface area to describe the patient’s body habitus.
If SUV measures appear to be an order of magnitude too high or low when viewing patient images on an unfamiliar image workstation, it can be helpful to check the SUV units to ensure that the workstation is displaying SUV using the desired measure of body habitus. Radiologists often prefer SUVs to simple activity concentrations since SUVs quantitate differences in relative target-to-background uptake of PET tracer between patients while minimizing the quantitative impacts of differences in patient size and amount of injected tracer, which may be helpful in identifying disease.
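The weight-based SUV calculation described above can be written out as a short sketch. The function name and argument units are assumptions chosen for illustration, radioactive decay between injection and scanning is ignored, and the worked example assumes that the 70-L distribution volume corresponds to a 70-kg patient (a density of about 1 g/mL).

```python
def suv_body_weight(activity_conc_kbq_per_ml, injected_mbq, weight_kg):
    """Weight-based SUV in g/mL.

    Divides the measured activity concentration by the injected activity
    per gram of body weight. Decay correction is omitted (assumption).
    """
    if injected_mbq <= 0 or weight_kg <= 0:
        raise ValueError("injected activity and weight must be positive")
    injected_kbq = injected_mbq * 1000.0  # MBq -> kBq
    weight_g = weight_kg * 1000.0         # kg -> g
    return activity_conc_kbq_per_ml / (injected_kbq / weight_g)


if __name__ == "__main__":
    weight_kg = 70.0
    for injected_mbq in (185.0, 370.0):
        # Uniform distribution over 70 L (70,000 mL), in kBq/mL.
        uniform_conc = injected_mbq * 1000.0 / 70000.0
        lesion_conc = 5.0 * uniform_conc  # region with five-fold uptake
        suv = suv_body_weight(lesion_conc, injected_mbq, weight_kg)
        print(f"{injected_mbq:.0f} MBq: {lesion_conc:.1f} kBq/mL, SUV = {suv:.1f} g/mL")
```

Doubling the injected activity doubles the measured concentration but leaves the SUV at 5 g/mL, which is the normalization property that makes SUVs comparable between patients and scans.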

Dynamic quantitative techniques

Quantitation of radiotracer uptake from kinetic analysis of dynamic PET images takes advantage of additional information provided by changing PET radiotracer concentrations in both the tissue and the blood supply of radiotracer [20]. Dynamic PET imaging can provide more sensitive measurement of FDG tracer uptake because quantitation from kinetic analyses of FDG images is insensitive to variable background of FDG uptake in surrounding normal tissue and blood supply [11, 21, 22]. PET data can be categorized most accurately with kinetic analysis of dynamic PET images [11], and this should be performed first for new radiotracers [10, 23]. Dynamic PET data can be analyzed graphically [24] or using nonlinear techniques, such as compartmental analysis or spectral methods [20], to estimate the net uptake of PET tracer, called flux or Ki. There are many comparisons of the results of quantitation from static images versus dynamic PET images available in the literature [11, 25–27]. The disadvantages of the graphical (Patlak) method [24] are its assumption that no bound radiotracer is released during the imaging session and its inability to estimate the reversible transport of radiotracer into the cell. The advantage of the graphical method is that calculation involves application of a simple linear fit that requires no special software, unlike full kinetic analyses that use nonlinear optimization methods, which require more sophisticated software and more highly trained operators. As well as providing uptake measures that are independent of background signal, nonlinear kinetic analyses have the additional benefit of providing estimates of the rate of radiotracer delivery (K1), which were found to be the only independent predictor of disease-free survival and overall survival in multivariate models of response of locally advanced breast cancer to chemotherapy in 75 patients [22].
Table 1 provides a comparison of methods for quantitating cancer uptake of PET tracers like FDG and FLT.
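The simple linear fit used by the graphical (Patlak) method can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the trapezoidal integration of the plasma curve, and the caller-chosen start time `t_star` for the linear portion are not taken from the cited references, and the tracer is assumed to be irreversibly trapped during the scan.

```python
def cumulative_trapezoid(times, values):
    """Running trapezoidal integral of values over times, starting at zero."""
    total = 0.0
    out = [0.0]
    for i in range(1, len(times)):
        total += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
        out.append(total)
    return out


def patlak_fit(times, plasma, tissue, t_star):
    """Least-squares line through the Patlak plot for frames with t >= t_star.

    x = integral of plasma activity up to t, divided by plasma activity at t
    y = tissue activity at t, divided by plasma activity at t
    Returns (slope, intercept); the slope estimates the net uptake rate Ki.
    """
    integral = cumulative_trapezoid(times, plasma)
    xs, ys = [], []
    for t, cp, ct, area in zip(times, plasma, tissue, integral):
        if t >= t_star and cp > 0:
            xs.append(area / cp)
            ys.append(ct / cp)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two frames after t_star")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Synthetic, noise-free data built to follow the Patlak relation exactly.
    times = [float(t) for t in range(0, 65, 5)]
    plasma = [20.0 / (1.0 + 0.1 * t) for t in times]
    area = cumulative_trapezoid(times, plasma)
    tissue = [0.02 * a + 0.3 * cp for a, cp in zip(area, plasma)]
    ki, intercept = patlak_fit(times, plasma, tissue, t_star=10.0)
    print(f"Ki = {ki:.4f}, intercept = {intercept:.3f}")
```

Because only a straight-line fit is involved, the calculation needs nothing beyond basic arithmetic, in contrast to the nonlinear optimization required to estimate K1 and the other individual rate constants.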

Table 1 Comparison of methods for quantitating PET tracer uptake

Attenuation correction of PET images

Attenuation correction of PET images is critical for quantitative PET measurement, and this correction usually has the largest impact on quantitative values. The radiation source that a PET scanner uses for attenuation correction can affect PET quantitation. PET-only scanners using PET sources for attenuation correction are the gold standard, since radiation from a PET attenuation source has the same attenuation properties as radiation from the PET tracer. PET/CT scanners use X-ray sources for attenuation correction, and X-rays are the next most widely accepted source for attenuation correction. Quantitative measures from PET images using magnetic resonance imaging (MRI) for attenuation correction are not yet widely accepted, although ongoing research is seeking both to improve the accuracy of activity concentration measurements from PET/MR scanners and to evaluate any measurement biases between PET/CT and PET/MR scanners.

PET tracer test–retest reproducibility

Image quantitation is prone to test–retest variability. Quantitative PET measurements are affected by attenuation, scattered and random coincidences, and dead time correction algorithms and user-defined factors, including image acquisition settings such as duration of PET acquisition, thickness of PET slice, acquisition mode (3-dimensional versus 2-dimensional with use of lead septa between PET detectors), reconstruction algorithm, and other PET instrumentation considerations [28]. Other factors that can impact on PET quantitation include the algorithm used to define the tumor boundaries [29], the time interval between injection and scanning [30], as well as metabolism and plasma clearance of the radiotracer [23]. Bone marrow uptake due to stimulating drugs can also lower SUVs [31], although in some cases the additional uptake of FDG by bone marrow between serial FDG SUV measurements does not impact on the ability of SUV changes to predict response of breast cancer to chemotherapy [21].

To quantitate instrumental uncertainty, multicenter test–retest studies using 68Ge (germanium-68) epoxy phantoms, which have a nine-month half-life, have found that the variability of single PET measures can range from an ideal variability of 4 % [32] to a variability of 8–23 % with central analysis of multicenter results, or as great as 43 % without central analysis of local multicenter results [33]. There is understandably more error in test–retest of quantitation of FDG uptake in humans due to sources of variability related to biological factors and variance in patient preparation and imaging protocol components [15]. Reported variability of FDG SUVs in patient test–retest studies ranged from the ideal of 10 % [25, 29, 34–36] to 46 % [37]. The largest SUV repeatability study, of 62 patients with gastrointestinal malignancies, observed a decrease in the intrasubject coefficient of variation from 16 % for SUVs reported by local sites to 11 % after applying centralized quality assurance and analysis [38]. A recent meta-analysis looking at test–retest reproducibility of SUVmax and SUVmean found that SUVmean had mildly better repeatability than SUVmax, with better reproducibility in larger lesions [39]. However, a recent study comparing SUVmax versus SUVmean, SUVpeak (SUVmean from a volume of 1 cm³ in a tumor’s region of highest average radiotracer concentration), and SUVtotal (sum of all SUV values from every pixel in a tumor segmented by an experienced nuclear medicine radiologist) found that different SUV definitions yielded 20 % variation of values for individual tumor response and variation of up to 90 % for a single SUV measure [40]. Therefore, appropriate selection of the method used to quantitate radiotracer uptake for the SUV calculation is very important.
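A small sketch can show why the choice of SUV definition matters. The function name and the voxel values below are invented for illustration and do not come from any cited study. SUVpeak is omitted because it depends on the spatial arrangement of voxels, which a flat list of values does not carry.

```python
def summarize_suv(voxel_suvs):
    """SUVmax, SUVmean, and SUVtotal for the voxels of one segmented tumor."""
    if not voxel_suvs:
        raise ValueError("tumor segmentation contains no voxels")
    total = sum(voxel_suvs)
    return {
        "SUVmax": max(voxel_suvs),
        "SUVmean": total / len(voxel_suvs),
        "SUVtotal": total,
    }


if __name__ == "__main__":
    # Hypothetical voxel SUVs (g/mL) for the same tumor before and after therapy.
    baseline = summarize_suv([2.0, 4.0, 6.0, 8.0])
    followup = summarize_suv([1.0, 2.0, 3.0, 8.0])
    for name in baseline:
        change = 100.0 * (followup[name] - baseline[name]) / baseline[name]
        print(f"{name}: {baseline[name]:.1f} -> {followup[name]:.1f} ({change:+.0f} %)")
```

In this invented example the hottest voxel is unchanged, so SUVmax reports no response while SUVmean and SUVtotal both fall by 30 %. The same tumor would therefore be classified differently depending on the definition chosen.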

FDG and FLT measures correlate with cancer patient outcomes

Quantitation of FDG and FLT uptake to measure cancer response to therapy is an active area of research and the last several years have seen the publication of results from numerous single-center prospective trials. In 2010, the Canadian Agency for Drugs and Technologies in Health (CADTH) conducted a systematic review of clinical effectiveness and found support for using FDG PET to monitor response to treatment in metastatic breast cancer and lymphoma [41]. Results from studies performed in a variety of malignancies including breast [22, 42], head and neck [6, 7, 43], pancreatic [44], lung [8, 45, 46], metastatic colorectal [9], and rectal cancer [47–50] demonstrate that quantitative PET is a powerful tool for predicting progression and/or overall survival. Standardization of the timing of PET scans and of quantitation methods [15, 17, 51, 52] for different cancer subtypes and treatments would facilitate multicenter trials to determine the sensitivity and specificity of FDG and FLT PET measures to detect and monitor treatment response in different cancers. Multicenter prospective randomized trials are needed to provide high-quality evidence for or against use of PET in both the diagnostic work-up and the monitoring of response to treatment.

Limitations in PET quantitation

Differential PET measurement error and bias for patient studies at local sites participating in multicenter trials is difficult to measure or predict due to potential longitudinal changes in measurement bias due to PET scanner instrumentation [32, 53] and longitudinal biological changes within patients that are not related to the cancer or the treatment. Preliminary guidelines for designing multicenter clinical trials that use PET measures as end points have been published to enable trial design to account for differences in PET quantitation methods and measurement error at different centers [11, 54].

The partial volume effect refers both to image blurring due to scanner spatial resolution limitations (detector design and reconstruction algorithm) and to inadequate expression of tumor heterogeneity due to multiple signal intensities averaged over the examined volume [23, 55]. Correction for partial volume effect and normalization by blood glucose yielded the highest diagnostic accuracy in differentiating between benign and malignant tissue in small lung nodules [56] and breast lesions [57]. Additionally, partial volume correction (PVC) can increase the correlation between the Ki-67 score, a marker of proliferation, and FDG uptake [58]. In some cases, the difference after using PVC has been shown to be sufficient to change the EORTC classification of metabolic response [59]. Research supporting the use of PVC has led to the development of sophisticated algorithms, some tailored for oncologic applications [60].

There is some evidence that PVC does not lead to more accurate prediction of tumor response. A recent study examining baseline SUV in patients with esophageal cancer found that PVC did not improve the prediction of therapy response or the prognostic value of PET [61], although it should be noted that the tumors included in this study were larger (40 ± 30 cm³), and partial volume effects due to spatial resolution limitations would have a lower impact on PET measurement error in tumors of this size. SUV normalization and PVC did not influence the predictive value of PET imaging in an additional large study of esophageal cancer patients [62]. One recent study compared the performance of eight metabolic indices for the early assessment of tumor response in patients with metastatic colorectal cancer treated with chemotherapy. The metabolic indices included four SUVs without PVC, two SUVs with PVC, metabolic volume and total lesion glycolysis measurements. The SUVs without PVC accurately predicted the tumor response. Neither the use of SUVs with PVC nor measuring total lesion glycolysis improved the assessment of tumor response compared to SUVs without PVC [63]. One reason why PVC may not always be helpful is that SUV without PVC correlates with both metabolic activity and metabolically active tumor volume. In contrast, SUV with PVC correlates only with metabolic activity without benefiting from the potentially informative effects of tumor volume change in response to treatment. As a result, some investigators have suggested that it would be worthwhile to consider both SUV with PVC and metabolic volume in lesion assessment, either separately or in combination (e.g., SUV with PVC × volume = PVC total lesion glycolysis) [59].

Tumor heterogeneity refers to molecular characteristics, such as variation in receptor expression and proliferation rate, as well as macroscopic characteristics such as central necrosis and perfusion. The partial volume effect may be worsened by tumor heterogeneity, which can also impact on compartmental models. Models that assume a homogeneous tumor region of interest may not be ideal for some tumor types.

Movement is another confounding factor that can interfere with PET/CT co-registration and quantitation [64, 65] and needs to be addressed with a data analysis technique such as one that includes information from respiratory gating. This is an active area of PET instrumentation research that is beyond the scope of this review.

Future directions in quantitative PET imaging may include textural features such as SUV combined with multiple parameters such as fractal dimension and tumor volume, as well as multiscale computational modeling at the subcellular and cellular level [23].

Summary

More studies showing that PET measures of cancer response to treatment correlate significantly with patient outcomes are required to convince oncologists and insurers to accept PET measures of response as biomarkers that can serve as clinical trial end points and direct the treatment of individual patients. To ensure maximum power to measure significant changes in cancer response to treatment, we encourage the use of more sensitive methods of quantitating PET tracer uptake, such as kinetic analyses of dynamic PET images, when first using a novel PET tracer or when studying a new disease or patient cohort with existing tracers such as FDG or FLT. We support the subsequent use of simplified quantitation methods such as SUVs in later phase studies testing correlations between PET measures of cancer response and patient outcome, as long as the ratio between tracer uptake by the studied cancerous process and background uptake is high and there is only low production of PET-labeled metabolites. If the PET tracer is expected to have a moderate target-to-background ratio or substantial amounts of PET-labeled metabolites, then one should consider quantitating PET tracer uptake using kinetic analysis of dynamic PET images, or at least ensure that any clinical trial design compensates for the lower sensitivity of using a static PET measure such as SUV. If PET measures become accepted as prognostic biomarkers, then clinical trials and image-guided therapy for individuals can use PET measures to potentially improve patient survival and quality of life by ending ineffective therapy early in the course of treatment.