The power of FDG-PET to detect treatment effects is increased by glucose correction using a Michaelis constant
- Cite this article as: Williams, S., Flores-Mercado, J.E., Baudy, A.R. et al. EJNMMI Res (2012) 2: 35. doi:10.1186/2191-219X-2-35
We recently showed improved between-subject variability in our [18F]fluorodeoxyglucose positron emission tomography (FDG-PET) experiments using a Michaelis-Menten transport model to calculate the metabolic tumor glucose uptake rate extrapolated to the hypothetical condition of glucose saturation: MRgluc^max = Ki * (KM + [glc]), where Ki is the image-derived FDG uptake rate constant, KM is the half-saturation Michaelis constant, and [glc] is the blood glucose concentration. Compared to measurements of Ki alone, or to calculations of the scan-time metabolic glucose uptake rate (MRgluc = Ki * [glc]) or the glucose-normalized uptake rate (MRgluc = Ki * [glc]/(100 mg/dL)), we suggested that MRgluc^max could offer increased statistical power in treatment studies; here, we confirm this in theory and practice.
We compared Ki, MRgluc (both with and without glucose normalization), and MRgluc^max as FDG-PET measures of treatment-induced changes in tumor glucose uptake independent of any systemic changes in blood glucose caused either by natural variation or by side effects of drug action. Data from three xenograft models with independent evidence of altered tumor cell glucose uptake were studied and generalized with statistical simulations and mathematical derivations. To obtain representative simulation parameters, we studied the distributions of Ki from FDG-PET scans and blood [glucose] values in 66 cohorts of mice (665 individual mice). Treatment effects were simulated by varying MRgluc^max and back-calculating the mean Ki under the Michaelis-Menten model with KM = 130 mg/dL. This was repeated to represent cases of low, average, and high variability in Ki (at a given glucose level) observed among the 66 PET cohorts.
There was excellent agreement between derivations, simulations, and experiments. Even modestly different (20%) blood glucose levels caused Ki, and especially MRgluc, to become unreliable through false positive results, while MRgluc^max remained unbiased. The greatest benefit occurred when Ki measurements (at a given glucose level) had low variability. Even when the power benefit was negligible, the use of MRgluc^max carried no statistical penalty. Congruent with theory and simulations, MRgluc^max showed in our experiments an average 21% statistical power improvement with respect to MRgluc and 10% with respect to Ki (approximately 20% savings in sample size). The results were robust in the face of imprecise blood glucose measurements and KM values.
When evaluating the direct effects of treatment on tumor tissue with FDG-PET, employing a Michaelis-Menten glucose correction factor gives the most statistically powerful results. The well-known alternative ‘correction’, multiplying Ki by blood glucose (or normalized blood glucose), appears to be counter-productive in this setting and should be avoided.
Keywords: FDG-PET; Glucose correction; Michaelis-Menten; Response to treatment; Glucose bias
Quantitative [18F]fluorodeoxyglucose positron emission tomography (FDG-PET) is increasingly relied upon to measure pharmacodynamic responses in controlled trials, bringing a greater need for accurate and reproducible scans to minimize the number of subjects needed for a successful trial. Glucose levels have long been recognized as a factor modulating FDG uptake [1–8], but even so, there has been some debate regarding how best to compensate for changing glucose levels when comparing scans. Some investigators have eschewed glucose corrections altogether after observing increased rather than decreased statistical noise in ‘corrected’ PET measurements, attributing this, perhaps, to error in the glucose measurement itself [9, 10]. However, avoiding glucose correction poses a conundrum of interpretation when a treatment may induce a systematic change in blood glucose levels. Such treatments are known, and FDG-PET may be used to assess their impact; they include some potentially important new drugs still under clinical investigation, such as certain Akt and PI3K inhibitors [11, 12].
The seminal work of Sokoloff et al.  described the Michaelis-Menten kinetics of glucose and tracer transport and showed how the radioactive tracer uptake rate constant (Ki) could be used to estimate the tissue glucose uptake in physiological units, i.e., the metabolic rate of glucose (MRgluc = Ki*[glc]/LC, expressed in μmol glucose per 100 g tissue per min). Under steady-state conditions, the half-saturation Michaelis constants (KM) and the maximal velocities (Vmax) for tracer and glucose are factored into the lumped constant (LC), which summarizes the differential properties of tracer and glucose. Scans obtained under different blood glucose levels will almost inevitably indicate different metabolic rates of glucose, and one must decide how to detect changes in tumor glucose metabolism that are not merely due to changes in blood glucose.
We recently demonstrated  that in untreated animals, both tumor Ki values and MRgluc values were, on the average, strongly correlated with blood glucose, showing that an appropriate form of blood glucose correction might facilitate the identification of treatment effects under changing glucose conditions. We sought to understand this glucose effect so that an appropriate compensating correction could be made, expecting that this would improve the power to detect treatment effects.
The Michaelis-Menten relationship between glucose concentration and transport [13–19] was used as the basis of the proposed correction. With it, we showed that, on the average, there was less variability in untreated animals when estimating the hypothetical glucose-saturated limit to the tumor metabolic rate of glucose (MRgluc^max) rather than the tracer rate constant (Ki) or the actual scan-time metabolic rate of glucose (MRgluc). MRgluc^max is the asymptotic limit of the plot of uptake rate versus [glucose], MRgluc = MRgluc^max * [glc]/(KM + [glc]). KM is a half-saturation Michaelis constant such that MRgluc = MRgluc^max/2 when [glc] = KM.
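For concreteness, the metrics and the Michaelis-Menten relationship can be written as a few Python functions. This is a minimal sketch assuming LC = 1, glucose in mg/dL, and the KM of 130 mg/dL used in this study; the function names and the example MRgluc^max value of 10 (arbitrary units) are hypothetical.

```python
KM = 130.0  # mg/dL; the Michaelis constant used in this study


def mrgluc(ki, glc):
    """Scan-time metabolic rate of glucose, MRgluc = Ki * [glc] (LC taken as 1)."""
    return ki * glc


def mrgluc_normalized(ki, glc, ref=100.0):
    """Glucose-normalized uptake rate, Ki * [glc] / (100 mg/dL)."""
    return ki * glc / ref


def mrgluc_max(ki, glc, km=KM):
    """Glucose-saturated limit, MRgluc^max = Ki * (KM + [glc])."""
    return ki * (km + glc)


def ki_from_model(mr_max, glc, km=KM):
    """Noise-free Ki implied by the Michaelis-Menten model: MRgluc^max / (KM + [glc])."""
    return mr_max / (km + glc)


if __name__ == "__main__":
    mr_max = 10.0  # hypothetical uptake limit, arbitrary units
    for glc in (80.0, 100.0, 130.0, 160.0):
        ki = ki_from_model(mr_max, glc)
        print(f"[glc]={glc:5.1f}  Ki={ki:.5f}  MRgluc={mrgluc(ki, glc):.3f}  "
              f"MRgluc^max={mrgluc_max(ki, glc):.3f}")
```

Under the model, Ki falls and MRgluc rises as glucose increases, while MRgluc^max stays at the assumed value; at [glc] = KM, MRgluc is exactly half of MRgluc^max.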
Treatment studies, cell lines, and drug substances
Because a limited number of experimental studies could not, on their own, establish the power relationships in (relatively noisy) FDG-PET data with any certainty, we supplemented these experiments with statistical simulations and with analytical derivations, which are presented in Additional file 3.
The experimental setting
Our laboratory experiments employed dynamic FDG-PET to measure the tumor uptake rate constant for FDG, Ki, as a function of tumor treatment with tyrosine kinase inhibitor drugs. The experiments contained two or more groups of animals: one control group administered vehicle alone, and at least one treatment group administered an active drug in the same dosing vehicle. We analyzed data before and after 7 days of treatment, expecting that there would be no difference between the groups before treatment and that some treatment effect would be evident after 7 days. We compared Ki with two alternative PET metrics that account for blood glucose in some way, MRgluc and MRgluc^max, to study the relative merits of each metric at detecting a true tumor treatment effect, as assessed by the two-sample two-sided t-test. This is also the scenario that the simulations (below) and power calculations (Additional file 3) are designed to represent.
We considered a true treatment effect on tumor glucose uptake to be one based on a physiological change in the tumor tissue per se. Thus, for our purposes, changes in tumor glucose uptake caused merely by alterations in blood glucose were not true treatment effects; they fell under our definition of false positive results.
Animal handling and imaging
Experimental details were as described previously [14, 25]. All animals were fasted overnight with access to water ad libitum. Mice were induced and maintained under light anesthesia using isoflurane in air (GDC-0879 study) or sevoflurane in air (G00033054 and GDC-0973 studies). Body temperature was maintained at 37°C with warm air flows while the eyes were protected from dehydration with ophthalmic ointment. All studies were conducted under the approval of Genentech's AAALAC-accredited Institutional Animal Care and Use Committee. All animals underwent 30-min dynamic FDG-PET scans with X-ray computed tomography (CT)-based attenuation correction just prior to starting their treatment regimen. FDG doses were infused via the lateral tail vein over a 1-min period in a volume of 100 μL.
Blood glucose measurements
At every scan, blood glucose was measured twice: once approximately 5 min before the PET/CT scan and once shortly after it, approximately 35 min later. The glucose value used in subsequent calculations is the mean of the pre- and post-scan measurements. Data were collected with the commercially available Contour glucometer (Bayer Healthcare, Tarrytown, NY, USA) using blood freshly obtained by pricking the saphenous vein. Test-retest reproducibility measurements using this instrument in our hands showed a coefficient of variation of 3.7% .
Prior use of the experimental data
Animal models and number of mice
Columns: animal model, number of cohorts, number of mice
- BT474 in SCID Nude Beige
- HCT116 in Nu/Nu
- PC3 in Nu/Nu
- FaDu in CB17 SCID
- H292 in CB17 SCID
- H596 in huHGF transgenic
- 537-Mel in Nu/Nu
- A2058 in Nu/Nu
- A375 in Nu/Nu
- Colo205 in Nu/Nu
- H2122 in Nu/Nu
Tumor treatment models with established drug effects on tumor glucose uptake
Table 1 describes the subset of studies from Table 2 in which there was additional non-imaging evidence of a true treatment effect on tumor glucose uptake independent of blood glucose levels. Athymic nude mice were implanted in the right flank with a Matrigel/Hanks Balanced Salts medium containing 10 million melanoma (A375, A2058) or 5 million colorectal (HCT116) cancer cells. Tumors reached a group median volume of at least 250 mm3 prior to beginning the study. The blood glucose and FDG-PET data (Ki, MRgluc, MRgluc^max) are presented for these studies in Additional file 1. Cell culture experiments were used to show direct drug effects on FDG uptake, and immunofluorescence was used to show an apparent loss of GLUT-1 at the cell membrane both in cells and tumor tissue (see Additional file 2 for descriptions of and results for those experiments).
Statistical power in experimental data: p-values as a function of sample size
Two-sample two-sided t-test p-values were calculated in the three true treatment studies (A, B, and C) described in Table 1. This was repeated using MRgluc^max, MRgluc, and Ki. We examined the p-values at baseline, where the null hypothesis should be accepted, and on treatment at day 7, where the null hypothesis should indeed be rejected based on our knowledge of drug action on tumor cell and tissue glucose handling (Additional file 2).
False positive rates in experimental data: relation to sample size
Mice were randomized into nominal control and treatment groups, each containing n = 6 to 12 mice (Table 2), allowing 42 two-sample two-sided t-test comparisons to be performed on FDG-PET data collected before any treatment was administered. At this timepoint, any statistically significant result was considered a false positive. A particular study was flagged as having a high rate of false positives whenever the t-tests rejected the null hypothesis (p < 0.05) more often than the theoretical false positive rate (α) of 5%, measured across all the combinations of individuals tested. Repeating the tests on progressively smaller subsets, as described above, was used to assess how the false positive error rate would behave in smaller, less powerful studies. This was repeated using MRgluc^max, MRgluc, and Ki.
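The subset resampling behind both the p-value and false-positive analyses can be sketched as follows. Several details are assumptions not stated in the text: a pooled-variance Student t-test, exhaustive enumeration of subset pairs when there are at most 4,000 of them and otherwise 4,000 random draws (the figure reported for Figure 1), and the helper names, which are hypothetical. The p-value uses the standard closed-form series for Student's t with integer degrees of freedom, so only the Python standard library is needed.

```python
import itertools
import math
import random
import statistics


def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t with integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:  # odd df: A = (2/pi) * (theta + sin * [cos + (2/3)cos^3 + ...])
        term = c
        total = c if df >= 3 else 0.0
        for k in range(3, df - 1, 2):
            term *= c * c * (k - 1) / k
            total += term
        a = 2.0 / math.pi * (theta + s * total)
    else:  # even df: A = sin * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...]
        term = total = 1.0
        for k in range(2, df - 1, 2):
            term *= c * c * (k - 1) / k
            total += term
        a = s * total
    return max(0.0, 1.0 - a)  # A = P(|T| < t), so p = 1 - A


def pooled_t_p(x, y):
    """Two-sample, two-sided Student t-test (pooled variance); returns the p-value."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t_two_sided_p(t, nx + ny - 2)


def rejection_rate(control, treated, n, alpha=0.05, max_pairs=4000, seed=0):
    """Fraction of (control, treated) subsets of size n whose t-test gives p < alpha."""
    n_pairs = math.comb(len(control), n) * math.comb(len(treated), n)
    if n_pairs <= max_pairs:  # few enough: enumerate every pair of subsets
        pairs = itertools.product(itertools.combinations(control, n),
                                  itertools.combinations(treated, n))
    else:  # otherwise draw random pairs (assumed scheme)
        rng = random.Random(seed)
        pairs = ((rng.sample(control, n), rng.sample(treated, n))
                 for _ in range(max_pairs))
    hits = total = 0
    for c, t in pairs:
        total += 1
        hits += pooled_t_p(c, t) < alpha
    return hits / total


def high_false_positive_rate(baseline_control, baseline_treated, n, alpha=0.05):
    """Flag a pre-treatment comparison whose rejection rate exceeds the nominal alpha."""
    return rejection_rate(baseline_control, baseline_treated, n, alpha) > alpha
```

The same function serves both analyses: applied to day-7 data it traces power versus sample size, and applied to baseline data its output is compared against α to flag a study.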
GDC-0879 is a B-RAF  selective kinase inhibitor [26, 27] that has been demonstrated to be effective against cancers carrying the V600 mutation . MEK is one of the three enzymes of the mitogen-activated protein kinase (MAPK) cascade involved with RAS/RAF signaling . G00033054 and GDC-0973 are potent and selective MEK inhibitors that have been efficacious in treating KRAS and RAF mutant cells .
All drug substances were dosed daily in 100 μL of excipient. GDC-0879, GDC-0973, and G00033054 were dosed for 7 days at 100 mg/kg, 10 mg/kg, and 25 mg/kg, respectively. All animals were dosed through oral gavage (per os). Control groups were subjected to the same regimen but received no active drug in their dosing solution.
Derivations, statistics, and simulations
We studied the properties of the two-sample two-sided t-test comparing the control and treatment group sample means of Ki, and of MRgluc^max, in analytical derivations (presented as Additional file 3) and in simulations, which are described below. Data were simulated assuming either no treatment effect or a treatment effect of 10% to 50% change in the glucose-saturated limit to the tumor glucose uptake rate, MRgluc^max, specified in each simulation. As a function of the involved parameters, our study evaluated the test statistics under both the null and alternative hypotheses by estimating false positive rates (including significant test results caused merely by changes in blood glucose) and the power to detect true differences in the tumor glucose uptake rate limit. Simulations were run in the statistical programming language R .
We assumed that the relationship between the FDG rate constant Ki and glucose [glc] followed the Michaelis-Menten (MM) form [14–19] and that observations of the rate constant were corrupted by noise. That is, the observed rate constant was given by Ki = MRgluc^max/(KM + [glc]) + ε, where ε is zero-mean Gaussian noise with variance σε², here denoted as ε ~ N(0, σε²). Let K̄C and K̄T represent the sample average FDG uptake rates across n observations in the control and treatment groups, respectively, and let M̄C and M̄T be the sample averages of the quantity Ki * (KM + [glc]) in the two groups. Under these assumptions, we compared the statistical properties of the t-test comparing K̄C and K̄T with the t-test comparing M̄C and M̄T.
The analytical derivation of the power functions relating to Ki and MRgluc^max follows standard developments based on the Gaussian distribution  and is presented for the interested reader in Additional file 3. To illustrate the validity of the derivation and to delineate when MRgluc^max provides significantly improved statistical properties vis-à-vis Ki, we simulated observations from the joint process (Ki, [glc]) as follows. Given the parameters (MRgluc^max, KM, μg, σg, σε), a single draw of (Ki, [glc]) was obtained by first sampling [glc] ~ N(μg, σg²) and ε ~ N(0, σε²), and then by evaluating Ki = MRgluc^max/(KM + [glc]) + ε. For each simulation iteration, the preceding was repeated n times in each of the control and treatment groups, and two-sided t-tests were used to test for equality of means at the α = 0.05 level of significance. A total of 4,000 simulation iterations were used in each setting.
For simulations under the null hypothesis, the maximal uptake rate MRgluc^max was set to the same value in the control and treatment groups, and we evaluated the effect on the false positive rate (i.e., concluding that there is a treatment effect when in fact there is none) caused merely by a change in mean blood glucose. Mean blood glucose changes of 10%, 20%, and 30% were assessed.
Simulations under the alternative hypothesis compared the power of the t-tests to detect treatment effects (δ) corresponding to an approximate 20% to 30% reduction in the tumor glucose uptake rate limit while keeping the glucose distribution the same. Sample sizes were chosen between n = 6 and n = 12.
The robustness of MRgluc^max to errors in [glucose] and KM was also investigated by simulations. For errors in the measurement of blood glucose, we replaced the quantity Ki (KM + [glc]) by Ki (KM + [glc]*), where [glc]* = [glc] + N(0, 4²). That is, the Ki values were generated using the correct (uncorrupted) glucose values [glc], while MRgluc^max was estimated using the observed (corrupted) glucose [glc]*. A similar process of substitution was used with KM, using the scenarios (KM = 100 mg/dL, KM* = 130 mg/dL) and (KM = 130 mg/dL, KM* = 100 mg/dL).
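The simulation design above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' R code: blood glucose is drawn as N(μg, σg²), a pooled-variance Student t-test is used, δ is taken as the fractional reduction of MRgluc^max in the treatment group, and all function names and parameter values are hypothetical. Glucose bias is created by giving the two groups different mean glucose, glucose measurement error by glc_err_sd, and KM misspecification by km_used (the KM* of the text).

```python
import math
import random
import statistics


def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t with integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        term = c
        total = c if df >= 3 else 0.0
        for k in range(3, df - 1, 2):
            term *= c * c * (k - 1) / k
            total += term
        a = 2.0 / math.pi * (theta + s * total)
    else:
        term = total = 1.0
        for k in range(2, df - 1, 2):
            term *= c * c * (k - 1) / k
            total += term
        a = s * total
    return max(0.0, 1.0 - a)


def pooled_t_p(x, y):
    """Two-sample, two-sided Student t-test (pooled variance); returns the p-value."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t_two_sided_p(t, nx + ny - 2)


def simulate(mr_max, km, mu_glc_c, mu_glc_t, sd_glc, sd_eps, n, delta=0.0,
             glc_err_sd=0.0, km_used=None, iters=4000, alpha=0.05, seed=1):
    """Return (rejection rate for Ki, rejection rate for MRgluc^max)."""
    rng = random.Random(seed)
    km_used = km if km_used is None else km_used

    def draw_group(mr, mu_glc):
        ki, mrm = [], []
        for _ in range(n):
            glc = rng.gauss(mu_glc, sd_glc)               # true blood glucose
            k = mr / (km + glc) + rng.gauss(0.0, sd_eps)  # Michaelis-Menten Ki plus noise
            glc_obs = glc + rng.gauss(0.0, glc_err_sd)    # measured (possibly corrupted) glucose
            ki.append(k)
            mrm.append(k * (km_used + glc_obs))           # MRgluc^max from measured values
        return ki, mrm

    hits_ki = hits_mr = 0
    for _ in range(iters):
        ki_c, mr_c = draw_group(mr_max, mu_glc_c)
        ki_t, mr_t = draw_group(mr_max * (1.0 - delta), mu_glc_t)
        hits_ki += pooled_t_p(ki_c, ki_t) < alpha
        hits_mr += pooled_t_p(mr_c, mr_t) < alpha
    return hits_ki / iters, hits_mr / iters


if __name__ == "__main__":
    # Null hypothesis with a 20% glucose bias (hypothetical parameters, CV of Ki about 10%)
    print(simulate(10.0, 130.0, 100.0, 120.0, 10.0, 0.0043, n=8, iters=1000))
    # Alternative: 25% reduction in MRgluc^max, no glucose bias
    print(simulate(10.0, 130.0, 100.0, 100.0, 10.0, 0.0043, n=8, delta=0.25, iters=1000))
```

With a 20% glucose bias and no true effect, the Ki test should reject well above the nominal 5% while the MRgluc^max test stays near 5%, which is the pattern the text reports.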
Results and discussion
Statistical and blood glucose-induced false-positive error rates
False positive error rates (%)
The error rates are expressed as percentages for a two-sided t-test at level α = 0.05 based on Ki, MRgluc^max, and MRgluc as a function of glucose bias. Glucose bias is defined as the percent change in mean glucose between the control and treatment groups. Here, .
As predicted by the derivations, all three metrics (Ki, MRgluc, and MRgluc^max) correctly accepted the null hypothesis at baseline in the 42 comparisons of the control with treatment groups in the full experimental data (Table 2). Also as expected, false-positive results began to appear as the data were resampled at smaller sample sizes. At sample size n = 8, for example, only one comparison showed a high false positive rate with Ki and with MRgluc^max, whereas MRgluc gave false positives in 6 of the 42 studies (14%).
Elimination of MRgluc from further consideration
Because MRgluc was strongly influenced by relatively modest levels of glucose bias (Table 3), producing results that we considered false in terms of treatment response, we judged the (uncorrected) Ki to be the most suitable alternative to MRgluc^max. We henceforth simplify the presentation of simulation results and analytical derivations by restricting them to Ki and MRgluc^max. The performance of MRgluc in the experimental data is, however, shown alongside Ki and MRgluc^max (Additional file 1 and Figure 1).
Statistical power in theory and in simulation
As shown in the analytical power derivations presented in Additional file 3, an improvement in power for MRgluc^max, Pm, relative to the power for Ki, Pk, occurs whenever the coefficient of variation (CV) in Ki evaluated at the mean glucose level is less than 1. That is, with Pk and Pm the power curves for a test of means of Ki and MRgluc^max, respectively, whenever CV = σε/Ki(μg) < 1, where Ki(μg) = MRgluc^max/(KM + μg) is the noise-free Ki at the mean glucose level μg, we have Pm > Pk. Moreover, through manipulation of Equations 1 and 2 in Additional file 3, we see that the difference Pm − Pk is monotonic, increasing with decreasing CV. Further, the difference Pm − Pk grows as the blood glucose standard deviation σg increases (holding CV constant). We now detail these facts by simulation.
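The CV < 1 condition can be reproduced with a first-order (delta-method) normal approximation. This is our own illustration, not Equations 1 and 2 of Additional file 3. With a = KM + μg, Var(Ki) ≈ Ki(μg)²σg²/a² + σε² and Var(Ki (KM + [glc])) = σε²(a² + σg²), so the squared ratio of the standardized effects for MRgluc^max and Ki is (Ki(μg)²σg² + σε²a²) / (σε²(a² + σg²)). This exceeds 1 exactly when σε < Ki(μg), i.e. CV < 1, and then grows with σg. The default parameter values below are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def standardized_effects(mr_max, km, mu_g, sd_g, sd_eps, delta):
    """First-order standardized effect sizes (Ki, MRgluc^max).

    Model: Ki = mr_max / (km + glc) + eps, glc ~ N(mu_g, sd_g^2), eps ~ N(0, sd_eps^2);
    treatment lowers mr_max by the fraction delta."""
    a = km + mu_g
    ki_mu = mr_max / a                              # Ki at the mean glucose level
    var_ki = (ki_mu * sd_g / a) ** 2 + sd_eps ** 2  # glucose-driven spread + noise
    var_mr = sd_eps ** 2 * (a ** 2 + sd_g ** 2)     # Var[eps * (km + glc)]
    return delta * ki_mu / math.sqrt(var_ki), delta * mr_max / math.sqrt(var_mr)


def power_two_sample(d, n, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test, n per group."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n / 2)
    return _STD_NORMAL.cdf(shift - z) + _STD_NORMAL.cdf(-shift - z)


def power_ki_vs_mrmax(cv, mr_max=10.0, km=130.0, mu_g=100.0, sd_g=15.0,
                      delta=0.2, n=8):
    """(Pk, Pm) for a given CV = sd_eps / Ki(mu_g)."""
    sd_eps = cv * mr_max / (km + mu_g)
    d_ki, d_mr = standardized_effects(mr_max, km, mu_g, sd_g, sd_eps, delta)
    return power_two_sample(d_ki, n), power_two_sample(d_mr, n)


if __name__ == "__main__":
    for cv in (0.1, 0.22, 0.5, 1.0, 1.5):
        pk, pm = power_ki_vs_mrmax(cv)
        print(f"CV={cv:4.2f}  Pk={pk:.3f}  Pm={pm:.3f}")
```

In this approximation the two powers coincide at CV = 1 and the ordering reverses only above it, far from the 22% average CV reported for these cohorts.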
The third simulation case, S3, representing very noisy data where , has a maximum improvement in power of 2.2%, occurring for δ = 0.55 (plot not shown). This indicates that with low signal-to-noise ratios in the Ki measurement, there is no meaningful improvement in power from using MRgluc^max. However, cases with a high coefficient of variation inevitably have low power and require either very large treatment effects or very large sample sizes to detect a difference in means. Indeed, for case S3, we would require n = 40 for 80% power to detect a treatment effect of δ = 0.25.
Congruent with the main result outlined in the derivations presented in Additional file 3, the improvement in power is strongly dependent on the coefficient of variation in Ki, with the largest power improvement reaching approximately 25%. Moreover, the greater the coefficient of variation for Ki, the less we can discern the effects due to glucose; however, as noted, no test performs well with excessively noisy data.
Statistical power in experimental data
On the average and in agreement with the simulations, MRgluc^max gave greater power than Ki or MRgluc in detecting the known direct on-tumor drug effects in all three tumor treatment models studied (Table 1 and Figure 1). As expected, all metrics progressively lost power as the sample size decreased. For example, in Figure 1A at eight mice per group, MRgluc^max was able to reject the null hypothesis in 93% of the 4,000 combinations of control vs. treatment groups, while Ki did so in only 52% of the sample combinations. In Figure 1B, MRgluc completely missed the treatment effect at all sample sizes, but Ki and MRgluc^max correctly identified it. Lastly, in Figure 1C, looking at six mice per group, we observe that MRgluc^max detected a statistically significant difference between the groups in 89% of all the sample combinations, while MRgluc did so in only 62% of the cases. However, caution must be exercised in drawing fully general conclusions from these limited and somewhat noisy experimental data alone.
Application of MRgluc
The original intent behind the multiplication of Ki by [glucose] was to estimate the metabolic rate of glucose (MRgluc) in tissue under given blood glucose levels based on rate constants derived from monitoring a radioactive glucose-like tracer in blood and tissue [13, 32]. The estimation implies the assumption that MRgluc depends on substrate concentration, i.e., [glucose] in blood. It follows that MRgluc is unsuitable for our particular task of quantitatively compensating for changing glucose levels when comparing scans collected under different glucose conditions. Our results show that even seemingly small differences in blood glucose, such as the natural variations within a group of similar individuals, are sufficient to warrant careful attention to glucose correction when making quantitative comparisons.
The lumped constant
Measurement of the lumped constant (LC) is not trivial, and thus, the (ideal) per-patient or per-lesion values are rarely measured and reported with FDG-PET treatment studies. Instead, a common constant value of LC is applied to all scans. This approach was employed in this study too with an assumed LC value of 1, and as previously noted , the chosen value of LC simply behaves as a scaling factor common to every data point and thus makes no difference to calculated group statistics such as the coefficient of variation, t-test p-values, or correlations with blood glucose levels. The statistical results presented remain equally valid at all (non-zero) values of LC.
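A minimal numerical check of this scaling argument: dividing every value by a common LC changes the units but leaves the t statistic and the CV unchanged. The data values are invented for illustration, and a pooled-variance Student t statistic is assumed.

```python
import math
import statistics


def pooled_t(x, y):
    """Student two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def cv(values):
    """Coefficient of variation (sample SD / mean)."""
    return statistics.stdev(values) / statistics.fmean(values)


# Hypothetical values computed with LC = 1 (arbitrary units)
control = [9.8, 10.4, 11.1, 9.5, 10.0, 10.7]
treated = [8.1, 7.6, 8.9, 8.4, 7.9, 8.6]

if __name__ == "__main__":
    for lc in (1.0, 0.6, 1.2):  # any common, non-zero LC
        c = [v / lc for v in control]
        t = [v / lc for v in treated]
        print(f"LC={lc}: t={pooled_t(c, t):.6f}  CV(control)={cv(c):.6f}")
```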
Glucose bias and false positive test results
All three metrics performed correctly in terms of the false positive rate in the absence of any systematic glucose difference between the treatment groups. The fact that the t-tests based on Ki and MRgluc suffer an increased false positive error rate under a glucose shift (Table 3) renders these tests admissible and useful only if one is certain that a treatment can have no systematic effect on glucose. Since blood glucose levels may vary, we suggest that MRgluc^max makes a more robust and useful default metric for FDG-PET data.
Statistical power in the absence of any glucose bias
Figure 4 (left hand side) shows the simulated improvement in power for a modest treatment effect of 20% and a sample size of n = 8. As can be seen, the power improvement can be as large as 25% and is highly dependent on CV. As predicted by the analytical derivations, the benefit of using MRgluc^max is most pronounced at low CV. Conversely, for values of CV greater than 35%, the power benefit is negligible, even though the benefit of reduced glucose bias remains. However, for data that are highly variable relative to the mean, larger treatment effects or sample sizes are always required for adequate power, a fact detailed in the right hand plot of Figure 4.
Figure 4 (right hand side) shows the required sample size for Ki and MRgluc^max as a function of the coefficient of variation for a study to have 80% power with a treatment effect size of 30% (δ = 0.3). As expected, for both Ki and MRgluc^max, the required sample size is an increasing function of CV. A CV of 22% (the average in our experiments) requires a sample size of n = 10 per group for Ki and n = 8 per group for MRgluc^max. Alternatively, we can fix the sample size and consider what proportion of our 66 experimental cohorts represented adequately powered groups for a treatment study: for a sample size of n = 8, 48% were adequately powered using MRgluc^max, whereas only 26% were adequately powered with Ki. For a sample size of n = 12, more groups are adequately powered, of course, but there is still a benefit to using MRgluc^max: 76% using MRgluc^max and 59% using Ki. Independent of CV, the sample size saving achieved through the use of MRgluc^max in this simulation setting is approximately two mice; in (relatively rare) situations where a CV as low as 10% can be anticipated, studies can be adequately powered with only a handful of animals per group.
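The sample-size trade-off can be approximated with the textbook normal formula n = 2(z_(1−α/2) + z_power)²/d² per group, using first-order effect sizes expressed through the CV of Ki. This is a hedged sketch: the normal approximation slightly understates what a t-test needs, the glucose spread sd_g = 15 mg/dL and mean glucose of 100 mg/dL are hypothetical, and the output is not expected to reproduce Figure 4 exactly. How far the two metrics diverge depends strongly on the assumed sd_g.

```python
import math
from statistics import NormalDist


def effects_from_cv(cv, delta, km=130.0, mu_g=100.0, sd_g=15.0):
    """First-order standardized effects (Ki, MRgluc^max) from the CV of Ki at mean glucose.

    MRgluc^max cancels out, so only the relative glucose spread sd_g / (km + mu_g) matters."""
    r2 = (sd_g / (km + mu_g)) ** 2
    d_ki = delta / math.sqrt(r2 + cv ** 2)
    d_mr = delta / (cv * math.sqrt(1.0 + r2))
    return d_ki, d_mr


def n_per_group(d, power=0.8, alpha=0.05):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z = NormalDist()
    total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2.0 * (total / d) ** 2)


if __name__ == "__main__":
    for cv in (0.10, 0.22, 0.35, 0.50):
        d_ki, d_mr = effects_from_cv(cv, delta=0.3)
        print(f"CV={cv:.2f}  n(Ki)={n_per_group(d_ki)}  n(MRgluc^max)={n_per_group(d_mr)}")
```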
Understanding this behavior has practical value in designing appropriately powered preclinical FDG-PET experiments and, perhaps, in permitting a futility analysis to be conducted after beginning a study with baseline scans and before expending further significant effort in drug dosing and repeated scanning.
Glucose ‘normalization’ and errors in the measurement of blood glucose
Glucose sampling errors have been postulated as a source of the variability experienced [9, 10] when applying the common [glucose]/constant normalization method , which is analogous to estimating MRgluc at the population mean glucose measurement (the value of the constant), typically given as 5 mM or 100 mg/dL.
We suggest that the problem with this normalization scheme lies not with the glucose measurements but with the linear nature of the algorithm. Rather than scaling linearly to the population mean glucose value, MRgluc^max asymptotically follows the Michaelis-Menten extrapolation to a hypothetical saturating glucose level. Simulations showed that results were robust even with relatively large 10% errors in the glucose measurements (full results not shown). This can be intuited by noticing that the KM term is on the order of the [glucose] term, making the glucose measurement error, εglc, a small part of the total correction factor, KM + [glc] + εglc. We also note that the algebraic form of this correction factor, i.e., [glucose] + constant, appears as a solution in analytical derivations that simply start with the very general assumption that Ki is negatively correlated with [glucose] over a limited range of glucose values. This is presented in Additional file 3 for the interested reader.
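A back-of-the-envelope check of this argument, using hypothetical values (blood glucose of 100 mg/dL and a 10% measurement error) together with the study's KM of 130 mg/dL: the same absolute error perturbs the factor KM + [glc] by only [glc]/(KM + [glc]) of the relative error it causes in [glc] alone.

```python
KM = 130.0        # mg/dL, value used in this study
glc = 100.0       # hypothetical blood glucose, mg/dL
err = 0.10 * glc  # hypothetical 10% measurement error


def relative_error(true_factor, observed_factor):
    """Relative error of a correction factor computed from a corrupted glucose value."""
    return abs(observed_factor - true_factor) / true_factor


# Linear correction factor [glc] (as in MRgluc): the full 10% error passes through
linear = relative_error(glc, glc + err)
# Michaelis-Menten factor KM + [glc]: the same absolute error is diluted by KM
mm = relative_error(KM + glc, KM + glc + err)

if __name__ == "__main__":
    print(f"[glc] factor: {linear:.3f}   KM + [glc] factor: {mm:.3f}")
    print(f"attenuation [glc]/(KM + [glc]): {mm / linear:.3f}")
```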
Optimal group comparisons with linear regression
We note that MRgluc^max is optimally estimated by regressing Ki on the quantity 1/(KM + [glc]) under the Michaelis-Menten model assumptions specified, with the noise process ε following the Gaussian distribution and with a fixed value for KM. Here, we condition on the glucose measurements and set the intercept to zero. Given our setup, in the regression framework, the t-test of equality of the maximal uptake rates in the control and treatment groups is a likelihood ratio test and the uniformly most powerful unbiased test . Moreover, statistically speaking, the regression estimator is best linear unbiased under non-Gaussian assumptions . We also note that the variance of the regression estimator and that of the sample average are close provided that the spread in the term (KM + [glc]) is low relative to its mean. In our setting, since the spread in blood glucose is small relative to KM + [glc], the linear regression and sample average solutions are very close to each other, and either may be used when testing for a treatment effect. Thus, we expect that the familiar and straightforward use of sample means (averaging data from multiple individuals) will be satisfactory when using MRgluc^max in practice, just as it is for Ki.
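The two estimators can be computed side by side. This is an illustrative sketch with hypothetical data and function names: ordinary least squares through the origin (slope = Σ x·Ki / Σ x², with x = 1/(KM + [glc])) versus the plain sample average of Ki * (KM + [glc]).

```python
import random
import statistics

KM = 130.0  # mg/dL


def mrmax_regression(ki, glc, km=KM):
    """Least-squares slope through the origin of Ki on x = 1 / (KM + [glc])."""
    x = [1.0 / (km + g) for g in glc]
    return sum(xi * k for xi, k in zip(x, ki)) / sum(xi * xi for xi in x)


def mrmax_sample_mean(ki, glc, km=KM):
    """Sample average of Ki * (KM + [glc])."""
    return statistics.fmean([k * (km + g) for k, g in zip(ki, glc)])


if __name__ == "__main__":
    rng = random.Random(7)
    glc = [rng.gauss(100.0, 10.0) for _ in range(12)]            # hypothetical glucose, mg/dL
    ki = [10.0 / (KM + g) + rng.gauss(0.0, 0.004) for g in glc]  # hypothetical Ki, MRgluc^max = 10
    print(mrmax_regression(ki, glc), mrmax_sample_mean(ki, glc))
```

With glucose spread small relative to KM + [glc], the two estimates typically differ by far less than their sampling error, consistent with the argument above.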
Quantitative comparisons of FDG-PET scans across time or between animals are subject to an elevated risk of erroneous results when they ignore blood glucose levels. Multiplying PET data by blood glucose levels or ‘normalizing’ the blood glucose to a common reference value (100 mg/dL, for example) offers no protection; in fact, it is frequently counterproductive. However, by calculating the hypothetical value of the maximum glucose uptake rate under saturating glucose conditions, MRgluc^max, we see reduced problems of glucose bias and gain increased statistical power to detect treatment effects. Based on the average properties observed across 66 preclinical cohorts, the power improvement for MRgluc^max was equivalent to reducing the sample size by 20% compared to the next best option, which was using the uncorrected Ki data.
These benefits were realized in our preclinical studies of tyrosine kinase inhibitors by computing MRgluc^max using a KM of 130 mg/dL. The analytical derivations and simulation methods described in this work should facilitate the exploration and assessment of our method in other settings. Because it is superior to making no glucose correction and its benefits are easily obtained and come with no penalty, we highly recommend the use of (KM + [glc]) rather than [glucose] or [glucose]/(100 mg/dL) as the glucose correction factor in quantitative FDG-PET studies.
The authors would like to acknowledge the contributions of Annie Ogasawara, Alex Vanderbilt, Karissa Peth, Leanne McFarland, Darlene DeLa Cruz, Jeff Tinianow, and Herman Gill for their help in executing the PET studies analyzed here.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.