Background

The standardized uptake value (SUV) is currently the nearly exclusive means of quantitative evaluation in clinical [18F]fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) whole-body investigations. However, the SUV methodology has well-known shortcomings, such as uptake time dependence of the SUV, unsatisfactory test/retest stability, and susceptibility to errors in scanner calibration [1–6], all of which adversely affect the reliability of the SUV as a surrogate of the metabolic rate of FDG (and ultimately of glucose consumption).

In this context, it has been recognized repeatedly that at least part of the mentioned problems can be reduced or eliminated if tumor SUV is normalized to the SUV of a suitable reference region [7]. In particular, the liver has drawn considerable attention as a useful reference region since it does not irreversibly trap FDG and maintains a roughly constant SUV level during the time window relevant for whole-body FDG PET (about 60–120 min p.i.) [8–13]. In fact, the liver is the only reference region which so far has been studied and used extensively.

Using the tumor-to-liver-ratio (TLR) obviously removes some of the SUV limitations, i.e. possible inaccuracies regarding actually injected dose, scanner calibration, and patient weight index (either actual body weight, lean body mass [14], or body surface area [15]).

However, TLR exhibits an uptake time dependence comparable to that of tissue SUV itself (without a generally accepted means of quantitatively correcting for this effect in either case). Possibly more important, liver SUV (SUVliver) will exhibit an inter-individually (and possibly also intra-individually in case of aggressive treatment such as chemotherapy) variable relation to the given arterial tracer supply. On the other hand, it is the latter—expressed in SUV units (SUVblood)—which determines a given lesion’s observed SUV. Usefulness of the liver as a reference might be further compromised in the presence of liver disease or pharmacological intervention [16, 17]. Last but not least, depending on the investigation, the liver might simply not be routinely included in the field of view of the PET scan (e.g. at the participating sites in head and neck investigation, the liver is not always included in the FOV while a sufficiently large part of the aorta still is). For all these reasons, the liver cannot be considered an ideal reference region.

In recent publications, we have systematically investigated the tumor-to-blood SUV ratio (SUR) for normalization of tissue SUVs, which in our view offers principal advantages in comparison to TLR. For one, the SUR approach by definition eliminates the influence of the persisting residual variability of SUVblood on lesion SUV and ensures that SUR is superior to lesion SUV itself as a surrogate parameter of the metabolic rate of FDG [18]. Additionally, we were able to show that it is possible to reliably correct SUR for variations of the 18F-FDG uptake period under rather general and empirically well-fulfilled assumptions regarding the shape of the arterial input function (AIF) [19]. These advantageous properties of the SUR can ultimately be traced back to the empirical fact that the AIF after FDG bolus injection exhibits an essentially invariant shape, following a simple inverse power law starting immediately after the bolus phase. Finally, we found strong evidence in a survival analysis of 130 patients with esophageal carcinoma that the superior properties of SUR also translate into a higher prognostic value [20]. While there is thus rather strong theoretical and empirical evidence for the superiority of SUR over SUV, it is so far an open question how the performance of SUR compares to that of TLR.

The primary aim of the present investigation, therefore, was accurate determination of the degree of correlation between TLR and SUR. A secondary goal was to perform a first direct comparison of the performance of TLR and SUR as predictors of therapy outcome. For this purpose, we have utilized the patient group previously investigated in [20].

Methods

Patient group

In this retrospective study, 424 patients (358 men, 66 women) with mean age (range) 63 (37–85) years and different tumor entities (head and neck cancer N = 36 (HNC), non-small cell lung cancer N = 178 (NSCLC), esophageal carcinoma N = 210 (EC)) were included. This patient group incorporates 130 patients with esophageal carcinoma treated with definitive radio(chemo)therapy previously investigated in the already mentioned study by Bütof et al. [20]. This subgroup is utilized in the present study for comparison of the prognostic value of TLR and SUR. In 84 out of 424 patients, two PET scans were performed on different days, where the first scan was before radio(chemo)therapy and the second scan afterwards. Time between first and second scan was on average 39.1 days (range 10–76). These data were included to study the intra-subject variability. In 49 out of 424 patients, dual time-point measurements were performed, and the respective late scans were included to extend the range of covered uptake times (up to 120 min). Altogether, 557 18F-FDG PET/CT scans were performed at University Hospital, Technische Universität Dresden (Site A) and at the University Hospital, Otto-von-Guericke University Magdeburg (Site B). Only scans where the liver as well as the aorta was in the FOV were included. Scan characteristics are summarized in Table 1. All scans other than those mentioned above were performed before radio(chemo)therapy and/or surgery. All patients had fasted for at least 6 h prior to 18F-FDG injection. The serum glucose concentration measured prior to injection was 5.9 mmol/L on average (range 3.3–10.7).

Table 1 Scan characteristics

Image analysis

ROI definition and ROI analyses were performed using the ROVER software, version 2.1.20 (ABX, Radeberg, Germany). Here and in the following, “ROI” is used synonymously with “VOI” for denoting a three-dimensional volume of interest.

The metabolically active part of the primary tumor was delineated by an automatic algorithm based on adaptive thresholding taking the local background into account [21]. The result of the automatic delineation was inspected visually by an experienced observer (one observer at each site) and corrected manually in case of obvious segmentation failure. For the resulting ROIs, SUVmax was computed. In the following, the index “max” is omitted, since only the maximum of lesion SUV and derived quantities (TLR, SUR) was considered in the evaluation.

The arterial blood SUV was determined by defining a roughly cylindrical aorta ROI in the attenuation CT data which then was transferred to the PET data. To exclude partial volume effects, a concentric safety margin was used in the transaxial planes, centering the ROI in the aorta. Planes showing high tracer uptake close to the aorta (pathological or otherwise) were excluded. The aorta ROI was positioned in the descending aorta, and the minimum volume was 5 ml. For the determination of the SUVliver, a spherical 3D ROI with a diameter of approximately 3 cm (14 ml) was placed on the normal inferior right lobe of the liver. TLR (SUR) was computed as the ratio of maximum lesion SUV and mean SUV of the liver (aorta) ROI. In the following, we omit the index “mean” for liver (aorta) SUV. Scan time corrected SUR values were computed as described in [19]:

$$ \begin{aligned} \text{SUR}_{\text{tc}} & = \frac{T_{0}}{T} \times \left({\text{SUR}} - V_{r}\right) +V_{r}\\ &= \frac{T_{0}}{T} \times {\text{SUR}} + \left(1 - \frac{T_{0}}{T} \right) \times V_{r}\,, \end{aligned} $$
(1)

where \(T\) is the actual scan time p.i. and \(T_{0}\) is the chosen standard scan time to which the SURs are normalized (60 min in the present work). \(V_{r} = 0.53\) ml/ml is an estimate of the apparent volume of distribution, corresponding to the y-axis intercept of a Patlak plot, previously derived in dynamic investigations [22]. Note that for SUR values that are not too small, the influence of \(V_{r}\) is small and might be neglected, simplifying the correction formula to \(\text{SUR}_{\text{tc}} = \frac{T_{0}}{T} \times \text{SUR}\).
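As a minimal sketch of Eq. 1, assuming Python and hypothetical function names (nothing here is taken from the ROVER software or from [19]), the ratio and its scan-time correction can be written as follows. The input values are illustrative, not patient data.

```python
T0_MIN = 60.0  # standard scan time p.i. in minutes, as chosen in the text
V_R = 0.53     # apparent volume of distribution in ml/ml, as given in the text


def sur(suv_lesion_max: float, suv_blood_mean: float) -> float:
    """Tumor-to-blood SUV ratio: maximum lesion SUV over mean aorta SUV."""
    if suv_blood_mean <= 0:
        raise ValueError("blood SUV must be positive")
    return suv_lesion_max / suv_blood_mean


def sur_tc(sur_value: float, t_min: float,
           t0_min: float = T0_MIN, v_r: float = V_R) -> float:
    """Scan-time corrected SUR following Eq. 1."""
    if t_min <= 0:
        raise ValueError("scan time must be positive")
    return (t0_min / t_min) * (sur_value - v_r) + v_r


def sur_tc_simplified(sur_value: float, t_min: float,
                      t0_min: float = T0_MIN) -> float:
    """Simplified correction that neglects V_r."""
    return (t0_min / t_min) * sur_value


# Illustrative (hypothetical) case: SUVmax 9.0, blood SUV 1.8, scan at 90 min p.i.
example_sur = sur(9.0, 1.8)                            # 5.0
example_full = sur_tc(example_sur, 90.0)               # about 3.51
example_simple = sur_tc_simplified(example_sur, 90.0)  # about 3.33
```

In this example the two variants differ by \((1 - T_{0}/T) \times V_{r} \approx 0.18\), which illustrates why neglecting \(V_{r}\) matters less the larger the SUR is.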

As our previous work [20] demonstrates, the scan-time correction distinctly improves the prognostic value of the SUR, and it is thus the scan-time corrected value SURtc which should be compared against TLR. Of course, TLR is scan-time dependent as well, but usually no attempt is made to correct for this effect, so the primarily relevant comparison is that between SURtc and this (scan time uncorrected) TLR. For completeness' sake, however, we also compared SURtc with a scan-time-corrected TLR as follows. For scan time correction of TLR, we note that the SUVliver is nearly time-independent in the relevant time window (≈ 60−120 min p.i.) so that the fractional change of TLR over time is essentially identical to the corresponding change of lesion SUV. In [19], we have demonstrated that scan time correction of lesion SUV is possible (although somewhat less accurate than for SUR) but in principle requires knowledge of SUVblood. However, an approximate correction is possible without this knowledge. When using the TLR approach instead of SUR (i.e. in the absence of SUVblood determination), this approximation would be the only feasible approach, and we have thus used it in the present investigation. The resulting correction formula is

$$ \text{TLR}_{\text{tc}} = \left(\frac{T_{0}}{T}\right)^{1-b} \times \text{TLR} $$
(2)

where b=0.313 is a parameter describing the shape and decrease of the arterial input function over time (see [19] for details).
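A minimal Python sketch of Eq. 2 follows; the function names are hypothetical and the input values are illustrative rather than patient data.

```python
B = 0.313      # AIF shape parameter, as given in the text
T0_MIN = 60.0  # standard scan time p.i. in minutes


def tlr(suv_lesion_max: float, suv_liver_mean: float) -> float:
    """Tumor-to-liver ratio: maximum lesion SUV over mean liver SUV."""
    if suv_liver_mean <= 0:
        raise ValueError("liver SUV must be positive")
    return suv_lesion_max / suv_liver_mean


def tlr_tc(tlr_value: float, t_min: float,
           t0_min: float = T0_MIN, b: float = B) -> float:
    """Approximate scan-time correction of TLR following Eq. 2."""
    if t_min <= 0:
        raise ValueError("scan time must be positive")
    return (t0_min / t_min) ** (1.0 - b) * tlr_value


# Illustrative (hypothetical) case: TLR of 3.0 measured at 90 min p.i.
example_tlr_tc = tlr_tc(3.0, 90.0)
```

Because the exponent \(1-b\) is smaller than 1, this correction is weaker than the simple \(T_{0}/T\) scaling: for a scan at 90 min p.i. the factor is about 0.76 rather than 0.67.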

Statistical analysis

Inter-subject variability of SUVblood and SUVliver was analyzed in the whole patient group where for patients with two PET scans only the first scan was used (N = 424). Inter-subject variability was assessed as standard deviation (SD) of the distribution of the respective SUV. Intra-subject variability of SUVblood and SUVliver was analyzed in the subgroup of 84 patients that received two scans on separate days. It was assessed as SD of the distribution of ΔSUV (= paired difference of the respective SUV in the second and first scan). Inter- and intra-subject variabilities were compared using a two-sided F test of the corresponding variances (squared SDs) testing the null hypothesis that they are equal.
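The variance comparison described above can be sketched as follows. This is a simplified illustration with hypothetical function names, restricted to the Python standard library; it computes the F statistic and its degrees of freedom only. The P value would then be read from the F distribution (for instance with `pf` in R or `scipy.stats.f.sf` in Python), and doubling the upper-tail probability for the two-sided test is a common convention that we assume here rather than a detail stated in the text.

```python
import statistics


def paired_differences(second_scan, first_scan):
    """Delta SUV per patient: second scan minus first scan."""
    return [s - f for s, f in zip(second_scan, first_scan)]


def variance_ratio(sd_a, n_a, sd_b, n_b):
    """F statistic with the larger variance in the numerator.

    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    var_a, var_b = sd_a ** 2, sd_b ** 2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Hypothetical toy values (not patient data): SD of the paired differences
first = [2.4, 2.9, 2.1, 2.6]
second = [2.7, 2.8, 2.5, 2.9]
sd_intra_toy = statistics.stdev(paired_differences(second, first))

# SDs reported in the Results section for SUVliver:
# 0.55 (inter-subject, N = 424) and 0.42 (intra-subject, N = 84)
f_liver, df_num, df_den = variance_ratio(0.55, 424, 0.42, 84)
```

With the reported liver SDs, this yields F ≈ 1.71 with 423 and 83 degrees of freedom.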

Linear correlation analysis of liver vs. blood SUV and of TLR vs. SURtc (N = 557), respectively, was performed and visualized through scatterplots. Linear correlation analysis was also performed for the liver-to-blood SUV ratio (LBR = SUVliver/SUVblood) vs. SUVblood as well as for SURtc/TLR and SURtc/TLRtc, respectively, vs. SURtc. Variability of the respective ratios was assessed via histogram analysis and quantified by mean ± SD and 90 % confidence interval (CI).
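The following Python sketch illustrates these two analysis steps on toy data. The function names are hypothetical, LBR is taken to be SUVliver/SUVblood, and reading the 90 % interval as the range between the 5th and 95th percentiles of the ratio distribution is our assumption; the text does not state how the interval was obtained.

```python
import statistics


def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def ratio_summary(numerators, denominators):
    """Mean, SD, and 5th/95th percentiles of the element-wise ratio."""
    ratios = [n / d for n, d in zip(numerators, denominators)]
    # n=20 yields 19 cut points; the first and last are the 5th and 95th percentiles
    cuts = statistics.quantiles(ratios, n=20, method="inclusive")
    return statistics.fmean(ratios), statistics.stdev(ratios), cuts[0], cuts[-1]


# Hypothetical toy values (not patient data)
suv_blood = [1.5, 1.8, 2.0, 1.6, 2.2]
suv_liver = [2.2, 2.7, 2.8, 2.5, 3.1]

r2 = r_squared(suv_blood, suv_liver)
lbr_mean, lbr_sd, lbr_p05, lbr_p95 = ratio_summary(suv_liver, suv_blood)
```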

Survival analysis was performed in the patient group already analyzed in [20] where the prognostic value of several PET parameters and of clinically relevant parameters for overall survival, locoregional tumor control, and distant metastases-free survival (DM) was investigated. In the present study, we investigate the prognostic value of TLR and TLRtc for DM (for which the largest effect size was found in our previous study) using univariate Cox regression. For comparison, we also show the already published results for SUV and SURtc. Hazard ratios were compared using the bootstrap method (random re-sampling with replacement; \(10^{5}\) samples) to determine the statistical distribution of (HR1−HR2) from which the relevant P value then was derived. Statistical significance was assumed if P<0.05. Statistical analysis was performed with the R language and environment for statistical computing [23] version 3.1.2.
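The resampling scheme can be illustrated with the generic Python sketch below. It is not the analysis code used for this study: the Cox regression is replaced by placeholder statistics (column means of toy records) so that the example stays self-contained, all names are hypothetical, and the rule used to turn the bootstrap distribution of the difference into a two-sided P value is one common convention that we assume here.

```python
import random


def bootstrap_differences(records, stat_1, stat_2, n_boot=1000, seed=0):
    """Resample records with replacement and collect stat_1 - stat_2.

    In the setting of the text, stat_1 and stat_2 would each fit a univariate
    Cox model to the resampled patients and return the hazard ratio.
    """
    rng = random.Random(seed)
    n = len(records)
    diffs = []
    for _ in range(n_boot):
        sample = [records[rng.randrange(n)] for _ in range(n)]
        diffs.append(stat_1(sample) - stat_2(sample))
    return diffs


def two_sided_p(diffs):
    """Twice the smaller tail fraction at zero, capped at 1 (assumed convention)."""
    not_above = sum(d <= 0 for d in diffs)
    not_below = sum(d >= 0 for d in diffs)
    return min(1.0, 2.0 * min(not_above, not_below) / len(diffs))


def mean_of_column(index):
    """Placeholder statistic standing in for a hazard ratio estimate."""
    def stat(sample):
        return sum(row[index] for row in sample) / len(sample)
    return stat


# Hypothetical toy records (not patient data)
toy_records = [(3.1, 2.0), (4.0, 2.5), (2.4, 2.9), (3.6, 1.9), (3.3, 2.2), (4.2, 3.0)]
diffs = bootstrap_differences(toy_records, mean_of_column(0), mean_of_column(1))
p_value = two_sided_p(diffs)
```

Resampling whole records, rather than each parameter separately, keeps the two statistics paired within each bootstrap sample, which is what a comparison of two hazard ratios derived from the same patients requires.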

Compliance with ethical standards

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the study.

Results

Voxel intensities of blood and liver ROIs exhibited comparable average standard deviations of 10.7 % (liver) and 11.3 % (blood), respectively, corresponding to standard errors of the mean values of 1.36 % (scan start p.i. 60–75 min: 1.22 %, >75 min: 1.49 %) (liver) and 1.43 % (60–75 min: 1.23 %, >75 min: 1.60 %) (blood), respectively.

The mean values of SUVblood and SUVliver across all 424 investigated patients were 1.79 ± 0.36 and 2.56 ± 0.55, respectively. The mean intra-individual paired differences, ΔSUVblood and ΔSUVliver, in 84 patients receiving two PET scans on different days were 0.05 ± 0.32 and 0.24 ± 0.42, respectively. This demonstrates that the inter- and intra-subject variability (i.e. the respective standard deviations) of both SUVs are of very similar magnitude (although the small positive difference between the inter- and intra-subject SUVliver variability actually reaches statistical significance [P = 0.003]).

Correlation analysis revealed a pronounced linear correlation of SUVliver and SUVblood (R² = 0.83) and of TLR and SURtc (R² = 0.92). Corresponding scatterplots are shown in Fig. 1. There were no notable differences between investigating sites, tumor entities, or tumor size (Table 2). For LBR, we obtained 1.47 ± 0.18 (90 % CI 1.2–1.78). Corresponding scatterplot and histogram are shown in Fig. 2. For the SURtc/TLR ratio, we obtained 1.14 ± 0.21 (90 % CI 0.82–1.48) and for the SURtc/TLRtc ratio 1.38 ± 0.17 (90 % CI 1.12–1.65). Corresponding scatterplots and histograms are shown in Fig. 3. Obviously, time correction of TLR reduces the fractional variability of the ratio (from about 18 to 12 %). For all ratios, there was no notable difference between investigating sites, tumor entities, or tumor size (Table 3).
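As a worked check, the fractional variabilities follow from dividing each SD by the corresponding mean of the ratios reported above (SURtc/TLR, SURtc/TLRtc, and LBR, in that order; values rounded):

$$ \frac{0.21}{1.14} \approx 0.18\,, \qquad \frac{0.17}{1.38} \approx 0.12\,, \qquad \frac{0.18}{1.47} \approx 0.12 $$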

Fig. 1

a Correlation between SUVliver and SUVblood. b Correlation between TLR and SURtc. Black lines represent the least squares straight line fits to the data. Red lines depict the 95 % CI

Fig. 2

a Correlation between LBR and SUVblood. b Frequency distribution of LBR. In a, the black line represents the least squares straight line fit to the data and the red lines depict the 95 % CI

Fig. 3

a Correlation between the SURtc/TLR ratio and SURtc. b Corresponding frequency distribution. c Correlation between the SURtc/TLRtc ratio and SURtc. d Corresponding frequency distribution. Black lines represent the least squares straight line fits to the data. Red lines depict the 95 % CI

Table 2 Correlation (R²) of SUVblood vs. SUVliver and TLR vs. SUR. All correlations were significant (P < 0.001)
Table 3 Variability of the ratios LBR, SURtc/TLR, and SURtc/TLRtc

Survival analysis (N = 130) revealed TLR and TLRtc as significant prognostic factors for DM without being significantly different from each other (HR = 3.3 and HR = 3, respectively). These hazard ratios are to be compared with the previously reported results from this patient group [20] for SUV (HR = 2.2) and SURtc (HR = 4.1). Further details can be found in Table 4. Corresponding Kaplan-Meier curves are shown in Fig. 4.

Fig. 4

Kaplan-Meier curves with respect to DM (N = 130 patients with esophageal carcinoma). Results for SUV and SURtc have been taken from our paper [20]

Table 4 Univariate Cox regression with respect to DM (N = 130 patients with esophageal carcinoma)

According to bootstrap resampling, HRs of TLR and SURtc were both significantly higher than the HR of SUV (P=0.019 and P=0.048, respectively) while the HR difference between TLRtc and SUV was not significant (P=0.17). The HR difference between SUR and TLR or TLRtc was also not significant (P=0.31 or P=0.16).

Discussion

Figure 1 a demonstrates a pronounced but far from perfect linear correlation between SUVliver and SUVblood. Indeed, a stronger correlation of both quantities might be expected since in any single investigation, tracer uptake at a given time point in any given target region (the liver included) is proportional to the overall scale of the AIF and, consequently, to its value at the chosen time point. Thus, in view of the fact that the AIF exhibits an essentially invariant shape across different investigations [18, 19] and presuming the metabolic state of the liver could be considered sufficiently similar with respect to uptake and release of FDG across different investigations/patients, a near-perfect linear correlation (actually, a proportionality) would result in Fig. 1 a, at least for sufficiently standardized uptake time. However, this is not the case.

Considering the possible explanations, it is easily verified that the deviations from a perfect straight line are not a consequence of statistical errors due to the given signal to noise ratio of the corresponding ROI averages [24]. Systematic errors due to regionally variable accuracy of attenuation or scatter correction, too, would not be able to disturb the linear correlation to such an extent.

Excluding measurement-related effects, two obvious possible explanations remain for the sizable deviations from perfect linear correlation. First, the correlation might be adversely affected by differences in uptake time (color-coded in the scatter plots) since the time activity curves in liver and blood have different shapes and the LBR, thus, is time-dependent (slowly increasing over time). It is obvious from the color-coding of the data points according to uptake time in Figs. 1 a and 2 a that this effect at most is responsible for a minor part of the scatter, driving LBR to somewhat higher values at late times (which on average correspond to lower SUVblood values, explaining the small but significant negative correlation of LBR and SUVblood in Fig. 2 a).

The only remaining plausible explanation in our view is to attribute the scatter to non-negligible inter- and intra-individual quantitative differences of FDG kinetics in the liver between different patients or scans.

Regarding the degree of intra-subject variability of SUVliver and SUVblood separately, our results are in good quantitative agreement with [4]. Our data furthermore demonstrate that inter-subject variability of both quantities is very similar to the respective intra-subject variability (although the difference reaches statistical significance in case of the liver where inter-subject variability is slightly larger than the intra-subject variability). We believe this to be an important observation in itself; intra- and inter-individual fluctuations of SUVliver and SUVblood do have very similar magnitude.

While our data thus essentially confirm and augment existing data regarding inter-scan variability of SUVliver and SUVblood they, furthermore, provide to our knowledge the first comprehensive investigation of the degree of correlation between both quantities. Regarding utilization of the liver as reference region for lesion SUV normalization, our data demonstrate that the liver in fact cannot be considered a highly accurate substitute for actual arterial tracer supply; from the data shown in Fig. 2, we derive an LBR of 1.47 ± 0.18 with a 90 % confidence interval of 1.2–1.78 whose limits differ by 48 %. These fluctuations directly translate into spurious fluctuations of the derived TLR values which would erroneously be interpreted as being due to changes in lesion metabolism.
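The quoted spread of the interval is the ratio of its upper to its lower limit:

$$ \frac{1.78}{1.2} \approx 1.48 $$

In other words, the upper limit exceeds the lower one by about 48 %.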

The magnitude of this effect is demonstrated in Fig. 1 b where TLR is compared to SURtc. While the correlation coefficient is larger than that in Fig. 1 a (ultimately a consequence of the much higher dynamic range of SUR and TLR in comparison to SUVblood and SUVliver), the SURtc/TLR ratio in fact exhibits a fractional variability that is distinctly higher than that of LBR (about 18 vs. 12 %) which also is apparent from a comparison of Fig. 2 and Fig. 3 a, b. This increased variability is caused by the fact that we use SURtc here, rather than the scan-time uncorrected SUR for the reasons explained in the introduction.

Since uptake time correction of TLR is currently not applied in clinical routine, one thus actually faces a variability of TLR in comparison to SURtc of 1.14 ± 0.21 (90 % CI 0.82–1.48) if actual scan times are as variable as in our study group.

Also performing uptake time correction for TLR approximates the situation where uptake times would be strictly standardized (to 60 min in the present case). This leads to the results shown in Fig. 3 which indeed demonstrate a very similar mean and SD of the SURtc/TLRtc ratio in comparison to the LBR data in Fig. 2. This should be expected if the scan-time correction performs well since the time dependence of LBR itself is rather weak as already discussed above. The bottom line here is that the SURtc/TLR ratio exhibits variability which is at least as large as that of LBR but will be substantially higher under typical clinical conditions where uptake times can vary considerably [25, 26].

Accepting our point of view that SURtc for principal reasons should be considered to represent the best available surrogate of lesion glucose consumption (since it uses the “correct” way of normalizing directly to the actual arterial tracer supply and accounts for the time dependence of both lesion uptake and AIF), the stated variability of SURtc/TLR represents a principal limitation of the TLR as a surrogate of lesion glycolysis.

Of course, even if this conjecture is correct, the real question is how TLR performs in comparison to SUV and SUR regarding its prognostic value. In comparison to SUV, it has been repeatedly shown [12, 13, 27] that TLR is capable of improving the prognostic value of the PET investigation. It thus is unquestionably a valuable concept. On the other hand, the much more recently proposed SUR has not yet seen wide-spread evaluation and a comparison to TLR has been completely missing so far. We therefore consider the results presented in Fig. 4 and Table 4 of special interest. They clearly demonstrate that TLR as well as SURtc are superior to SUV as predictors of DM in the investigated patient group. Uptake time correction of TLR (TLRtc) did not improve the prognostic value as described by the hazard ratios in Table 4 in comparison to TLR. This was an initially somewhat unexpected result since uptake time correction reduces the deviations from a constant SURtc/TLRtc ratio. This finding might indicate that in our patient group the improved prognostic value of SURtc is caused mainly by the beneficial influence of normalization to SUVblood rather than by scan-time correction. However, further investigations will be necessary to clarify this question.

The previously reported HR of SURtc is distinctly higher than the HR of TLR (HR[SURtc] = 4.1, HR[TLR] = 3.3). In the given study group with its limited group size, though, the increase of HR is not large enough to reach statistical significance in the bootstrap resampling analysis. This indicates that the principal advantages of SURtc over TLR (consideration of actual arterial tracer supply and accurate uptake time correction) are not decisive at the given level of statistical accuracy available in our study group. Nevertheless, we believe that the observed very weak indication of superiority of SURtc over TLR is a sufficient incentive to further investigate the relative performance of TLR and SUR in other patient groups. Personally, we believe it very likely that ultimate superiority of SUR over TLR will be demonstrated since the latter parameter does not allow one to fully account for the inter- and intra-individual variability of arterial tracer supply (and thus remains subject to spurious changes which are unrelated to differences in lesion glycolysis). In any case, both parameters are clearly superior to SUV and in practical terms might be viewed to some extent as complementary concepts (rather than competing ones) since the blood pool (aorta) will frequently be covered in the FOV even when the liver is not (or when the presence of liver disease precludes use of the TLR approach).

Overall, it seems worthwhile and promising to further investigate the relative performance of SUV, TLR, and SUR in other patient groups with the ultimate goal of deciding whether SUR can be considered as generally superior to SUV and TLR. If this turns out to be true, it would constitute a strong incentive to use SUR as a drop-in replacement for the current SUV and TLR methodology (or at least as an attractive alternative to the latter one) in clinical whole body FDG PET.

Conclusions

Suitability of the liver as a surrogate of arterial tracer supply for SUV normalization via TLR computation is limited due to the less-than-perfect correlation between blood and liver SUV, and the SUR approach remains attractive for principal as well as practical reasons. Regarding their respective prognostic value, both TLR and SUR significantly outperformed SUV. Further studies in sufficiently large patient groups are required to better characterize the relative performance of SUV, TLR, and SUR in different settings.