Background

A whole body (WB) dynamic PET acquisition enables 18F-FDG parametric imaging. Full kinetic modeling analysis of 18F-FDG using WB dynamic PET requires tissue time-activity curves (TACs) measured by PET and the arterial input function (AIF). The Patlak plot model [1, 2] can then be applied to these data to compute the net influx parameter, Ki, which is proportional to the glucose metabolic rate.

The AIF is obtained by collecting arterial blood samples and measuring the radioactivity concentration in the arterial plasma; these data are generally considered to be the gold standard. This invasive measurement can be associated with patient discomfort and additional radiation exposure to personnel. Additionally, serial arterial blood sampling is not typically feasible in a clinical environment. Therefore, an alternative to arterial blood sampling for estimating the input function (IF) is desired for routine use. Several alternative methods have been proposed to replace the AIF: arterialized venous blood sampling [3], image-derived input function (IDIF) estimation [4,5,6], and population-based input function (PBIF) modeling [7,8,9,10]. Venous blood sampling is more convenient than arterial blood sampling, but it is still invasive, especially with arterialization, i.e., sampling blood from a hand immersed in 44 °C water [11]. Heating the hand causes vasodilation and increases the blood flow to the hand, so that venous samples are similar to arterial samples [12].

Measures of blood activity can be obtained by WB PET scans that typically cover large arterial blood regions such as the left ventricle and aorta; however, the accuracy of IDIFs will be affected by body motion and partial volume effects. Furthermore, the injection must be performed with the patient on the bed in order to measure the early phase of the IDIF, further compromising a clinically established workflow. The PBIF method starts with the generation of a normalized average of measured arterial blood data from several subjects (template PBIF). The PBIF method assumes that the shape of the IFs of all subjects is the same. This assumption may be violated in some patients if tracer absorption differs. The PBIF method also requires the determination of an appropriate factor to scale the template PBIF for each patient, which is another possible source of error.

In this paper, we applied both IDIF and PBIF methods to 18F-FDG WB PET data of oncologic patients and compared the performance of these methods with the gold standard of arterial blood sampling denoted as AIF in this paper, by assessing the Patlak Ki values. To generate the template PBIF, we applied two normalization methods. These template PBIFs were normalized for each subject using several scaling factors: (1) a scaling factor consisting of injected dose (ID) and initial distribution volume (iDV) of 18F-FDG [10] and (2) the area under the curve (AUC) of the IDIF using several time windows. While there has been substantial literature over many years developing IDIFs and PBIFs, this paper has a number of unique characteristics: (1) use of a modern PET system to extract IDIF and assess tumor quantification, (2) comparison to gold standard arterial samples, (3) use of commercial algorithms to define the aorta region of interest (ROI), and (4) comprehensive evaluation of scaling methods for the PBIF.

Material and methods

The abbreviations are listed in Table 1.

Table 1 Abbreviations

Human subjects and PET scan procedure

A total of 35 subjects were recruited for this study (Table 2). All subjects provided written consent. The study was performed in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and federal guidelines and regulations of the USA for the protection of human research subjects contained in Title 45 Part 46 of the Code of Federal Regulations (45 CFR 46).

Table 2 Demographics and injection parameters

The subjects were divided into 2 groups: a PBIF generation group (n = 23; 11 healthy controls (HCs) and 12 clinical subjects (post-traumatic stress disorder (n = 6), epilepsy (n = 3), cocaine addiction (n = 3))) and a PBIF validation group (n = 12; oncologic subjects). In the validation group, tumors or hypermetabolic nodes were located in palate, neck, thyroid, esophagus, axilla, lung, mediastinum, inguen, and femoral shaft.

18F-FDG was injected by pump using a 1-min infusion and arterial blood sampling was performed for 90 min in all subjects except for 1 subject (60 min). Discrete blood samples were manually drawn every 10 s from 0 to 90 s, every 15 s from 90 s to 3 min, and then at 4, 5, 6, 8, 10, 15, 20, 25, 30, 45, 60, 75, and 90 min post-injection. Samples were centrifuged to obtain plasma and then counted with a cross-calibrated well counter to produce the AIF in units of Bq/mL decay corrected to injection time.

PET scans were acquired for 90 min on a 4-ring Biograph mCT PET/CT scanner concurrently with arterial blood sampling for the PBIF validation group (n = 12). A single bed cardiac PET scan was acquired for the first 6 min, followed by continuous bed motion dynamic whole body scans (2 min × 4 passes, 5 min × 15 passes). The subjects were scanned from the top of the head to the knee. The dynamic data were reconstructed using OSEM (2 iterations, 21 subsets) with point spread function recovery and time of flight information, a matrix size of 400 × 400, and 5 mm full width at half maximum Gaussian post-reconstruction filtering. The data were corrected for attenuation, randoms, and scatter, but not for motion. The CT scan was not co-registered to PET since it was acquired immediately before the 18F-FDG injection. However, the quality of the alignment was visually checked.

Normalization of AIF

The first step to generate a template PBIF curve is to normalize the amplitude of each AIF. The AIFs from the PBIF generation group were normalized in two ways. In the first method, each AIF was divided by its AUC from 0 to 60 min. The second method, proposed by Vriens et al. [10], is denoted as the iDV (initial distribution volume) method. Here, the AIFs were normalized with the extrapolated initial plasma concentration of 18F-FDG (CP*(0)). CP*(0) is the expected plasma concentration under the assumption of instantaneous mixing of 18F-FDG at t = 0 [13]. CP*(0) was obtained by fitting a portion of the curve (5 ≤ t ≤ 30 min) with an exponential function (CP*(t) = CP*(0)exp(−αt)) [14]. Each AIF was divided by its estimated CP*(0).
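The two normalizations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' in-house IDL code: the function names, the trapezoidal integration, and the log-linear least-squares fit (used here in place of a nonlinear exponential fit) are choices made for this sketch, and the AIF samples are synthetic.

```python
import math

def auc_trapz(t, c, t_end):
    """Trapezoidal area under samples (t, c) from t[0] up to t_end.

    The last interval is linearly interpolated at t_end. If the samples stop
    before t_end, the area up to the final sample is returned.
    """
    area = 0.0
    for i in range(1, len(t)):
        t0, t1, c0, c1 = t[i - 1], t[i], c[i - 1], c[i]
        if t0 >= t_end:
            break
        if t1 > t_end:
            c1 = c0 + (c1 - c0) * (t_end - t0) / (t1 - t0)
            t1 = t_end
        area += 0.5 * (c0 + c1) * (t1 - t0)
    return area

def fit_cp0(t, c, t_min=5.0, t_max=30.0):
    """Estimate CP*(0) and alpha from CP*(t) = CP*(0) exp(-alpha t).

    Assumption of this sketch: ordinary linear regression of ln(c) on t
    over t_min <= t <= t_max.
    """
    pts = [(ti, math.log(ci)) for ti, ci in zip(t, c)
           if t_min <= ti <= t_max and ci > 0]
    n = len(pts)
    mean_t = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mean_t) ** 2 for p in pts)
    sxy = sum((p[0] - mean_t) * (p[1] - mean_y) for p in pts)
    slope = sxy / sxx
    cp0 = math.exp(mean_y - slope * mean_t)
    return cp0, -slope

# Synthetic, purely illustrative AIF (minutes, Bq/mL): a single exponential.
times = [0, 1, 2, 3, 4, 5, 6, 8, 10, 15, 20, 25, 30, 45, 60, 75, 90]
aif = [50000.0 * math.exp(-0.02 * ti) for ti in times]

# Method 1: divide by the AUC from 0 to 60 min.
auc_0_60 = auc_trapz(times, aif, 60.0)
aif_auc_norm = [v / auc_0_60 for v in aif]

# Method 2 (iDV method): divide by the extrapolated CP*(0).
cp0, alpha = fit_cp0(times, aif)
aif_idv_norm = [v / cp0 for v in aif]
```

With real AIFs the early peak lies outside the 5 to 30 min fitting window, so CP*(0) is an extrapolated value rather than a measured one.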

The iDV is the ratio of the injected dose (ID) to the initial FDG concentration, CP*(0) [14], and is effectively the volume of blood that accounts for the early distribution of tracer throughout the body. The value of iDV can be approximated noninvasively using the subject body weight and height as follows:

$$ iDV\;\left[L\right]=c{\left(\mathrm{height}\left[\mathrm{m}\right]\right)}^h\kern0.5em {\left(\mathrm{weight}\left[\mathrm{kg}\right]\right)}^w $$
(1)

where c, h, and w are pre-determined coefficients. These three coefficients were estimated from the individual values of iDV (= ID/CP*(0)), height, and weight of the subjects in the PBIF generation group. Specifically, the exponents h and w were first determined by minimizing the coefficient of variation of c (COVc) [8, 10]. Then, the coefficient c was determined as the mean of \( \mathrm{iDV}/\left[{\left(\mathrm{height}\right)}^h{\left(\mathrm{weight}\right)}^w\right] \) among subjects.
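The coefficient estimation for Eq. 1 can be illustrated with a short sketch. The text does not state which optimizer was used, so the grid search below is an assumption, as are the function names, the search range, and the step size. The subject data are synthetic and generated from hypothetical coefficients (c = 10, h = 0.5, w = 0.3) that are not the values of Table 4.

```python
import math

def c_statistics(h, w, idv, height, weight):
    """Return (COV of c, mean of c), where c_i = iDV_i / (height_i^h * weight_i^w)."""
    cs = [v / (ht ** h * wt ** w) for v, ht, wt in zip(idv, height, weight)]
    mean_c = sum(cs) / len(cs)
    var_c = sum((x - mean_c) ** 2 for x in cs) / (len(cs) - 1)
    return math.sqrt(var_c) / mean_c, mean_c

def fit_idv_coefficients(idv, height, weight, lo=0.0, hi=1.5, step=0.05):
    """Grid search for the exponents (h, w) that minimize COV of c.

    c is then taken as the mean of c_i at the best (h, w).
    The search range and step are assumptions of this sketch.
    """
    best = None
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        for j in range(n_steps + 1):
            h, w = lo + i * step, lo + j * step
            cov, mean_c = c_statistics(h, w, idv, height, weight)
            if best is None or cov < best[0]:
                best = (cov, h, w, mean_c)
    _, h, w, c = best
    return c, h, w

# Synthetic subjects (height in m, weight in kg); hypothetical true coefficients.
heights = [1.55, 1.62, 1.70, 1.75, 1.80, 1.88]
weights = [70.0, 52.0, 95.0, 60.0, 82.0, 75.0]
idvs = [10.0 * ht ** 0.5 * wt ** 0.3 for ht, wt in zip(heights, weights)]

c_fit, h_fit, w_fit = fit_idv_coefficients(idvs, heights, weights)
print(c_fit, h_fit, w_fit)
```

On noise-free synthetic data the search recovers the generating coefficients. With real data, height and weight are correlated, so the COV surface can be flat along a ridge and the fitted h and w can be unstable.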

Creation of PBIF

In the next step to generate a template PBIF curve, the normalized AIF (by AUC and iDV methods) was modeled using a compartment model that describes tracer behavior in the circulatory system proposed by Feng et al. [9].

$$ {C}_P(t)=\left\{\begin{array}{c}0\;\mathrm{if}\kern0.5em t<\tau \\ {}\left[{A}_1\left(t-\tau \right)-{A}_2-{A}_3\right]{e}^{-{\lambda}_1\left(t-\tau \right)}+{A}_2{e}^{-{\lambda}_2\left(t-\tau \right)}+{A}_3{e}^{-{\lambda}_3\left(t-\tau \right)}\kern0.1em \mathrm{if}\;t\ge \tau \end{array}\right. $$
(2)

where λ1, λ2, and λ3 are the eigenvalues of the model; A1, A2, and A3 are the coefficients; and τ is the delay constant.

Since Feng’s model describes the plasma as an impulse response function, i.e., from a true bolus injection, the model was convolved with a rectangular function (f(t) = 1, 0 ≤ t ≤ 1; f(t) = 0, otherwise) to take into account our injection protocol (1-min infusion). Feng’s model was applied twice. First, nonlinear least squares fitting was applied to obtain the 7 parameters for each subject of the PBIF generation group. Each model-fitted normalized AIF was corrected for its estimated delay (τ) and then averaged. Next, Feng’s model was again applied to the average curve to obtain a final parameter set. The fitted PBIFs using the two normalization methods are hereafter denoted as PBIFAUC and PBIFiDV. In the PBIF generation group, the shapes of the two PBIFs were compared as follows. First, the parameters (λ1, λ2, λ3) and the ratios of scale parameters (A2/A1, A3/A1) were compared between PBIFAUC and PBIFiDV. Next, the Patlak Ki values were compared using PBIFAUC and PBIFiDV that were scaled to have the same AUC.
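Eq. 2 and the convolution with the rectangular infusion function can be sketched as follows. The function names, the midpoint-rule integration, and the parameter values are assumptions made for illustration. The parameters are hypothetical and are not the fitted values of Table 3.

```python
import math

def feng_impulse(t, a1, a2, a3, lam1, lam2, lam3, tau):
    """Feng's model (Eq. 2): plasma response to a true bolus, zero before the delay tau."""
    if t < tau:
        return 0.0
    s = t - tau
    return ((a1 * s - a2 - a3) * math.exp(-lam1 * s)
            + a2 * math.exp(-lam2 * s)
            + a3 * math.exp(-lam3 * s))

def feng_with_infusion(t, params, duration=1.0, n_steps=200):
    """Convolve Eq. 2 with f(u) = 1 on [0, duration] (0 otherwise).

    Computes the integral of CP(t - u) du over 0 <= u <= duration
    with the midpoint rule.
    """
    du = duration / n_steps
    total = sum(feng_impulse(t - (k + 0.5) * du, *params) for k in range(n_steps))
    return total * du

# Hypothetical parameters (A1, A2, A3, lambda1, lambda2, lambda3, tau), time in min.
params = (800.0, 20.0, 20.0, 4.0, 0.1, 0.01, 0.5)

for t in (0.3, 1.0, 2.0, 10.0, 60.0):
    print(t, feng_impulse(t, *params), feng_with_infusion(t, params))
```

Because the rectangular function has unit height and a width of 1 min, it has unit area, so the convolution broadens the early peak without changing the total AUC of the curve.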

IDIF

In the validation group, an IDIF was generated from a cylindrical ROI in the descending aorta. The ROI was defined automatically on the CT that was used for PET attenuation correction, using the vendor’s ALPHA technology. The organ region of interest prediction was conducted using a learning-based algorithm [15] for automatic medical image annotation. Multiple focal anatomical structures were detected by a learning-by-example landmark detection algorithm, and then inconsistent findings were eliminated through a robust sparse spatial configuration algorithm.

Subject scaling of PBIF for validation

The template PBIFs must be scaled for each individual subject, and the scaled PBIF is denoted as sPBIF. For PBIFAUC, the scaling factor was determined based on the tail part of IDIF (from 15 to 90 min post-injection) using 4 different time windows. The length of the time window for scaling was 30 min, i.e., the same as the length for Patlak plot computation (see below). Multiple time windows were used as it was likely that effects such as motion and partial volume effects would produce differences in bias. Four different time windows (15–45, 30–60, 45–75, and 60–90 min) were used to scale the template PBIFs by multiplication by the AUC of the IDIF in each window (sPBIFAUC(15–45), sPBIFAUC(30–60), sPBIFAUC(45–75), sPBIFAUC(60–90)). For PBIFiDV, the scaling factor was computed using the injected dose and the estimated iDV using each subject’s weight and height with Eq. 1. To evaluate the robustness of iDV estimates, iDV was estimated in 3 ways, using the coefficients c, w, and h from this study, and also with the coefficients from 2 previous studies [8, 10]. In addition, to evaluate the results that could be obtained with the “best possible” scaling factor (i.e., using the subject’s plasma data), we also computed the ratio of the measured plasma to PBIFiDV at 4 time points (30, 45, 60, and 75 min post-injection) for each subject. The average of these 4 ratios was used as a scaling factor to obtain sPBIFPLAS.

In total, 9 estimated IFs (1 IDIF, 3 sPBIFiDV, 1 sPBIFPLAS, and 4 sPBIFAUC) were obtained per scan for validation.
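The two blood-free scaling approaches can be sketched as follows. The text states that PBIFAUC was scaled by the AUC of the IDIF in each window. The sketch assumes this means matching the AUC of the scaled PBIF to that of the IDIF over the same window, which is an interpretation rather than a confirmed detail. Function names and sample values are hypothetical.

```python
import math

def interp(t, c, x):
    """Linear interpolation of samples (t, c) at x; t must be increasing and cover x."""
    for i in range(1, len(t)):
        if x <= t[i]:
            f = (x - t[i - 1]) / (t[i] - t[i - 1])
            return c[i - 1] + f * (c[i] - c[i - 1])
    return c[-1]

def window_auc(t, c, t_start, t_end):
    """Trapezoidal AUC of samples (t, c) between t_start and t_end."""
    ts = [t_start] + [ti for ti in t if t_start < ti < t_end] + [t_end]
    cs = [interp(t, c, x) for x in ts]
    return sum(0.5 * (cs[i] + cs[i - 1]) * (ts[i] - ts[i - 1])
               for i in range(1, len(ts)))

def scale_by_idif_auc(t, template, idif, t_start, t_end):
    """sPBIF_AUC: scale the template so its window AUC equals that of the IDIF."""
    factor = window_auc(t, idif, t_start, t_end) / window_auc(t, template, t_start, t_end)
    return [factor * v for v in template]

def scale_by_idv(template_idv, injected_dose, height_m, weight_kg, c, h, w):
    """sPBIF_iDV: multiply the CP*(0)-normalized template by ID / iDV, with iDV from Eq. 1.

    The result is in units of dose per litre; any conversion (e.g. to Bq/mL)
    is left to the caller.
    """
    idv = c * height_m ** h * weight_kg ** w
    return [injected_dose / idv * v for v in template_idv]

# Synthetic example: the IDIF is exactly 3 times the template.
t = [float(x) for x in range(0, 95, 5)]
template = [math.exp(-0.01 * ti) for ti in t]
idif = [3.0 * v for v in template]
spbif_auc = scale_by_idif_auc(t, template, idif, 15.0, 45.0)

# Hypothetical coefficients chosen only to make the arithmetic easy to follow.
spbif_idv = scale_by_idv([1.0, 0.5], injected_dose=140.0, height_m=1.7,
                         weight_kg=70.0, c=1.0, h=0.0, w=1.0)
```

Passing a different window, e.g. `(60.0, 90.0)`, reproduces the other sPBIFAUC variants under the same assumption.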

Comparison of the scaled PBIFs with IDIF and AIF

The performance of the 9 estimated IFs was compared in the validation group using the AIF as the gold standard. Two outcome measures were used to evaluate the performance: the AUC of the IF and the Patlak Ki. ROIs for tumors or hypermetabolic nodes were manually delineated on multiple slices of the summed (60–90 min post-injection) PET images. The size of ROI was 3.46 ± 2.21 mL (one ROI per subject). The ROIs were applied to generate time-activity curves (TACs). The net influx rate constant (Ki) and the exchangeable distribution volume (Ve, intercept of Patlak plot) were determined for the ROI TACs using each IF and Patlak analysis applied to the period of 60–90 min post-injection. Specifically, we used a multilinear analysis to estimate Ki and Ve using the following equation:

$$ C(t)={K}_{\mathrm{i}}{\int}_0^t{C}_{\mathrm{P}}\left(\tau \right)\; d\tau +{V}_{\mathrm{e}}{C}_{\mathrm{P}}(t),t>{t}^{\ast } $$
(3)
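Eq. 3 is linear in Ki and Ve, so both can be estimated by ordinary least squares with two regressors: the running integral of CP and CP itself. The sketch below solves the 2 × 2 normal equations directly. The function names, the trapezoidal integration, and the synthetic curves are assumptions of this illustration and not the authors' implementation.

```python
import math

def cumulative_integral(t, y):
    """Running trapezoidal integral of y from t[0] to each t[i] (assumes t[0] = 0)."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1]))
    return out

def patlak_multilinear(t, tac, cp, t_start=60.0, t_end=90.0):
    """Least-squares fit of C(t) = Ki * int_0^t CP + Ve * CP(t) over [t_start, t_end]."""
    x1 = cumulative_integral(t, cp)
    rows = [(x1[i], cp[i], tac[i]) for i in range(len(t))
            if t_start <= t[i] <= t_end]
    s11 = sum(a * a for a, _, _ in rows)
    s12 = sum(a * b for a, b, _ in rows)
    s22 = sum(b * b for _, b, _ in rows)
    r1 = sum(a * y for a, _, y in rows)
    r2 = sum(b * y for _, b, y in rows)
    det = s11 * s22 - s12 * s12
    ki = (r1 * s22 - r2 * s12) / det
    ve = (s11 * r2 - s12 * r1) / det
    return ki, ve

# Synthetic input function and TAC generated from known (hypothetical) Ki and Ve.
t = [float(i) for i in range(91)]                       # minutes
cp = [100.0 * math.exp(-0.05 * ti) + 10.0 * math.exp(-0.01 * ti) for ti in t]
ki_true, ve_true = 0.02, 0.4
tac = [ki_true * x + ve_true * c for x, c in zip(cumulative_integral(t, cp), cp)]

ki_est, ve_est = patlak_multilinear(t, tac, cp)
print(ki_est, ve_est)
```

The default window of 60 to 90 min matches the period used for Patlak analysis in this study. For noisy data, a general least-squares routine may be numerically preferable to the explicit normal equations.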

Effect of whole blood to plasma ratio

The PBIF curves generated here were created from plasma data. However, in the above assessment, the IDIF, which measures whole blood, was not corrected for the whole blood to plasma ratio, and PBIFAUC was scaled using the AUC of the uncorrected IDIF. In a separate analysis, we assessed the effect of the difference between concentrations of 18F-FDG in whole blood and plasma by determining the resulting bias in Ki. The whole blood to plasma ratio was computed from 40 s to 90 min post-injection in the PBIF validation group.

Statistical analysis

Correlations between the AUC and Ki with the estimated IFs and the AIF were assessed by Pearson r, mean bias, and standard deviation (SD) of bias. Statistical analysis was performed by Prism 8 (GraphPad Software). All kinetic modeling was performed with in-house programs written with IDL 8.0 (ITT Visual Information Solutions, Boulder, CO).
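The three summary measures can be computed as in the sketch below. The analysis in this study was done in Prism. The code is only an illustration, the Ki values are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def percent_bias(estimate, reference):
    """Per-subject percent bias of an estimate relative to the reference (AIF-based) value."""
    return [100.0 * (e - r) / r for e, r in zip(estimate, reference)]

def mean_and_sd(values):
    """Mean and sample standard deviation (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return m, sd

# Hypothetical Ki values (mL/min/cm3): AIF-based reference and one estimated IF.
ki_aif = [0.010, 0.020, 0.040, 0.080]
ki_est = [0.011, 0.019, 0.042, 0.078]

r = pearson_r(ki_aif, ki_est)
bias_mean, bias_sd = mean_and_sd(percent_bias(ki_est, ki_aif))
print(r, bias_mean, bias_sd)
```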

Results

Creation of PBIF

The parameters from fitting of the AIFs by Feng’s model using the AUC and CP*(0) (ID/iDV) normalizations are summarized in Table 3. The shape-related parameters (λ1, λ2, λ3) were very similar between PBIFAUC and PBIFiDV. The values of the relative amplitudes A2/A1 and A3/A1 were similar between the two PBIFs: 0.010 and 0.008 for PBIFAUC and 0.009 and 0.006 for PBIFiDV, respectively.

Table 3 Parameters of template PBIFs

To compare the two PBIFs, tests were performed with the two PBIFs scaled to have the same AUC. In that case, Patlak Ki values using PBIFAUC were almost identical to those using PBIFiDV (Ki(PBIFAUC) = 0.994 × Ki(PBIFiDV) − 0.002, R2 = 1.000), indicating that there is no meaningful difference between the shapes of the two PBIFs.

Comparing the contribution of the terms of Eq. 2 to the PBIF, the third term (\( {A}_3{e}^{-{\lambda}_3\left(t-\tau \right)} \)) accounted for > 95% of the PBIF after 16 min post-injection.

IDIF

In the validation group, the volume of the aorta ROI was 1.55 ± 0.11 mL. Figure 1a shows a comparison of a typical IDIF and its corresponding AIF. The IDIF tends to undershoot the AIF at early times (t < 20 min) and overshoot it at late times (t > 30 min), with varying degree of under/overshoot among subjects (%difference, − 7% ± 8% (t < 20 min) and 13% ± 12% (t > 30 min)). Fitting Feng’s model to IDIFs and AIFs, the third eigenvalue λ3 of the IDIF was significantly smaller than that of the AIF (IDIF, 0.008 ± 0.002 min−1 and AIF, 0.011 ± 0.002 min−1, P = 0.008).

Fig. 1

a Typical example of IDIF (black curve), AIF (red), and difference (IDIF − AIF; blue). b Patlak plots using IDIF (black) and AIF (red); solid lines show the portion of the plot used to estimate Ki. In this case, the bias of AUC was 0.3% and the bias of Ki was − 16%

Subject scaling of PBIF for validation

The median iDV was 13.1 L (mean ± SD = 13.0 ± 1.7), which corresponds to 0.14 L/kg body weight. Table 4 shows the three estimated coefficients (c, h, w) in our study (from the PBIF generation group) compared to previous references. Those coefficients were used to predict CP*(0) and compare to the actual values from blood samples in the validation group. Using values in this study, differences were acceptable (3 ± 8%). For the literature values, although the coefficients themselves were quite different, the percent bias of the estimated CP*(0) was reasonable, especially for the values from Vriens et al. [10].

Table 4 Comparisons of coefficients and CP*(0)

Comparison of the scaled PBIFs with IDIF and AIF

In the validation group, comparisons between AUC(0–90 min) and Patlak Ki with respect to the AIF values are shown in Tables 5 and 6, respectively.

Table 5 Comparison of AUC(0–90 min) between the estimated IFs (n = 12)
Table 6 Comparison of Ki between the estimated IFs

For AUC, the early time windows, 15–45 min or 30–60 min, for scaling PBIFAUC provided similarly good performance (0–90 min) in terms of Pearson r, bias, and SD (Table 5). Later time windows produced poorer correlation and overestimated the AUC(0–90 min). Typical sPBIFs are shown in Fig. 2 where the differences in scaling are best visualized in the tail of the curve. The correlation, bias, and SD were similar between IDIF, sPBIFAUC with the best time window, sPBIFiDV, and sPBIFPLAS (correlation, 0.90–0.94; bias, − 1 to 3%; SD, 5–6%).

Fig. 2

Typical example of sPBIFs and AIF a for the full 90 min and b from 15 min to 90 min post-injection. These data are from the same subject used in Fig. 1

Figure 3 shows individual Ki bias values using the IDIF or any of the sPBIFs, with Ki estimated using the AIF as the gold standard. The %bias was particularly large (− 47 and − 60%; Fig. 3a) for small Ki values (< 0.01 mL/min/cm3) with the IDIF. Therefore, the Ki bias (Table 6) was calculated in two ways, i.e., with and without these two tumors. Unlike the IDIF method, the Ki bias using all PBIF values was not affected by the magnitude of Ki (Fig. 3b, c).

Fig. 3

Individual values of Ki bias using different input functions compared to Ki estimated with the AIF. a IDIF. b sPBIFAUC. c sPBIFiDV. Each symbol represents the Ki derived from the tumor TAC of each subject

When AUC was overestimated, Ki was generally underestimated (Table 6). Patlak Ki determined by the IDIF was lower than the gold standard values (using the AIF) (− 9%), although the correlation was similar to those of the PBIFs (0.99–1.00). For sPBIFAUC, Ki was underestimated when using late time windows to scale the PBIFAUC (− 14% using 60–90 min). Conversely, using early time windows for scaling, the correlation, bias, and SD of sPBIFAUC were closest to those of sPBIFPLAS, which represents the best-possible outcome. For sPBIFiDV, using scaling coefficients from this study, the mean bias was low, the SD of the bias was similar to other methods, and the correlation was lower than with sPBIFAUC. Using scaling coefficients from other published studies for sPBIFiDV led to larger mean bias and similar correlation and SD.

Effect of whole blood to plasma ratio

The whole blood to plasma ratio increased from a mean of 0.93 to 0.97 over 90 min (Fig. 4). The whole blood/plasma curve could be described by the function 0.97 − 0.06 × exp(− 0.08 × t). The mean ratio did not differ between 30 min (0.95 ± 0.05) and 90 min post-injection (0.97 ± 0.05). The mean whole blood to plasma ratio was 0.97 ± 0.04 (15–45 min), 0.96 ± 0.03 (30–60 min), 0.97 ± 0.03 (45–75 min), 0.97 ± 0.04 (60–90 min), and 0.94 ± 0.03 (40 s–90 min). Correcting the IDIF with the above mean whole blood to plasma ratios increased its value, so Ki values became even more underestimated: the mean bias of Ki became − 14% (IDIF), 0% (sPBIFAUC(15–45)), − 4% (sPBIFAUC(30–60)), − 9% (sPBIFAUC(45–75)), and − 16% (sPBIFAUC(60–90)) instead of the values in Table 6 (n = 10).
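The direction of this effect follows from Eq. 3: because the equation is linear in CP, dividing the IF by a constant ratio r multiplies both Ki and Ve by r. The sketch below evaluates the fitted ratio curve quoted above and applies a constant-ratio correction. The function names are hypothetical, and t is assumed to be in minutes.

```python
import math

def wb_to_plasma_ratio(t_min):
    """Fitted whole blood to plasma ratio from the text: 0.97 - 0.06 * exp(-0.08 * t)."""
    return 0.97 - 0.06 * math.exp(-0.08 * t_min)

def whole_blood_to_plasma(idif, ratio):
    """Convert whole-blood IDIF values to plasma by dividing by a constant ratio."""
    return [v / ratio for v in idif]

def ki_bias_after_correction(bias, ratio):
    """Fractional Ki bias after the IF is divided by a constant ratio.

    From Eq. 3, Ki scales by `ratio`, so (1 + bias) becomes (1 + bias) * ratio.
    """
    return (1.0 + bias) * ratio - 1.0

for t in (0.0, 30.0, 60.0, 90.0):
    print(t, round(wb_to_plasma_ratio(t), 3))

# Illustrative input: an uncorrected Ki bias of -9% and a mean ratio of 0.94.
print(ki_bias_after_correction(-0.09, 0.94))
```

A time-varying ratio would change the shape of the IF as well as its scale, so the simple scaling relation holds only for the constant-ratio case sketched here.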

Fig. 4

Mean and SD of whole blood to plasma ratio in PBIF validation group with the fitted curve. The mean values were fitted to a one phase decay model (ratio = − 0.06 exp(− 0.085 × time) + 0.97)

Discussion

This study compared the performance of PBIFs with different normalization and scaling methods for the purpose of measuring the Patlak uptake constant Ki for 18F-FDG. The PBIFs were compared to IDIF and AIFs, with the latter used as the gold standard.

Two forms of the PBIF were generated from arterial sample data using two normalization methods (AUC or CP*(0)) and were first compared. The Ki values using PBIFAUC were almost identical to those using PBIFiDV. This suggests that the PBIF shape was not affected by the different normalization methods. Therefore, the comparison among PBIFs was reduced to the comparison of scaling factors.

To apply the PBIFs without the need for blood sampling, we tested two scaling methods. We also scaled the PBIF using the measured plasma samples for each scan to define the best achievable results by PBIF. Four plasma samples at 30, 45, 60, and 75 min post-injection were used for scaling to reduce effects of measurement noise in the plasma. The sPBIFPLAS overestimated Ki by 2 ± 6%, due to slight differences in IF shape between subjects. Thus, ideally, a blood-free PBIF method could achieve comparable results.

One scaling method used a part of the IDIF. In WB PET imaging, large blood pools are always available. As shown in Fig. 1, the estimated IDIF showed a consistent pattern compared to the AIF, with undershoot at early times and overshoot at late times, perhaps due to partial volume averaging, but the magnitude of under/overshoot was different among subjects. Therefore, the Patlak Ki was significantly underestimated using the PBIFs scaled by the late AUC values from the IDIF. The best time window for scaling (in terms of minimum bias) was 30–60 min (bias, − 1% and SD, 8%; Table 6). In that case, however, the required scan time would be 1 h, 30–60 min to measure the part of the IDIF used for scaling, and 60–90 min for Patlak Ki. Note that the SD of bias was very similar for all sPBIFAUC time periods; thus, if a mean bias was acceptable, e.g., if that bias was consistent across scans in the same patient, then later time periods could be used for scaling, providing a short scan.

The second scaling method used the estimated CP*(0), the extrapolated initial 18F-FDG plasma concentration. This scaling approach has potential advantages since it does not require the IDIF for scaling; it therefore allows a short scan and is not subject to the effects of body motion and partial volume on the IDIF. Vriens et al. [10] reported a median iDV of 0.168 L/kg, slightly higher than the value in our study (0.144 L/kg). We fitted the iDV equation (Eq. 1) using the same method as Shiozaki et al. and Vriens et al. and found quite different values for the estimated coefficients (c, h, w). The estimated CP*(0) values using the injected dose and these coefficients were compared with the extrapolated CP*(0) values measured from the AIF. Not surprisingly, the bias of CP*(0) was smallest using our fitted parameters. The coefficient estimation might be affected by the study population or other methodological details. For example, the difference in body habitus of the study subjects at different sites might affect the results. Also, the estimation is affected by the correlation between height and weight, which introduces instability in the parameters h and w. Patlak Ki estimated with this PBIF scaling method produced minimal bias and similar SD to the other scaling methods.

The mean biases of AUC(0–90 min) using IDIF, sPBIFAUC with early time windows, and sPBIFiDV were all minimal. However, a large negative mean bias of Ki with the IDIF was found, which was much larger than the other PBIF methods. Specifically, Ki with the IDIF was greatly underestimated (as a percentage) for small Ki values, while this was not observed for Ki with PBIF (Fig. 3). This difference in the Ki bias is due to the differences in the shapes of the IDIF and the AIF. The input function parameter λ3 (the terminal clearance rate) of the IDIF was much smaller than that of the AIF or the PBIFs, i.e., the IDIF showed slower clearance than the other IFs, resulting in large % underestimation of Ki for small Ki values.

To clarify this finding, we performed a simulation to assess the effect of λ3 on Ki estimates for large and small Ki values. Three IFs were computed using different λ3 values (0.012, 0.0084, 0.0048 min−1) (Figure S1-A) with all normalized to have the same AUC. Two TACs were computed using the input function with λ3 = 0.012 (Figure S1-B) having different Ki values (0.0077, 0.077 mL/min/cm3) but the same Ve (0.42). The Patlak plot was computed for these two TACs using three IFs, i.e., the correct IF and the two with slower terminal clearance (Figure S1-C and D). As shown in Table S1, Ki was underestimated, with much larger percent bias for small Ki values using the IFs with small λ3 values. The underestimated Ki was compensated by an overestimated intercept value, which has a larger error for larger Ki.
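A simplified version of this simulation can be sketched as follows. Several details are assumptions because the text does not specify them: the IF here is a bi-exponential (a fast component with hypothetical parameters plus the λ3 tail) rather than Feng's model, the TACs are generated directly from Eq. 3, and integration is trapezoidal on a 0.25-min grid. The λ3, Ki, and Ve values are those quoted above.

```python
import math

def input_function(t, lam3, a_fast=10.0, lam_fast=0.5, a3=1.0):
    """Bi-exponential IF (assumed form): fast early component plus lambda3 tail."""
    return a_fast * math.exp(-lam_fast * t) + a3 * math.exp(-lam3 * t)

def cumulative_integral(t, y):
    """Running trapezoidal integral of y from t[0] to each t[i]."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1]))
    return out

def patlak_fit(t, tac, cp, t_start=60.0, t_end=90.0):
    """Least-squares fit of Eq. 3 over [t_start, t_end]; returns (Ki, Ve)."""
    x1 = cumulative_integral(t, cp)
    rows = [(x1[i], cp[i], tac[i]) for i in range(len(t)) if t_start <= t[i] <= t_end]
    s11 = sum(a * a for a, _, _ in rows)
    s12 = sum(a * b for a, b, _ in rows)
    s22 = sum(b * b for _, b, _ in rows)
    r1 = sum(a * y for a, _, y in rows)
    r2 = sum(b * y for _, b, y in rows)
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - r2 * s12) / det, (s11 * r2 - s12 * r1) / det

t = [0.25 * i for i in range(361)]  # 0 to 90 min

def normalized_if(lam3, target_auc):
    """IF with the given lambda3, scaled to have the target AUC(0-90 min)."""
    raw = [input_function(ti, lam3) for ti in t]
    scale = target_auc / cumulative_integral(t, raw)[-1]
    return [scale * v for v in raw]

ref_auc = cumulative_integral(t, [input_function(ti, 0.012) for ti in t])[-1]
ifs = {lam3: normalized_if(lam3, ref_auc) for lam3 in (0.012, 0.0084, 0.0048)}

ref_if = ifs[0.012]
ref_int = cumulative_integral(t, ref_if)
ve_true = 0.42
bias = {}
for ki_true in (0.0077, 0.077):
    # TAC generated with the reference IF (lambda3 = 0.012) via Eq. 3.
    tac = [ki_true * x + ve_true * c for x, c in zip(ref_int, ref_if)]
    for lam3, cp in ifs.items():
        ki_est, _ = patlak_fit(t, tac, cp)
        bias[(ki_true, lam3)] = 100.0 * (ki_est - ki_true) / ki_true
        print(f"Ki={ki_true}, lambda3={lam3}: Ki bias = {bias[(ki_true, lam3)]:.1f}%")
```

Because the TACs are generated from the same equation that is used for fitting, the reference IF recovers Ki exactly, and any bias printed for the other two IFs reflects the change in λ3 alone.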

In several past reports [10, 16], the IDIF, which measures whole blood, was used as IF without correction for the difference between concentrations of 18F-FDG in whole blood and plasma, assuming these differences are small [17]. In our study, we also used the uncorrected IDIF for Patlak analysis (Table 6). To assess this effect, the whole blood to plasma ratio was computed. Mean whole blood to plasma ratio increased monotonically from 0.93 to 0.97 over 90 min (i.e., the mean plasma to whole blood ratio decreased from 1.09 to 1.03). Similar results were reported previously (1.09 to 1.04 [11] and 1.12 to 1.07 [18] over 90 min). When the whole blood to plasma ratio is taken into consideration, mean underestimation of Ki by the IDIF method worsened slightly.

Several 18F-FDG tumor imaging guidelines reviewed in [19] suggested that a static scan should start at 30~40 min or 50~70 min post-injection, but an ideal time window (length and starting time) for tumor Patlak analysis is not clearly defined. In a brain study using healthy subjects, Lucignani et al. [20] reported that Patlak Ki is stable using a 30-min window in the interval between 45 and 120 min post-injection. In our study, we used a 60–90-min time window for Patlak analysis; this time period can also be used to generate a static SUV image by appropriate image averaging.

Comparing the results of our scaled PBIF methods, sPBIFAUC(30–60) and sPBIFiDV produced similarly small bias and high correlation coefficients in Patlak Ki estimation. In the PBIFAUC method, no bias will be introduced due to an inaccurate dose calibrator cross-calibration to the PET scanner; however, errors in this calibration affect the PBIFiDV method. PBIFAUC(30–60) requires a 1-h scan when the Patlak time window is set from 60 to 90 min, while the PBIFiDV requires scan time for the Patlak analysis only. Also, measurement of body weight, height, and injected dose is simpler than obtaining IDIF curves, depending on the available tools in each clinical environment. Therefore, PBIFiDV would provide a simpler protocol than PBIFAUC(30–60). Using the methodology shown here, both approaches showed acceptable performance. sPBIFAUC has slightly better performance, but sPBIFiDV should be easier to implement in a clinical setting, although some site-specific tuning of the iDV coefficients may be necessary.

In addition to considering mean bias, the SD of bias (~ 9%) for all sPBIF methods was larger than the best possible attainable value using the subject’s own plasma data (sPBIFPLAS, 6%). Since variances add in quadrature, this difference in SD suggests that an additional error of 6–7% is introduced by the IDIF AUC and iDV scaling methods. While it is not clear how to improve the iDV scaling method, IDIF performance would likely be improved by changing the shape of the ROI, as well as applying motion correction and partial volume correction. Since the IDIF ROI was defined from the CT, we assessed the effects of misalignment between the CT and PET on the AUC of the IDIF. The IDIF ROI was shifted by 1 to 6 voxels (i.e., 2 to 12 mm) in the x (left-right), y (anterior-posterior), and z (superior-inferior) directions, and we determined the maximum misalignment in each direction leading to ≤ 5% decrease in the AUC (15–45, 30–60, 45–75, and 60–90 min) from the shifted ROI. The most sensitive directions to misalignment were y (5 to 7 mm) and x (6 to 11 mm); the z direction showed minimal effects, as expected. The earlier time window was more sensitive to misalignment due to the higher contrast between the aorta and background. Partial volume effects would be a major contributing factor to the overestimation of AUC, especially in later time windows, as seen in Table 5 (19% overestimation of AUC(0–90 min) using sPBIFAUC(60–90)). If the quality of the IDIF ROI is improved, e.g., with motion and partial volume corrections, so that the later part of the IDIF can provide an accurate value, then the bias of Ki using PBIFAUC(60–90) would be improved. In particular, in a typical clinical protocol, where the PET scan begins at 60 min, there will be less delay between CT and PET scans, so motion issues would likely be reduced. Also, we believe that using the imaging data to directly quantify the IF is of value, since day-to-day variation in the IF cannot be captured by the iDV method.
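The quadrature estimate quoted at the start of this discussion can be written out explicitly. The symbols σsPBIF, σPLAS, and σadded are introduced here for illustration only, and the calculation assumes that the scaling error is independent of the shape-related error captured by sPBIFPLAS:

$$ {\sigma}_{\mathrm{added}}=\sqrt{\sigma_{\mathrm{sPBIF}}^2-{\sigma}_{\mathrm{PLAS}}^2}=\sqrt{{\left(9\%\right)}^2-{\left(6\%\right)}^2}=\sqrt{45}\;\%\approx 6.7\% $$

which is consistent with the 6–7% additional error attributed to the IDIF AUC and iDV scaling methods.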

As described above, we assessed relative performance of the methods by calculating accuracy (mean % bias) and variability (SD of % bias). Both of these measures are relevant, although the relative importance depends on the clinical question. A small mean bias compared to the AIF means that the method is intrinsically accurate over the entire patient group. However, the SD of the bias across subjects and tumors should also be considered. If the SD is large, then the ability to reliably measure changes in tracer uptake between scans of the same patient may be poor. Alternatively, if large SD across patients is caused by subject-specific biases, e.g., due to IDIF ROI definition (excluding motion effects), which remain consistent across scans, then such variability may be clinically acceptable if the goal is to assess treatment response. Thus, the best way to fully assess the performance of PBIFs would be with test-retest data using the reproducibility of the estimated Ki as the key outcome measure.

Recent improvements in detector technology and clinical application demands have led to the development of total body PET systems [21, 22], such as the uEXPLORER [23, 24] and PennPET Explorer [25]. Access to the arterial blood sampling site is challenging in these systems. However, since the aorta is always in the field of view and the acquired dynamic data will have lower noise, the PBIF methods will be useful and compatible with these total body PET scan systems.

Conclusions

In this paper, using a modern PET system, we assessed and optimized IDIFs and PBIFs using arterial blood samples and commercial software to define the IDIF ROI. We applied these IDIF and PBIF methods for FDG oncological WB PET studies. The PBIF methods scaled by either IDIF AUC or ID and iDV showed good performance, with a small mean bias and moderate variability, whereas the IDIF method produced negative mean bias of Ki. Further improvements in accuracy and precision can be obtained with motion correction and partial volume corrections.