Metabolic response assessment with 18F-FDG PET/CT: inter-method comparison and prognostic significance for patients with non-small cell lung cancer

Objective This study aimed to (1) compare the agreement of two methods for evaluating metabolic response in patients with non-small cell lung cancer (NSCLC) and determine their prognostic value and (2) explore an optimal cutoff of metabolic reduction to distinguish a more favorable subset of responders. Methods This is a secondary analysis of prospective studies. Enrolled patients underwent 18F-FDG PET/CT within 2 weeks before, during, and within 6 months after radiotherapy (post-RT). Metabolic response was assessed using both the Peter MacCallum (PM) method of qualitative visual assessment and the University of Michigan (UM) method of semiquantitative measurement. The agreement between the two methods in determining response and their prediction of outcome were analyzed. Results Forty-four patients with a median follow-up of 25.2 months were analyzed. Moderate agreement was observed between PM- and UM-based response assessment (Kappa coefficient = 0.434), with a significant difference in the complete metabolic response (CMR) rate (p = 0.001). Categorical responses derived from either method were significantly predictive of overall survival (OS) and progression-free survival (PFS) (p < 0.0001). The numerical percentage decrease of FDG uptake also correlated significantly with survival, with a hazard ratio of 0.97 per 1 % decrease for both OS and PFS. A 75 % decrease in SUV was found to be the optimal cutoff to predict OS and 2-year progression. Conclusions There was a modest discrepancy in metabolic response rates between PM and UM criteria, though both could offer predictive classification for survival. The percentage decrease provides an ordinal value that correlates with prolonged survival, and 75 % is suggested as the optimal threshold for identifying better responders.


Introduction
18F-fluorodeoxyglucose positron emission tomography/computed tomography (FDG PET/CT)-based metabolic response has been reported to be a prognostic indicator for patients with non-small cell lung cancer (NSCLC) by a large number of studies. However, there is wide variation in the measurement of tumor FDG uptake and in the stratification into response categories, so multiple criteria for metabolic response evaluation have been proposed and are applied differently among clinical centers. Currently, methods for evaluating metabolic response on FDG PET include qualitative visual assessment such as the Peter MacCallum (PM) criteria [5], semiquantitative assessment using reduction of metabolic activity (e.g., standardized uptake value, SUV) such as the EORTC [23], University of Michigan (UM) [10], and PERCIST [24] methods, and quantitative evaluation using kinetic modeling [25].
Quantitative evaluation by kinetic modeling or Patlak plot is used infrequently in clinical practice because of its complexity, including the need for arterial blood sampling or dynamic imaging of a blood-pool structure to acquire a precise input function [25]. The best method of metabolic response evaluation, in terms of both predicting outcome and being easily applicable in routine clinical practice, is currently unknown. All of the aforementioned response criteria are currently used in clinical practice, and there has been no direct cross-comparison between the various methods.
With regard to semiquantitative assessment, many studies have reported that over 70 % of NSCLC patients showed a posttreatment decrease in metabolic activity of at least 30 %, namely a partial metabolic response (PMR) by semiquantitative evaluation methods [9,10,13,17,20]. This indicates that the current cutoff values of a 25 or 30 % metabolic decrease for PMR are probably too modest to identify a medically relevant response. Given this consideration, it would be advantageous to explore another reliable cutoff value that might distinguish a subgroup of PMR patients with a more favorable outcome.
This study aimed to (1) compare response rates of the PM method (qualitative) and the UM method (semiquantitative) on FDG PET/CT and examine their value for predicting long-term outcome and (2) perform a post hoc analysis to explore an optimal cutoff for the selection of better responders in NSCLC treated with definitive radiotherapy.

Study population
The patients in this study represented a cohort from a series of functional imaging-related prospective studies conducted between 2003 and 2010 at the University of Michigan (UM). These studies were approved by the institutional review board (IRB) of the University of Michigan, and written informed consent was obtained from each patient. Patients with unresectable or inoperable stage I-III NSCLC receiving definitive radiation therapy (RT) underwent FDG PET/CT scans within 2 weeks before the start of RT (pre-RT PET/CT) and at the fourth week during RT as per protocol. After treatment, patients were followed up with CT with or without PET/CT at the physician's discretion, and those with PET/CT scans within 6 months after RT (post-RT PET/CT) were eligible for the present analysis. Exclusion criteria included prior thoracic RT, small cell carcinoma histology, and comorbid diabetes mellitus with an uncontrolled blood glucose level >200 mg/dL. All patients had a definitive course of radiotherapy under standard practice or per protocol. Because we focused on posttreatment response assessment, we did not restrict the RT regimens.

PET/CT image acquisition
FDG PET/CT scans were performed at two institutions: University of Michigan Hospital (UMH) and Veterans Administration Health Center/Veterans Affairs Medical Center, Ann Arbor (VA-AA) between 2003 and 2010. The PET protocols used at both institutions were standardized throughout this time period. Details of the PET/CT scan protocols have been previously described [10]. At the UMH between 2003 and 2006, the PET/CT imaging was performed on a Siemens Biograph Classic (Siemens Medical Solutions, Hoffman Estates, IL, USA) and between 2006 and 2010 on a Siemens Biograph T6. All PET/CT studies at the VA Ann Arbor Medical Center were performed on a Siemens Biograph T6.

Image interpretation and response evaluation
Interpretation of PET/CT images was performed by nuclear medicine physicians blinded to clinical information using consensus criteria. The mean SUV (SUVmean) within a circular region of interest (ROI) of about 1.0-cm diameter placed in the aortic arch (AA) was used to represent the metabolic activity of the mediastinal blood pool (MBP) for the purpose of normalization. Tumor FDG uptake, expressed as the maximum SUV (SUVmax) normalized to the AA (NSUV-A), was calculated using the following formula [10,26,27]:
$$\text{NSUV-A} = \frac{\text{SUV}_{\max}(\text{tumor})}{\text{SUV}_{\text{mean}}(\text{AA})}$$
Each patient's set of PET/CT studies was evaluated twice, using the PM and UM methods, respectively [5,10]. The numerical percentage decline of NSUV-A was also analyzed as a continuous variable.
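The normalization and the percentage decline described above can be illustrated with a minimal sketch. The function names and the input values are hypothetical and chosen for illustration only; the percentage decline is assumed here to be computed relative to the pre-RT NSUV-A, which the text does not state explicitly.

```python
def nsuv_a(tumor_suv_max: float, aortic_arch_suv_mean: float) -> float:
    """Tumor SUVmax normalized to the aortic arch (blood pool) SUVmean."""
    if aortic_arch_suv_mean <= 0:
        raise ValueError("aortic arch SUVmean must be positive")
    return tumor_suv_max / aortic_arch_suv_mean


def percent_decline(pre_nsuv: float, post_nsuv: float) -> float:
    """Percentage decline from pre-RT to post-RT (positive means a decrease).

    Assumption: the decline is expressed relative to the pre-RT value.
    """
    if pre_nsuv <= 0:
        raise ValueError("pre-RT NSUV-A must be positive")
    return 100.0 * (pre_nsuv - post_nsuv) / pre_nsuv


# Hypothetical values for illustration only (not patient data).
pre = nsuv_a(tumor_suv_max=12.0, aortic_arch_suv_mean=2.0)
post = nsuv_a(tumor_suv_max=3.0, aortic_arch_suv_mean=2.0)
decline = percent_decline(pre, post)
print(f"pre={pre:.2f}, post={post:.2f}, decline={decline:.1f} %")
```

Normalizing to the blood pool in this way makes the ratio independent of the absolute SUV scale of a given scan, which is presumably why a ratio rather than the raw SUVmax is followed over time.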

Statistical analysis
The Kappa coefficient was used to estimate the agreement between the PM and UM evaluation systems. Overall response to treatment (ORTT) was determined based on the primary tumor (T), regional lymph nodes (N), and distant status (M). Infield response to treatment (IFRTT) was defined as the response to treatment regarding only lesions within the radiation field. For instance, a case in which new lesions appear and progress despite apparent remission of the local disease would be classified as progressive metabolic disease (PMD) for ORTT but as complete metabolic response (CMR) for IFRTT. We used 2-year progression as the endpoint to separate study patients into two groups (early progression vs. late progression). Receiver operating characteristic (ROC) analysis was used to identify the optimal cutoff in percentage decline of NSUV-A to discriminate the two groups. Kaplan-Meier survival analysis and the log-rank test were performed to compare survival between different groups. Using cutoffs varying from 25 to 90 % reduction of NSUV-A at 5 % increments, the discriminative values of the various cutoffs were analyzed for both overall survival (OS) and progression-free survival (PFS) by the Kaplan-Meier method. Cox proportional hazards regression was used to investigate the prognostic significance of the various response systems. A p≤0.05 was considered statistically significant, and all tests were two-tailed.
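Two of the analyses above lend themselves to a short illustration: the Kappa coefficient for agreement and an ROC-derived cutoff. The sketch below computes the unweighted Cohen's kappa and selects a cutoff by maximizing Youden's index (sensitivity + specificity - 1). Both choices are assumptions, since the text does not state whether a weighted kappa was used or how the optimal ROC point was defined, and all data shown are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two paired lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the marginal category frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def youden_cutoff(declines, progressed):
    """Return (J, cutoff, sensitivity, specificity) maximizing Youden's J.

    Assumption: a patient is predicted to progress early when the
    percentage decline is below the cutoff.
    """
    n_pos = sum(bool(p) for p in progressed)
    n_neg = len(progressed) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome groups must be present")
    best = None
    for cutoff in sorted(set(declines)):
        tp = sum(d < cutoff and bool(p) for d, p in zip(declines, progressed))
        tn = sum(d >= cutoff and not p for d, p in zip(declines, progressed))
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best


# Hypothetical paired response categories from two methods.
pm = ["CMR", "CMR", "PMR", "PMR", "PMR", "SMD"]
um = ["CMR", "PMR", "PMR", "PMR", "PMR", "SMD"]
kappa = cohen_kappa(pm, um)

# Hypothetical percentage declines and early-progression flags.
best = youden_cutoff([90, 80, 78, 60, 50, 30],
                     [False, False, False, True, True, True])
print(f"kappa={kappa:.3f}, best cutoff={best[1]}")
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which is why it can be only moderate even when most paired classifications coincide.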

General characteristics
A total of 118 patients were enrolled in the prospective studies, and 44 patients with both pre- and post-RT scans were eligible for this analysis. Table 1 shows the demographics and tumor characteristics for the study population. There were 6 patients with adenocarcinoma, 10 with squamous cell carcinoma, 1 with large cell carcinoma, and 23 with NSCLC of unknown subtype, primarily due to the limited specimen obtained from fine needle aspiration-based biopsy. In addition, four patients had clinically diagnosed NSCLC based on the consensus of the tumor board but without pathological verification because they were unfit for biopsy. The median RT dose was 70 Gy. The interval between the end of RT and the post-RT PET/CT scan ranged from 45 to 176 days, with a median of 93 days.

Comparison of categorical metabolic response assessment by PM and UM methods
According to the PM method using qualitative visual assessment, the rates of CMR, PMR, stable metabolic disease (SMD), and PMD were 38.6, 52.3, 2.3, and 6.8 %, respectively. For the UM method, they were 13.6, 72.7, 9.1, and 4.5 %, respectively. A significant difference in CMR rate was observed between the UM and PM methods (p=0.001), while no significant differences were seen for PMR, SMD, and PMD. The distribution of ORTT between PM and UM criteria is shown in Table 2, with an agreement rate of 68.2 % (30/44) and a Kappa coefficient of 0.434. Figure 1 graphs the actual change in NSUV-A with the corresponding response categories of ORTT and IFRTT determined by PM criteria. The discrepancy between the two resulted from two cases with infield CMR (58.4 and 87.1 % reduction in NSUV-A, respectively) but with out-of-field failure.

Categorical response and survival
The median follow-up time for the study population was 25.2 months. Figure 2a, b shows that the response classifications determined by both the PM and UM methods were predictive of OS (p<0.0001). Patients with CMR achieved better outcomes than the non-CMR population, although the difference reached statistical significance only for PM criteria. The median survival (MS) was 48.1 and 18.3 months for PM criteria-based CMR and non-CMR patients, respectively (p=0.016). At the last follow-up, the MS had not been reached for UM criteria-determined CMR patients and was 22.0 months for non-CMR patients (p=0.086). Similar findings were observed for PFS. Both PM and UM method-based response classifications were predictive of PFS (p<0.0001) (Fig. 2c, d). The median PFS of the CMR group compared with the non-CMR group was 33.7 vs. 8.4 months (p=0.005) for PM criteria and 25.5 vs. 12.4 months (p=0.231) for UM criteria.
Combining PM and UM criteria, we divided the patients into three groups for further survival analysis: (1) those achieving CMR under both criteria (visual CMR and NSUV-A≤1), (2) those identified as CMR using the PM system but non-CMR using UM criteria (visual CMR but NSUV-A>1), and (3) the remaining patients, who were CMR under neither criterion (neither visual CMR nor NSUV-A≤1). A significant difference among the three groups was observed with regard to both OS (p=0.044, Fig. 2e) and PFS (p=0.019, Fig. 2f), which mainly resulted from the inferior outcome of the non-CMR group (neither visual CMR nor NSUV-A≤1).
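The three-group assignment can be written out as a small decision rule. This is a sketch only: the function name and group labels are hypothetical, and the handling of a patient with NSUV-A≤1 but no visual CMR is an assumption, because that combination is not described among the three groups.

```python
def combined_cmr_group(visual_cmr: bool, post_nsuv_a: float) -> str:
    """Assign a patient to one of the combined PM/UM groups."""
    um_cmr = post_nsuv_a <= 1.0  # semiquantitative CMR threshold used in the text
    if visual_cmr and um_cmr:
        return "CMR by both"
    if visual_cmr:
        return "CMR by PM only"
    if um_cmr:
        # Assumption: this combination is not described in the grouping,
        # so it is flagged separately rather than silently merged.
        return "CMR by UM only (not described)"
    return "CMR by neither"


# Hypothetical patients for illustration only.
examples = [(True, 0.8), (True, 1.6), (False, 2.4)]
groups = [combined_cmr_group(v, n) for v, n in examples]
print(groups)
```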

Optimal cutoff for prediction of progression and survival
Cox regression analysis showed a positive correlation between the numerical percentage decrease of NSUV-A and survival, with a hazard ratio (HR) per 1 % reduction of NSUV-A of 0.971 (95 % CI, 0.955-0.988) for OS and 0.971 (95 % CI, 0.957-0.986) for PFS. ROC analysis identified a 75 % decrease in NSUV-A as the optimal cutoff, with a sensitivity of 80 % and a specificity of 69 % for 2-year progression (Fig. 3a). The discriminative values of cutoffs varying from 25 to 90 % reduction of NSUV-A at 5 % increments were analyzed for both OS and PFS. Besides the 25 % decrease proposed by the EORTC criteria, thresholds of 75 and 60 % reduction were found to be the most discriminative in the OS (p=0.002) and PFS (p<0.0001) analyses, respectively (Fig. 3b, c). Using 25 and 75 % reduction as the cutoffs to stratify patients into three subgroups, we saw significantly different OS (Fig. 4a, 70.7 vs. 20.5 vs. 7.6 months, p<0.0001) and PFS (Fig. 4b, 25.5 vs. 8.4 vs. 4.6 months, p<0.0001) across groups.
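To make the per-1 % hazard ratio more tangible, the sketch below scales it to larger reductions and applies the 25 and 75 % cutoffs as a three-tier stratification. It assumes the log-linear relationship implied by a Cox model with a continuous covariate, so that the HR for a k % reduction is the per-1 % HR raised to the power k. The function names, tier labels, and boundary handling (a decline of exactly 25 or 75 % falls in the higher tier) are assumptions for illustration.

```python
def hazard_ratio_for_reduction(hr_per_percent: float, reduction_percent: float) -> float:
    """Scale a per-1 % hazard ratio to a larger reduction (log-linear assumption)."""
    if hr_per_percent <= 0:
        raise ValueError("hazard ratio must be positive")
    return hr_per_percent ** reduction_percent


def response_tier(decline_percent: float, low: float = 25.0, high: float = 75.0) -> str:
    """Three-tier stratification by percentage decline of NSUV-A.

    Assumption: boundaries are inclusive on the upper tier, i.e.
    "at least 75 %" and "at least 25 %".
    """
    if decline_percent >= high:
        return ">=75 %"
    if decline_percent >= low:
        return "25 to <75 %"
    return "<25 %"


# HR for a 10 % reduction given the reported per-1 % HR of 0.971.
hr_10 = hazard_ratio_for_reduction(0.971, 10)
print(f"HR for a 10 % reduction: {hr_10:.3f}")  # roughly 0.745

# Hypothetical declines for illustration only.
tiers = [response_tier(d) for d in (10.0, 58.4, 87.1)]
print(tiers)
```

Under this assumption, each additional 10 % reduction in NSUV-A corresponds to roughly a 25 % lower hazard, which is consistent with the continuous relationship reported by the Cox analysis.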

Discussion
This study demonstrated a remarkable discrepancy in CMR rates between the PM (qualitative visual assessment) and UM (semiquantitative assessment) methods. Both criteria can distinguish patients with different outcomes and thus provide important prognostic information, although the predictability may be considered somewhat coarse and suboptimal. As a continuous variable, the actual percentage change of metabolic activity was also shown to be predictive of survival, and a 75 % decline was found to be the optimal threshold to predict 2-year progression and overall survival in this cohort of NSCLC patients treated with definitive RT. Metabolic response has been broadly applied in clinical studies and is increasingly used in clinical practice as well. However, there is little literature investigating the agreement across various response evaluation methods and comparing their predictive value for outcome. To our knowledge, two published studies have so far directly compared the concordance and prognostic significance of the EORTC and PERCIST criteria [28,29]. Skougaard and colleagues found that response evaluation with EORTC criteria and PERCIST gave similar responses and OS outcomes, with good agreement on best overall metabolic response, in metastatic colorectal cancer treated with irinotecan and cetuximab [28]. Another study enrolling 29 patients with small cell lung cancer demonstrated perfect concordance between EORTC and PERCIST criteria (kappa coefficient=1) [29]. Unlike these studies, our study compared a qualitative with a semiquantitative method. Our results revealed moderate agreement between the two methods but a significantly higher CMR rate under PM method-based visual assessment.
This finding indicates that applying different response criteria could lead to different conclusions about the efficacy of a treatment, which could cause confusion in the interpretation of trial results or clinical reports and may affect clinical decision making for further treatment such as maintenance therapy. One should be cautious when comparing metabolic response results across trials that did not use identical evaluation methods.
Our results of 44 patients confirmed findings from previous studies that metabolic response (by either the UM or PM method) is predictive of outcome in NSCLC. Posttreatment CMR was associated with the longest survival irrespective of the method used. However, the survival differences in both OS and PFS between the UM criteria-defined CMR and non-CMR groups in our study did not reach statistical significance. Further stratification dividing the PM criteria-determined CMR patients into two subgroups (visual CMR and NSUV-A≤1 vs. visual CMR but NSUV-A>1) failed to find a further distinction in OS or PFS, suggesting that the semiquantitative assessment of CMR may be too strict and may miss good responders with only mild residual FDG uptake that is not discerned on visual evaluation.
Interestingly, our findings demonstrated that the actual percentage reduction in NSUV-A was positively correlated with OS and PFS in a continuous variable model, which was consistent with the findings of ACRIN 6668/RTOG 0235 Trial [30]. This suggests that every single percent of metabolic activity reduction may confer a certain survival advantage and the relationship between tumor metabolic change and prognosis may be continuous rather than having a fixed cutoff point. Nonetheless, the categorical classification of metabolic response is a practical approach for evaluating treatment efficacy and guiding clinical decisions in daily care.
In this study, we confirmed that the threshold of a 25 or 30 % decrease in metabolic activity applied by the EORTC and UM criteria discriminated nonresponders (SMD) from PMR patients. However, many studies have reported that over 70 % of NSCLC patients showed a metabolic decrease of at least 30 % [9,10,13,17,20]. Therefore, it is rational to hypothesize that there may be a subset of patients with a more favorable outcome within the PMR population determined by the UM or EORTC criteria. Several studies have attempted to modify cutoffs of posttreatment metabolic change to optimize the categorical response evaluation. Shiraishi and colleagues found that a 40 % decrease in SUV after neoadjuvant chemoradiotherapy could predict pathological response but could not successfully predict long-term overall survival [20]. Eschmann and colleagues proposed that a 65 % decrease in SUV after neoadjuvant chemoradiotherapy was the optimal cutoff to distinguish stage III NSCLC patients with distinct outcomes [9]. For stage IV NSCLC, Yoon and colleagues reported that a 50 % reduction of SUV could predict time to progression [23]. It is reasonable to speculate that the optimal cutoff value for identifying the favorable subset in NSCLC patients receiving definitive chemoradiotherapy should be higher than that in patients receiving non-curative or palliative therapy. Accordingly, our study showed that a 75 % decrease in NSUV-A was the most discriminative cutoff for OS and 2-year progression prediction. Although a 60 % reduction was found to be the most robust cutoff for PFS prediction, a 75 % decrease in NSUV-A still has adequate value for the identification of better responders.
We acknowledge several limitations in our study. This is a secondary analysis of a series of prospective studies. PET/CT was not required for the follow-up, and thus, no prospectively defined strict timing was used for the posttreatment PET/CT study. In the present study, we did not have the SUV data corrected by lean body mass (SUL) or body surface area (SUVmax-BSA), and therefore, we could not evaluate the response using PERCIST or EORTC methods. However, given the semiquantitative nature and similar cutoff definition, we would expect a strong concordance in response classification between these methods. Further prospective studies are warranted in a larger and independent cohort to validate our findings and hypothesis.

Conclusions
There is moderate agreement between the visual assessment of the PM criteria and the semiquantitative assessment of the UM criteria for metabolic response evaluation, with the main discrepancy being the CMR rate. Categorical metabolic response is validated to be strongly predictive of outcome, and visual assessment seems to be sufficient for CMR identification. The numerical percentage reduction of metabolic activity provides an ordinal value that correlates with prolonged survival. A decrease in metabolic activity of at least 75 % is indicative of a favorable long-term outcome, pending validation in an independent population.

Ethical standard
This is a secondary analysis of a series of prospective studies initiated between 2003 and 2010 at the University of Michigan (UM). All of these studies were approved by the institutional review board (IRB) of the University of Michigan, and written informed consent was obtained from each patient.