Tumor response evaluation in patients with malignant melanoma undergoing immune checkpoint inhibitor therapy and prognosis prediction using 18F-FDG PET/CT: multicenter study for comparison of EORTC, PERCIST, and imPERCIST

Objective In malignant melanoma patients treated with immune checkpoint inhibitor (ICI) therapy, three different FDG-PET criteria, European Organization for Research and Treatment of Cancer (EORTC), PET Response Criteria in Solid Tumors (PERCIST), immunotherapy-modified PERCIST (imPERCIST), were compared regarding response evaluation and prognosis prediction using standardized uptake value (SUV) harmonization of results obtained with various PET/CT scanners installed at different centers. Materials and methods Malignant melanoma patients (n = 27) underwent FDG-PET/CT examinations before and again 3 to 9 months after therapy initiation (nivolumab, n = 21; pembrolizumab, n = 6) with different PET scanners at five hospitals. EORTC, PERCIST, and imPERCIST criteria were used to evaluate therapeutic response, then concordance of the results was assessed using Cohen’s κ coefficient. Log-rank and Cox methods were employed to determine progression-free (PFS) and overall (OS) survival. Results Complete metabolic response (CMR)/partial metabolic response (PMR)/stable metabolic disease (SMD)/progressive metabolic disease (PMD) with harmonized EORTC, PERCIST, and imPERCIST was seen in 3/5/4/15, 4/5/3/15, and 4/5/5/13 patients, respectively. Nearly perfect concordance between each pair of criteria was noted (κ = 0.939–0.972). Twenty patients showed progression and 14 died from malignant melanoma after a median 19.2 months. Responders (CMR/PMR) showed significantly longer PFS and OS than non-responders (SMD/PMD) (harmonized EORTC: p < 0.0001 and p = 0.011; harmonized PERCIST: p < 0.0001 and p = 0.0012; harmonized imPERCIST: p < 0.0001 and p = 0.0012, respectively). Conclusions All harmonized FDG-PET criteria (EORTC, PERCIST, imPERCIST) showed accuracy for response evaluation of ICI therapy and prediction of malignant melanoma patient prognosis. Additional studies to determine their value in larger study populations will be necessary.


Introduction
Recent breakthrough results with immune checkpoint inhibitors (ICIs) have ushered in a new era of cancer immunotherapy and a paradigm shift in cancer treatment [1]. Notably, strategies for inhibiting the programmed death-1 (PD-1)/programmed death-ligand 1 (PD-L1) axis with ICIs, including the anti-PD-1 antibodies nivolumab and pembrolizumab, have been emerging as novel options for malignant melanoma [2].
Adequate assessment of response to systemic treatment is crucial for effective cancer management. Reliable means of monitoring tumor responsiveness are especially important for limiting the high risk of mortality and the toxic effects known to be associated with available systemic therapeutic regimens. Several recent studies have shown the utility of baseline and follow-up 18F-fludeoxyglucose (18F-FDG) positron emission tomography/computed tomography (PET/CT) for assessing therapeutic response and predicting prognosis in cases of malignant melanoma treated with an ICI [3][4][5][6]. Criteria commonly used to assess tumor response with PET are based on the change in the sum of the maximum standardized uptake value (SUVmax) of up to five lesions, as proposed by the European Organization for Research and Treatment of Cancer (EORTC) [7], or of the peak SUV corrected for lean body mass (SULpeak), as in the PET Response Criteria in Solid Tumors (PERCIST) [8] and immunotherapy-modified PERCIST (imPERCIST) [6]. However, widespread use of PET for determining treatment response has been limited by differences in the range of SUV among available PET scanners. To compensate, harmonization among PET models has been used [9,10]. Another important issue is that, until now, treatment response evaluations of patients treated with ICIs have been performed at a single center, while data obtained at multiple centers using various PET scanners have not been utilized. More widespread use of PET to determine efficacy could follow if varied PET data obtained at multiple institutions were integrated to better determine treatment response.
This retrospective study sought to evaluate therapeutic response in patients with malignant melanoma treated with ICIs at different medical centers equipped with various PET scanners, and to predict prognosis from baseline and follow-up 18F-FDG PET/CT results with harmonized metabolic markers. Additionally, the utility of three different 18F-FDG PET/CT criteria (EORTC, PERCIST, imPERCIST) was examined.

Patients
The institutional review board at each hospital approved this retrospective multicenter study and waived the requirement for informed consent. Clinical records were reviewed to identify appropriate patients for analysis. The information systems of five hospitals were screened for cases of malignant melanoma treated with a PD-1 or PD-L1 inhibitor from August 2014 to October 2019 and with 18F-FDG PET/CT results obtained before and after the start of therapy. Inclusion criteria were (1) 18F-FDG PET/CT scanning performed within 3 months before, and from 3 to 9 months after, initiation of ICI therapy, and (2) FDG-avid lesions observed in the pretreatment 18F-FDG PET/CT examination. Exclusion criteria were a history or coexistence of other malignancies and treatment with other ICIs before the present ICI therapy.

Protocol for 18F-FDG PET/CT
Eight different whole-body PET/CT scanners were used at the participating institutions: Discovery 600, Discovery 710, Discovery iQ HD, and Discovery MI (GE Healthcare, WI, USA); Gemini GXL16, Gemini TF, and Ingenuity TF (Philips Medical Systems, Eindhoven, The Netherlands); and Aquiduo (Canon Medical Systems, Ohtawara, Japan) (Table 1). Each patient was instructed to fast for at least 4 h prior to the examination. In those with a plasma glucose level < 200 mg/dL, the radiotracer was injected intravenously at 3-4.5 MBq/kg, followed by 50-70 min of rest before image acquisition. Scans were acquired with an axial field of view from the vertex to the mid-thigh or toes. Low-dose CT images obtained during PET/CT were used for attenuation correction of the PET emission scan and for anatomical orientation. PET/CT images were reconstructed with an ordered-subset expectation-maximization algorithm or a Bayesian penalized likelihood reconstruction algorithm, as well as with a Gaussian filter, using standard reconstruction software supplied by the manufacturer [11,12]. For optimal harmonization filter calculations, PET data were reconstructed using the default parameters of each institution. An experienced medical physicist harmonized the acquisition and reconstruction parameters to minimize SUV differences between scanners, based on regular phantom studies using the region of interest (ROI) and volume of interest (VOI) Analysis Tool (RAVAT) and RC Tool for Harmonization (Nihon Medi-Physics Co., Ltd., Tokyo, Japan), so that SUVs obtained with different PET/CT systems fell within the range advocated by the Japanese Society of Nuclear Medicine, following a previously reported method [11,12].

Analysis of images
Experienced local physicians, board-certified in both diagnostic radiology and nuclear medicine at each institution, reviewed the 18F-FDG PET/CT images obtained at their hospital, comparing the first and second 18F-FDG PET/CT scans. An FDG-avid lesion was defined as a focal area of abnormally increased 18F-FDG uptake compared with the background, with or without a corresponding anatomic lesion on the CT image, that was suggestive of metastasis. To obtain the SUV, the VOI was placed manually on a suitable reference fused axial image and defined by the craniocaudal and mediolateral extent encompassing the entire target lesion, with any avid normal structures excluded. The freely available software package RAVAT (Nihon Medi-Physics Co., Ltd., Tokyo, Japan) was used to calculate SUVmax, SUVmean, and SULpeak.
SUVmax was defined as the maximum concentration in the target lesion (normalized to injected dose/body weight). To determine SULpeak, a 1.2-cm diameter volume ROI was placed on the hottest site of the tumor, and the resulting SUVpeak was normalized to lean body mass (SUVpeak × [lean body mass]/[total body mass]). SUVmean was calculated as the sum of the SUV in each voxel of the target volume divided by the number of voxels within the target volume. Metabolic tumor volume (MTV) was measured automatically inside the tumor VOI with the margin threshold set at 40% of SUVmax. Tumor lesion glycolysis (TLG) was then calculated as SUVmean × MTV, thereby taking both metabolic activity and tumor burden into account. The corresponding values for each lesion in the patient were summed to obtain whole-body MTV and TLG.
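The lesion-level quantities described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the RAVAT implementation: the function names, the flat list of voxel SUVs, the per-voxel volume argument, the choice to compute SUVmean over the thresholded voxels, and the inclusive (>=) threshold are all assumptions made for the example.

```python
def sul_from_suv(suv, lean_body_mass_kg, total_body_mass_kg):
    """Lean-body-mass normalization: SUV x (lean body mass / total body mass)."""
    return suv * lean_body_mass_kg / total_body_mass_kg


def lesion_metrics(voxel_suvs, voxel_volume_ml, threshold_fraction=0.40):
    """Return (SUVmax, SUVmean, MTV, TLG) for one lesion VOI.

    voxel_suvs: SUVs of all voxels inside the manually placed VOI (hypothetical input).
    Assumption: voxels at or above 40% of SUVmax define the metabolic volume,
    and SUVmean is averaged over those voxels.
    """
    suv_max = max(voxel_suvs)
    cutoff = threshold_fraction * suv_max
    included = [v for v in voxel_suvs if v >= cutoff]
    mtv = len(included) * voxel_volume_ml      # metabolic tumor volume (mL)
    suv_mean = sum(included) / len(included)   # mean SUV inside the MTV
    tlg = suv_mean * mtv                       # tumor lesion glycolysis
    return suv_max, suv_mean, mtv, tlg


def whole_body_totals(lesions, voxel_volume_ml):
    """Sum MTV and TLG over all lesions of one patient."""
    total_mtv = 0.0
    total_tlg = 0.0
    for voxel_suvs in lesions:
        _, _, mtv, tlg = lesion_metrics(voxel_suvs, voxel_volume_ml)
        total_mtv += mtv
        total_tlg += tlg
    return total_mtv, total_tlg
```

With made-up voxel values `[10, 8, 5, 4.5, 3, 1]` and a voxel volume of 0.5 mL, the 40% cutoff is 4.0, so four voxels are kept and the sketch returns an MTV of 2.0 mL and a TLG of 13.75.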

Criteria for treatment response
Treatment response was classified as complete metabolic response (CMR), partial metabolic response (PMR), stable metabolic disease (SMD), or progressive metabolic disease (PMD). Tumor response based on EORTC was determined as follows [7]. CMR was defined as complete resolution of 18F-FDG uptake within the measurable target lesion, making it indistinguishable from the surrounding background, with no new 18F-FDG-avid lesions. For patients with metabolically active lesions shown in follow-up scanning, the SUVmax values of the same lesions (up to a total of five, maximum of two per organ) noted in the baseline and follow-up scans were summed. When the sum of the SUVmax values showed a decrease ≥ 25%, tumor response was classified as PMR. PMD indicated a ≥ 25% increase in the sum of the SUVmax values or detection of new 18F-FDG-avid lesions characteristic of cancer. SMD was used to classify findings other than CMR, PMR, or PMD.
To determine therapeutic response according to PERCIST [8], a 1.2-cm diameter volume ROI was placed on the target lesion and SUL values were calculated. The tumor SULpeak was then compared with the liver SUL, measured in a 3-cm diameter spherical ROI in the normal right lobe, to confirm that it was at least 1.5 times the liver SUL (mean + 2 standard deviations (SD)). CMR was assigned when 18F-FDG uptake within the target lesion resolved completely, that is, became lower than mean liver activity and indistinguishable from the level of the background blood pool. When metabolically active lesions were noted in a follow-up scan, the SULpeak values of up to five lesions (maximum two per organ) at baseline and in follow-up examinations were summed. The hottest lesions in each scan were selected, thus target lesions noted in follow-up examinations were not necessarily the same as those in baseline images. In cases with a decrease in the SULpeak sum ≥ 30%, tumor response was classified as PMR. Conversely, an increase in the SULpeak sum ≥ 30%, the appearance of new hypermetabolic lesions, or a ≥ 75% increase in TLG in the follow-up 18F-FDG PET/CT scan was defined as PMD. Cases not defined as CMR, PMR, or PMD were classified as SMD.
imPERCIST was performed in the same manner as PERCIST, except that new lesion appearance alone did not lead to a PMD classification [6]; PMD was defined only when the increase in the sum of SULpeak values was ≥ 30%. New lesions were included in the SULpeak sum when their uptake was higher than that of the existing target lesions or when fewer than five target lesions were detected in the baseline scan.
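The decision rules of the three criteria can be summarized as a small Python sketch. It is a simplified reading of the rules stated in this section, not a validated implementation: the function names and arguments are hypothetical, lesion selection (up to five lesions, maximum two per organ) and the judgment of complete resolution are assumed to have been made beforehand by the reader, and for imPERCIST the follow-up sum is assumed to already include any new lesions that qualify as targets.

```python
def percent_change(baseline_sum, followup_sum):
    """Percent change of the follow-up sum relative to the baseline sum."""
    return 100.0 * (followup_sum - baseline_sum) / baseline_sum


def classify_eortc(baseline_suvmax_sum, followup_suvmax_sum,
                   new_lesions=False, complete_resolution=False):
    """EORTC: +/-25% thresholds on the SUVmax sum; new lesions mean PMD."""
    if complete_resolution and not new_lesions:
        return "CMR"
    change = percent_change(baseline_suvmax_sum, followup_suvmax_sum)
    if new_lesions or change >= 25.0:
        return "PMD"
    if change <= -25.0:
        return "PMR"
    return "SMD"


def classify_percist(baseline_sul_sum, followup_sul_sum, new_lesions=False,
                     complete_resolution=False, tlg_change_percent=None):
    """PERCIST: +/-30% on the SULpeak sum; new lesions or TLG +75% mean PMD."""
    if complete_resolution and not new_lesions:
        return "CMR"
    change = percent_change(baseline_sul_sum, followup_sul_sum)
    tlg_progression = tlg_change_percent is not None and tlg_change_percent >= 75.0
    if new_lesions or change >= 30.0 or tlg_progression:
        return "PMD"
    if change <= -30.0:
        return "PMR"
    return "SMD"


def classify_impercist(baseline_sul_sum, followup_sul_sum,
                       complete_resolution=False):
    """imPERCIST: PMD only when the SULpeak sum increases by >= 30%."""
    if complete_resolution:
        return "CMR"
    change = percent_change(baseline_sul_sum, followup_sul_sum)
    if change >= 30.0:
        return "PMD"
    if change <= -30.0:
        return "PMR"
    return "SMD"
```

As a check against values reported in this paper, harmonized SUVmax sums of 43.95 at baseline and 7.33 at follow-up give a change of about -83.3%, which `classify_eortc` maps to PMR, and harmonized SULpeak sums of 3.87 and 5.6 give about +44.7%, which `classify_impercist` maps to PMD. The sketch also shows where the criteria diverge: a 10% decrease with a new lesion is PMD under `classify_percist` but SMD under `classify_impercist`.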

Statistical analysis
Data are presented as the mean ± SD. Concordance between criteria was assessed using Cohen's κ coefficient [13], with the level of agreement noted as slight (κ < 0.21), fair (κ = 0.21-0.40), moderate (κ = 0.41-0.60), substantial (κ = 0.61-0.80), or nearly perfect (κ > 0.80). Progression-free survival (PFS) was defined as the time elapsed from the start of ICI therapy to the date of disease progression revealed by radiological and/or clinical examination results, or death from any cause. Patients with no evidence of progressive disease were censored at the date of the last follow-up examination. Overall survival (OS) was defined as the time from the start of ICI therapy until death from any cause. Patients alive at the final follow-up examination were censored and classified as alive with disease or with no evidence of progression. Actuarial survival curves were generated using the Kaplan-Meier method, while a log-rank test was employed to examine differences between groups. The SAS software package, version 9.3 (SAS Institute Inc., Cary, NC, USA), was utilized for statistical analyses, with p values < 0.05 considered to indicate significance.
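For readers unfamiliar with the concordance statistic, the following Python sketch computes Cohen's κ for two sets of response classifications and maps it to the agreement levels listed above. It assumes the unweighted form of κ (the text does not state whether a weighted variant was used), and the cut points between bands (0.20, 0.40, 0.60, 0.80) are an interpretation of the ranges given; the function names are hypothetical and the analyses in this study were run in SAS, not with this code.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of patients given the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the marginal category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def agreement_level(kappa):
    """Map kappa to the agreement bands used in this section."""
    if kappa > 0.80:
        return "nearly perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "slight"
```

In practice the two input lists would hold the CMR/PMR/SMD/PMD labels assigned to the same patients by two criteria, for example harmonized EORTC and harmonized PERCIST.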

Patients
Twenty-seven patients [18 males, 9 females; mean (± SD) 67.4 ± 11.3 years old; range 39-86 years] were selected as subjects. For OS calculation, the final follow-up date was April 2020. Baseline 18F-FDG PET/CT scanning was performed at a median of 27 days (2-90 days) before ICI therapy initiation, while follow-up scanning was done at a median of 147 days (90-269 days) following the first ICI administration. The number of ICI cycles administered between therapy initiation and follow-up 18F-FDG PET/CT scanning was 4 in 7 patients, 5 in 1, 6 in 3, 7 in 2, 8 in 6, 9 in 3, 11 in 2, and 13 in 3. Patient characteristics are shown in Table 2. The main regimen was 240 mg every 2 weeks for nivolumab (n = 21) and 200 mg every 3 weeks for pembrolizumab (n = 6), continued until apparent disease progression, unacceptable toxicity, or a decision by the patient or attending physician to discontinue treatment. Of the 27 enrolled patients, treatment-related adverse events were noted in 3 (11.1%) (rash, interstitial lung disease, and diarrhea in 1 each).
Harmonized EORTC, harmonized PERCIST, and harmonized imPERCIST each indicated significantly longer PFS in patients with disease control (CMR/PMR/SMD) than in those with PMD (p < 0.0001 for each) (Fig. 3). Similarly, patients classified as responders (CMR/PMR) based on all three criteria showed significantly longer PFS as compared to non-responders (SMD/PMD) (p < 0.0001 for each) (Fig. 4).

Discussion
To our knowledge, this is the first study to evaluate therapeutic response in patients with malignant melanoma treated with ICIs at multiple medical institutions equipped with a variety of PET scanners, and to assess prognosis based on baseline and follow-up 18F-FDG PET/CT results using harmonized metabolic markers. The findings showed that 18F-FDG PET/CT results obtained before and again 3 to 9 months after initiation of ICI therapy, using harmonized metabolic markers from eight types of PET scanners in place at five different hospitals, were useful for evaluating tumor response and predicting prognosis in malignant melanoma patients who received ICI therapy. The impact of this study is considered to be high for clinical practice settings as well as multicenter trials. Use of different types of PET/CT scanners at the same institution is becoming common, thus methods for harmonization of PET quantitative values are needed both in clinical settings and in trials conducted in cooperation among multiple centers. Previously established harmonization programs such as the EANM/EARL program [9] and the Quantitative Imaging Biomarker Alliance (QIBA/UPICT) [10] have provided useful comparisons of SUV metrics among different systems.
Comparisons of PERCIST and imPERCIST for evaluating response to ICI treatment in malignant melanoma patients, and for prediction of OS, were presented in an interesting study by Ito et al. [6], though there are no known reports of comparisons of EORTC criteria (SUVmax), PERCIST, and imPERCIST (SULpeak). In their investigation, Ito et al. found that imPERCIST was superior for OS, while all three harmonized 18F-FDG PET criteria showed very high concordance of CMR/PMR/SMD/PMD in the present study, as well as accuracy regarding evaluation of response to ICI therapy and prediction of prognosis in malignant melanoma patients. A potential reason for this difference may have been the patient population, along with the definitions of early response (2-4 cycles of ICI) in their series and late response (4-13 cycles of ICI, median 8 cycles) for the assessments in our series.
Figure caption (beginning missing): [...] images show near disappearance of 18F-FDG uptake in these nodal metastases (arrows). Because posttreatment 18F-FDG uptake of these nodal metastases was slightly higher than the surrounding tissue and the reduction in the sum of harmonized SUVmax was 83.3% (from 43.95 to 7.33), the status was PMR according to harmonized EORTC criteria. Because posttreatment 18F-FDG uptake of all three nodal metastases was less than the liver activity, the response status according to harmonized PERCIST and harmonized imPERCIST was CMR. The patient was alive without progression 68.1 months after the initiation of nivolumab. Pretreatment harmonized SUVmax/SUVmean/SULpeak of the right common iliac, external iliac, and inguinal nodal metastases were 14.67/9.32/9.6, 13.09/9.21/9.37, and 16.19/10.09/11.55, respectively. Pretreatment harmonized whole-body MTV and TLG were 72.46 and 714.51, respectively. Posttreatment harmonized SUVmax/SUVmean/SULpeak of the right common iliac, external iliac, and inguinal nodal metastases were 3.26/1.65/1.96, 1.97/1.43/1.62, and 2.13/1.51/1.67, respectively. Posttreatment liver SUL (mean + 2 standard deviations) was 2.33. Posttreatment harmonized whole-body MTV and TLG were 19.35 and 30.58, respectively.
Figure caption (beginning missing): [...] Because the increase in the sum of the harmonized SULpeak was 44.7% (from 3.87 to 5.6), the status was PMD according to harmonized imPERCIST. The patient exhibited progressive disease at 8.9 months and died 21.9 months after the initiation of nivolumab. Pretreatment harmonized SUVmax, SUVmean, and SULpeak of the right lung metastasis were 6.85, 4.50, and 3.87, respectively. Pretreatment harmonized whole-body MTV and TLG were 2.22 and 10.01, respectively. Posttreatment harmonized SUVmax/SUVmean/SULpeak of the right lung metastasis and left subscapular muscle metastasis were 4.21/3.08/2.44 and 5.53/3.81/3.16, respectively. Posttreatment harmonized whole-body MTV and TLG were 5.6 and 19.16, respectively.
Immune cell infiltration can delay tumor shrinkage or even cause a temporary size increase (pseudoprogression), thus assessment of tumor response following ICI treatment can be difficult. Several different criteria for use with 18F-FDG PET/CT findings have been proposed to determine response to that treatment, such as PET/CT criteria for early prediction of response to immune checkpoint inhibitor therapy (PECRIT) [3], PET response evaluation criteria for immunotherapy (PERCIMT) [4], imPERCIST [6], and immune PERCIST (iPERCIST) [5], though an optimal evaluation method has yet to be established. These criteria were established for early prediction following the start of ICI treatment (2-4 cycles). While pseudoprogression must be considered in the early phase following treatment initiation, it was not observed in any of our patients, which might have been due to the late (≥ 4 cycles) response assessment. Several studies have presented results demonstrating the usefulness of 18F-FDG PET/CT for assessing ICI therapeutic response, especially early response (2-4 cycles) [3][4][5][6]. Cho et al. [3] presented an analysis of PECRIT, which combines change in lesion size with change in FDG avidity shown by 18F-FDG PET/CT, in 20 advanced melanoma patients [...] Ito et al. [6] were the first to present imPERCIST, in which new lesion appearance is not used to define PMD. They analyzed 60 metastatic melanoma patients and noted that a ≥ 30% increase in the SULpeak sum of up to 5 measured lesions shown by 18F-FDG PET/CT accurately reflected PMD after 2-4 cycles of ipilimumab. With iPERCIST, Goldfarb et al. [5] introduced two new categories of PMD, unconfirmed (UPMD) and confirmed (CPMD). Their analysis of 28 non-small cell lung cancer patients receiving nivolumab indicated that evidence of metabolic progression observed at 8 weeks (after 4 cycles) should be confirmed by another 18F-FDG PET/CT examination 4 weeks later, while the usefulness of iPERCIST for differentiating responders from non-responders and for OS prediction was also noted (p = 0.0003).
This study has some limitations. Since the results were obtained from a retrospective review of a small selected patient group, selection bias may have had an influence, as PET/CT imaging was used at the discretion of the referring physician. A prospective study with a much larger population is needed. Furthermore, the time period between the start of ICI therapy and follow-up imaging was not standardized, which might have affected the observed changes in tumor FDG uptake and the number of lesions detected. On the other hand, the present results reflect typical usage of 18F-FDG PET/CT in clinical settings, and a clear correlation between PET response criteria and PFS or OS was shown, suggesting that response assessment by PET is acceptable for use in clinical practice.
In conclusion, the three harmonized 18F-FDG PET criteria (EORTC, PERCIST, imPERCIST) used in the present study demonstrated high concordance for CMR/PMR/SMD/PMD, as well as accuracy for evaluation of response to ICI therapy and prediction of prognosis in cases of malignant melanoma. Nevertheless, future studies with larger study populations will be needed to better determine the value of these methods.