Introduction

Peripheral nerve sheath tumors (PNSTs) are relatively common and include both benign and malignant tumors. Schwannomas are the most common benign peripheral nerve sheath tumors (BPNSTs), and neurofibromas make up the largest proportion of the remaining BPNSTs [1, 2]. Nerve sheath tumors may arise sporadically or in association with neurofibromatosis. Patients with neurofibromatosis type 1 (NF1) are at increased risk of developing PNSTs, often with a high whole-body tumor burden of neurofibromas [1,2,3,4]. Importantly, these neurofibromas may act as precursor lesions and can transform into malignant peripheral nerve sheath tumors (MPNSTs) [5]. MPNSTs are aggressive soft tissue sarcomas (STS), accounting for 2–3% of all STS [6, 7]. Although MPNSTs are rare in the general population, NF1 patients have an 8–13% lifetime risk of developing an MPNST. MPNSTs generally have poor clinical outcomes and are the leading cause of mortality in NF1 patients [8, 9]. Median survival in localized disease ranges from 5 to 6 years, demanding aggressive treatment [10, 11]. Surgical resection is the only curative option that improves survival, as MPNSTs respond poorly to chemotherapy and radiotherapy [10,11,12]. While resection of MPNSTs commonly results in high postoperative morbidity and motor deficits, BPNSTs may be removed by intracapsular resection, minimizing neurologic damage [13,14,15]. Because BPNSTs require resection only in selected cases, adequate preoperative differentiation is crucial.

18F-FDG PET-CT, using standardized uptake values (SUVs) and tumor-to-liver ratios as semi-quantitative metabolic imaging markers, has been increasingly used as a non-invasive diagnostic tool for the characterization of PNSTs in NF1 patients. However, ideal parameters and their corresponding thresholds have yet to be elucidated [16]. The current literature varies widely on this matter, which may in part be caused by variation among scanners and scanning protocols [17,18,19,20]. Suggested optimal threshold values of semi-quantitative parameters vary greatly, but an SUVmax threshold of ≥ 3.5 is commonly cited [21,22,23,24]. However, its value has been questioned since it may yield high false positive rates [22]. Additional concerns arise for scanning in pediatric NF1 populations, as few studies have investigated the diagnostic accuracy in this subpopulation. When European Association of Nuclear Medicine (EANM) Research Ltd. (EARL) protocol certified scanners are used, results are reproducible for any center utilizing a scanner of that kind.

Given current uncertainties of accurately distinguishing MPNSTs and BPNSTs using 18F-FDG PET-CT, this study investigated the diagnostic accuracy of optimal and commonly used thresholds of semi-quantitative 18F-FDG PET-CT markers using EARL certified scanners and evaluated possible differences between adult and pediatric populations.

Methods

Study population

Patient data were retrospectively collected from two neurofibromatosis expertise centers. Patients with NF1 (fulfilling the NIH criteria and/or genetically proven) who underwent 18F-FDG PET-CT examination for suspected MPNST based on clinical symptoms and/or radiological examination were included. The EARL protocol is used for performance harmonization of semi-quantitative imaging markers of 18F-FDG PET-CT, enabling comparison of imaging markers among patients and sites, regardless of the 18F-FDG PET-CT scanner used. To increase homogeneity between scans, only patients scanned according to the EARL protocol were included; consequently, only patients who underwent scans after 2013 were included. Patients with BPNSTs, either suspected or confirmed by biopsy, with less than 12 months of follow-up were excluded. Patients who received radiotherapy, chemotherapy, or surgical excision of the lesion prior to 18F-FDG PET-CT were excluded, as this may alter tumor imaging features. Patient data were obtained from electronic medical files and included demographic information, histopathological outcomes, and (semi-quantitative) scan characteristics. This study was approved by the Ethics Committees of both participating centers with a waiver of individual patient consent.

Image acquisition

18F-FDG PET-CT scans were performed using a Siemens Biograph mCT PET/CT scanner (Siemens Healthineers, Erlangen, Germany) and a Philips Gemini 64 TOF scanner (Philips Medical Systems International BV, Best, The Netherlands). After fasting for approximately 4–6 h, patients received intravenous 18F-2-fluoro-2-deoxy-d-glucose (FDG). In adults, the FDG dose in MBq was based on weight in one center and on weight adjusted to body surface area (ranging 113–385) in the other. Pediatric patients received a weight-dependent dose of FDG based on the pediatric dose card of the EANM [25]. Tracer was administered after confirming that blood glucose levels were within the normal range. If blood glucose levels were greater than 10 mmol/L, the study was rescheduled. Whole-body attenuation-corrected images were acquired approximately 60 min after tracer injection. During this uptake phase, patients were instructed to rest in a warm, dimly lit room with minimal stimulation. According to the scanning protocol, a whole-body low-dose CT was acquired first for attenuation correction and localization purposes (120 kV, quality reference mAs 40, rotation time 0.5 s, pitch of 0.8 mm, slice thickness of 3 mm; reconstructed slice thickness 3 mm). Directly after the low-dose CT, PET acquisition started in list mode, using 6 to 7 bed positions per patient (from skull base to inguinal region). All scans were corrected for scatter and attenuation using the low-dose CT and reconstructed using ordered subset expectation maximization (OSEM) and time of flight (TOF). Where logistical time constraints allowed, delayed imaging was performed after 3 h. In the neurofibromatosis expertise centers, semi-quantitative analysis was performed by a nuclear medicine physician with over 3 years of experience, blinded to both clinical history and pathology results.
Maximum, mean, and peak standardized uptake values (SUVmax, SUVmean, and SUVpeak) were determined by drawing a volume of interest (VOI) around the target lesion or in the liver as reference (Fig. 1). For tumor-to-liver ratios, a VOI with a diameter of 3 cm was drawn in the center of the right liver lobe, taking care that the whole VOI was inside the liver. The SUVmax and SUVmean of this region were measured.
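As an illustration of how the tumor-to-liver ratios described above could be derived from the measured values, a minimal Python sketch is given here. It assumes that TLmax is the lesion SUVmax divided by the liver SUVmax and that TLmean is the lesion SUVmax divided by the liver SUVmean; these definitions are not spelled out in the text, and the function name and example values are hypothetical.

```python
def tumor_to_liver_ratios(lesion_suvmax, liver_suvmax, liver_suvmean):
    """Return (TLmax, TLmean) for one lesion.

    Assumed definitions (not stated explicitly in the text):
      TLmax  = lesion SUVmax / liver SUVmax
      TLmean = lesion SUVmax / liver SUVmean
    """
    if liver_suvmax <= 0 or liver_suvmean <= 0:
        raise ValueError("liver reference uptake must be positive")
    return lesion_suvmax / liver_suvmax, lesion_suvmax / liver_suvmean


# Hypothetical values, for illustration only.
tl_max, tl_mean = tumor_to_liver_ratios(
    lesion_suvmax=6.0, liver_suvmax=3.0, liver_suvmean=2.0
)
print(f"TLmax = {tl_max:.2f}, TLmean = {tl_mean:.2f}")
```

Because both ratios are dimensionless, they are expected to be less sensitive than raw SUVs to differences in dosing and body composition, which is the motivation for evaluating them alongside SUVmax.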

Fig. 1

Maximum, mean, and peak standardized uptake values determined by drawing a volume of interest around the target lesion in an MPNST and a neurofibroma. MPNST, malignant peripheral nerve sheath tumor; SUV, standardized uptake value

Histological analysis

Histology was considered gold standard and was performed according to institutional standards. Tumors were classified as typical neurofibroma, atypical neurofibroma, or malignant peripheral nerve sheath tumor, using established pathologic criteria [20, 26, 27].

Statistical analysis

The following semi-quantitative imaging markers were analyzed for their potential to detect malignant transformation in neurofibromas: SUVmax, SUVpeak, SUVmax adjusted to lean body mass (SULmax), SUVpeak adjusted to lean body mass (SULpeak), delayed SUVmax, delayed SUVpeak, delayed SULmax, delayed SULpeak, and the tumor-to-liver ratios TLmax and TLmean. Lean body mass (LBM) was calculated using Janmahasatian’s formula [25,26,27,28,29]. Receiver operating characteristic (ROC) analysis was performed for each semi-quantitative imaging marker, and optimal threshold values were determined using Youden’s index. For each optimal threshold, sensitivity, specificity, positive likelihood ratio (pLR), and negative likelihood ratio (nLR) were determined. Diagnostic accuracy was described using the area under the ROC curve (AUC). Optimization of the diagnostic algorithm was performed using the commonly used imaging markers SUVmax and TLmean. Performance of commonly used threshold values for SUVmax (3.0–6.0) and TLmean (1.5–3.0) was assessed in steps of 0.5 to improve generalizability. Additionally, threshold values yielding 100% sensitivity or 100% specificity were assessed. Combinations of these parameters were manually assessed to identify the diagnostic algorithm with the highest sensitivity and acceptable specificity. Patients were stratified by age (adults vs. children), and subgroup analysis was performed for MPNSTs and BPNSTs. Nine patients received more than one 18F-FDG PET-CT. Differences between PNSTs and between subgroups were analyzed using the chi-square test for categorical variables and, for continuous variables, a one-way test or t-test when the distribution was normal according to the Shapiro–Wilk test, or a Kruskal–Wallis or Wilcoxon test otherwise.
As recent literature indicates that PNSTs are at risk of undergoing malignant transformation at any point in time, each tumor was investigated for malignant transformation at every 18F-FDG PET-CT independently of previous measurements [23, 30,31,32]. Typical and atypical neurofibromas were evaluated together as they are both considered benign lesions. Statistical significance was established for p-values < 0.05. All statistical analyses were performed using R version 4.0.3 (R Core Team 2020).
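The threshold selection by Youden's index described above can be sketched as follows. This is a minimal Python illustration, not the R code used for the analysis; it assumes that a lesion is classified as positive when its marker value is greater than or equal to the threshold, and the function name and toy data are hypothetical.

```python
def youden_optimal_threshold(values, is_malignant):
    """Return (threshold, J, sensitivity, specificity) maximizing Youden's J.

    A lesion is called positive when value >= threshold (assumption).
    J = sensitivity + specificity - 1.
    """
    positives = [v for v, m in zip(values, is_malignant) if m]
    negatives = [v for v, m in zip(values, is_malignant) if not m]
    if not positives or not negatives:
        raise ValueError("need at least one malignant and one benign lesion")

    best = None
    # Every observed value is tried as a candidate threshold.
    for threshold in sorted(set(values)):
        sensitivity = sum(v >= threshold for v in positives) / len(positives)
        specificity = sum(v < threshold for v in negatives) / len(negatives)
        j = sensitivity + specificity - 1.0
        # Keep the first (lowest) threshold reaching the highest J.
        if best is None or j > best[1]:
            best = (threshold, j, sensitivity, specificity)
    return best


# Toy data, for illustration only (not study data).
suvmax = [1.2, 2.0, 2.5, 3.1, 4.0, 6.2, 7.5, 9.0]
malignant = [False, False, False, False, True, False, True, True]
threshold, j, sens, spec = youden_optimal_threshold(suvmax, malignant)
print(f"threshold >= {threshold}: J = {j:.2f}, sens = {sens:.2f}, spec = {spec:.2f}")
```

Youden's index weighs sensitivity and specificity equally, which is why thresholds yielding 100% sensitivity were assessed separately for a setting in which missed MPNSTs are the main concern.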

Results

Study population

Sixty patients were included, undergoing 18F-FDG PET-CT examinations for 70 tumors: 10 MPNSTs and 60 BPNSTs (Table 1). Forty lesions were found in females and 30 in males. Nineteen of the 70 lesions had delayed scans, of which 3 were MPNSTs. Fifteen lesions were evaluated in children (≤ 18 years). Mean duration of follow-up was 3.5 ± 1.6 years. At last follow-up, 7 MPNST patients and 4 BPNST patients were deceased.

Table 1 Patient characteristics of study population

Optimal threshold values

The optimal threshold for SUVmax was ≥ 5.82 (AUC = 0.88) (Table 2, Supplementary Fig. 1), with a sensitivity of 0.70, specificity of 0.92, pLR of 8.26, and nLR of 0.33. The optimal threshold for SULmax was ≥ 8.83 (AUC = 0.86), with a sensitivity of 0.67, specificity of 0.90, pLR of 6.56, and nLR of 0.37. The optimal threshold for TLmean was ≥ 2.31 (AUC = 0.91), with a sensitivity of 0.90, specificity of 0.79, pLR of 4.35, and nLR of 0.13. The optimal threshold for delayed SUVmax was ≥ 2.53 (AUC = 0.81, Supplementary Fig. 2), with a sensitivity of 1.00, specificity of 0.56, pLR of 2.29, and nLR of 0.00. The optimal threshold for delayed SULmax was ≥ 3.42 (AUC = 0.69), with a sensitivity of 1.00, specificity of 0.50, pLR of 2.00, and nLR of 0.00.
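For readers less familiar with likelihood ratios, the relation between the reported sensitivity, specificity, pLR, and nLR can be sketched in a few lines of Python. The standard definitions pLR = sensitivity / (1 − specificity) and nLR = (1 − sensitivity) / specificity are used; the function name is hypothetical, and returning infinity for a zero denominator is a convention chosen for this sketch.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (pLR, nLR) from sensitivity and specificity given as fractions."""
    false_positive_rate = 1.0 - specificity
    false_negative_rate = 1.0 - sensitivity
    # With no false positives the pLR has a zero denominator; treated as infinite here.
    plr = sensitivity / false_positive_rate if false_positive_rate > 0 else math.inf
    # With zero specificity the nLR has a zero denominator; treated as infinite here.
    nlr = false_negative_rate / specificity if specificity > 0 else math.inf
    return plr, nlr


# Reported delayed SULmax values: sensitivity 1.00, specificity 0.50.
plr, nlr = likelihood_ratios(1.00, 0.50)
print(f"pLR = {plr:.2f}, nLR = {nlr:.2f}")  # pLR = 2.00, nLR = 0.00
```

Recomputing the other ratios from the rounded sensitivities and specificities can give somewhat different values from those reported (for example, 0.70 / 0.08 = 8.75 versus the reported 8.26 for SUVmax), presumably because the reported ratios were derived from unrounded proportions.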

Table 2 ROC analysis of diagnostic accuracy of semi-quantitative imaging markers

Differences between adults and children

Statistically significant differences between adults and children were found in MPNSTs for mean SUVmax (11.56 vs. 3.10, p = 0.037) and SUVpeak (7.48 vs. 2.14, p = 0.037), but not in BPNSTs (Table 3). After adjusting for LBM, uptake values for pediatric MPNSTs remained significantly lower: SULmax (15.53 vs. 4.25, p = 0.040) and SULpeak (10.59 vs. 2.92, p = 0.040). Proportional values (TLmax and TLmean) were not significantly lower in pediatric MPNSTs.

Table 3 Semi-quantitative imaging markers stratified by age

PET algorithm

An SUVmax of 2.8 yielded 100% sensitivity (Supplementary Tables 1 and 2), and an SUVmax of 7.3 yielded 100% specificity. The commonly used threshold of ≥ 3.5 for SUVmax yielded 80% sensitivity with 63% specificity. A TLmean of 1.6 yielded 100% sensitivity, and a TLmean of 4.8 yielded 100% specificity. A commonly used threshold of ≥ 2.0 for TLmean yielded 90% sensitivity with 74% specificity. As TLmean ≥ 2.0 offers higher accuracy than SUVmax ≥ 3.5, and TL values do not differ significantly between adults and children, an optimal diagnostic work-up can be achieved by performing biopsies in lesions with TLmean ≥ 2.0, or with TLmean < 2.0 and SUVmax ≥ 3.5 (Supplementary Table 3). This diagnostic algorithm resulted in 100% sensitivity and 63% specificity, requiring 22/60 BPNSTs to undergo biopsy (Fig. 2). Additionally, using the optimal threshold of TLmean found in this study (≥ 2.3), specificity may be increased to 65%, resulting in one fewer BPNST requiring biopsy.
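The threshold rule of the proposed algorithm, and the arithmetic linking the number of biopsied BPNSTs to specificity, can be expressed as a short Python sketch. It covers only the two PET thresholds stated above and not any other steps of the clinical work-up; the function names and the example lesions are hypothetical.

```python
def needs_biopsy(tl_mean, suv_max, tl_threshold=2.0, suv_threshold=3.5):
    """Return True when the PET thresholds point to biopsy."""
    if tl_mean >= tl_threshold:
        return True
    # TLmean is below its threshold here, so SUVmax decides.
    return suv_max >= suv_threshold


def specificity_from_counts(n_benign, n_benign_flagged):
    """Specificity = benign lesions not flagged / all benign lesions."""
    return (n_benign - n_benign_flagged) / n_benign


# Hypothetical lesions, for illustration only.
print(needs_biopsy(tl_mean=2.4, suv_max=3.0))  # True
print(needs_biopsy(tl_mean=1.7, suv_max=4.1))  # True
print(needs_biopsy(tl_mean=1.7, suv_max=3.0))  # False

# Reported counts: 22 of 60 BPNSTs flagged for biopsy.
print(round(specificity_from_counts(60, 22), 2))  # 0.63
```

Written this way, the rule is equivalent to flagging a lesion when either marker reaches its threshold, and one fewer flagged BPNST (21 of 60) corresponds to the reported specificity of 65%.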

Fig. 2

Diagnostic algorithm using TLmean and SUVmax for optimal diagnostic work-up, performing biopsies in lesions with TLmean ≥ 2.0, or with TLmean < 2.0 and SUVmax ≥ 3.5. MRI, magnetic resonance imaging; SUV, standardized uptake value; TL, tumor-to-liver ratio

Discussion

This retrospective study found that PET scans offer adequate accuracy for detecting malignant transformation of neurofibromas both in adults and children. Combining SUVmax and TLmean threshold values in a diagnostic algorithm increases specificity while retaining 100% sensitivity.

Optimal thresholds in PET scans

In the past decades, 18F-FDG PET-CT scans have increasingly been used to detect malignancy in NF1 patients. Although numerous studies have aimed to identify ideal semi-quantitative imaging markers, reported thresholds for detecting MPNSTs vary across studies. Differences in reported SUV measurements may arise from the different types of scanners and protocols used. To diminish variation across scanners, EARL formulated criteria to improve the reproducibility of evaluated thresholds. Additionally, proportional values such as the TL ratio have been proposed to reduce measurement variation. The most commonly evaluated characteristics for detection of malignant transformation of PNSTs are SUVmax and TLmean. Studies evaluating SUVmax reported ideal thresholds varying from 2.35 to 6.1 [20, 25,26,27, 31, 33,34,35,36,37,38,39,40]. Studies evaluating TL ratio reported ideal thresholds varying from 1.4 to 3.0 [17, 25, 31, 35, 37, 39, 41]. This study found ideal threshold values for SUVmax and TL ratio consistent with those reported in the literature, and delayed imaging did not improve diagnostic accuracy. However, using these thresholds, some MPNSTs may be missed.

Children vs. adult populations

Malignant transformation of neurofibromas also occurs in children [12, 42]. As detection of MPNST at an early stage could increase the possibility of curative resection, frequent serial imaging for surveillance of lesions is often performed. However, this practice may lead to harmful long-term radiation effects [22, 35, 39, 43, 44]. Unfortunately, only a few published 18F-FDG PET-CT studies have included children, and none has compared imaging marker values between adult and pediatric NF1 patients. Studies that combined data from both adults and children with NF1 found optimal SUVmax threshold values ranging from 3.90 to 4.00, with sensitivity ranging from 82 to 100% and specificity ranging from 66 to 94% [25,26,27]. Studies including only adult NF1 patients found a wider range of optimal SUVmax threshold values, from 1.8 to 7.0, suggesting that children may have lower SUVmax values than adults [20,21,22,23, 28, 30,31,32,33, 35, 39, 41, 42, 44,45,46]. It has been suggested that SUV values may be higher in adults because the administered dose is adjusted by weight and adults have comparatively more fat tissue, which has relatively low FDG uptake, so that uptake in lesions and normal organs is higher. Adjusting SUV to lean body mass may correct for body composition as a contributing factor to SUV differences between adult and pediatric patients. Recent studies have investigated the use of SUL based on James’s formula to improve diagnostic accuracy in the differentiation of PNSTs in adult populations [20, 29, 39, 42, 47, 48]. This study adjusted SUV to lean body mass using the more recently proposed formula by Janmahasatian, as it is suggested to be more accurate for use in children [25,26,27,28,29]. Significantly lower SUVmax and SUVpeak values were found in MPNSTs in children.
However, after adjusting for lean body mass, SUVmax and SUVpeak remained significantly lower in MPNSTs in children, suggesting that differences in body composition are unlikely to contribute substantially to the SUV differences found between adults and children [29]. Significantly lower SUV values were found in children, although this finding is based on only 2 pediatric MPNSTs. This may be due to the large spread in uptake values in adults, which require relatively low SUVmax thresholds. Nevertheless, given the significant differences in SUV values between adults and children, SUV thresholds should be interpreted with caution when used on their own in children.
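To make the lean body mass step concrete, the Janmahasatian formula mentioned above can be written out as a minimal Python sketch. The coefficients are those of the formula as commonly cited (weight in kg, body mass index in kg/m²); the function name and example patients are hypothetical, and the sketch covers only the LBM calculation, not the subsequent normalization of uptake values used in this study.

```python
def janmahasatian_lbm(weight_kg, height_m, sex):
    """Lean body mass in kg from weight, height, and sex ('male' or 'female')."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    bmi = weight_kg / height_m ** 2
    if sex == "male":
        return 9270.0 * weight_kg / (6680.0 + 216.0 * bmi)
    if sex == "female":
        return 9270.0 * weight_kg / (8780.0 + 244.0 * bmi)
    raise ValueError("sex must be 'male' or 'female'")


# Hypothetical patients, for illustration only.
adult = janmahasatian_lbm(weight_kg=70.0, height_m=1.75, sex="male")
child = janmahasatian_lbm(weight_kg=30.0, height_m=1.35, sex="male")
print(f"adult LBM = {adult:.1f} kg ({adult / 70.0:.0%} of body weight)")
print(f"child LBM = {child:.1f} kg ({child / 30.0:.0%} of body weight)")
```

Because the formula depends on body mass index, the lean fraction of body weight differs between a lean child and a heavier adult, which is the body composition effect that the lean body mass adjustment is meant to account for.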

Optimal PET algorithm

A threshold of 3.5 for SUVmax has often been proposed as the ideal threshold [21,22,23]. A recent meta-analysis pooled individual patient-level data from 11 different study populations and found that a threshold of 3.5 provided the highest sensitivity (0.99) with acceptable specificity (0.75) [24]. Arguments against using this threshold have mostly concerned its low specificity. This study found a sensitivity of 0.80 and a specificity of 0.63 using a threshold of 3.5 for SUVmax. In this study, TLmean yielded slightly better accuracy (0.92) than SUVmax, while proportional values did not differ significantly between adults and children. In contrast to previous studies, the current study combined SUVmax and TLmean, proposing an algorithm that aims to achieve optimal sensitivity while retaining acceptable specificity. Using the criterion of TLmean ≥ 2.0, or TLmean < 2.0 and SUVmax ≥ 3.5, a sensitivity of 1.00 and a specificity of 0.63 were achieved. As TL values did not differ between the adult and pediatric populations, there does not seem to be a rationale for separate diagnostic algorithms. With single semi-quantitative imaging markers, a sensitivity of 1.00 is often not achieved or comes at the cost of lower specificity. A single marker’s threshold may also be less reproducible in other populations.

Strengths and limitations

This study is limited by its relatively small population, which is mainly a result of the strict inclusion criteria. The inclusion of only symptomatic lesions and EARL-adhering scans is stricter than in previous studies. As EARL criteria were adopted in both participating centers only in 2013 and a follow-up of one year was required for benign lesions, the study period was relatively short. Therefore, the subgroup analysis of pediatric patients should be interpreted with caution. Despite these limitations, the results of this study are reproducible for any center using PET scanners that adhere to EARL criteria. Additionally, this study combined SUVmax and TLmean to develop a diagnostic work-up algorithm that identifies all MPNSTs while minimizing the number of false positives. To the best of our knowledge, this is the first study to compare semi-quantitative imaging marker values between adult and pediatric patients. This study found that while SUVmax and SUL were significantly lower for MPNSTs in children, TL values were not. Based on these findings, future research should address several knowledge gaps. First, the semi-quantitative characteristics evaluated in this study should be validated in large prospective cohort studies with PET scanners adhering to EARL criteria. This may identify ideal threshold values for accurate detection of malignant transformation of PNSTs. Second, the proposed diagnostic algorithm should be replicated in a large database of adult and pediatric NF1 patients. Additionally, SUV values of semi-quantitative imaging markers should be studied further in adult and pediatric NF1 patients. Although adjusting optimal threshold values based on age did not affect the diagnostic accuracy of the proposed algorithm, potential differences in diagnostic accuracy between these populations may nevertheless necessitate different diagnostic guidelines.
Altogether, the results from these studies will provide a framework that may enable optimal diagnostic algorithms to be formulated. This study only assessed the diagnostic accuracy of 18F-FDG PET-CT. A recently published meta-analysis reported that although conventional MRI yields varying degrees of accuracy, some studies have shown high accuracies in functional MRI [24]. Though further research is required on this modality, reducing the need for 18F-FDG PET-CT may diminish radiation exposure that accumulates due to numerous follow-up scans necessary in NF1 patients prone to tumorigenesis.

Conclusion

In EARL-adhering PET scanners, semi-quantitative imaging markers offer acceptable diagnostic accuracy for detecting malignant transformation of PNSTs in NF1. An algorithm combining SUVmax and TLmean was proposed, which maximizes sensitivity while reducing the number of false positives and thus the number of unnecessary biopsies. This algorithm can readily be used in any center using EARL-adhering PET scanners. In pediatric MPNSTs, SUVmax values were significantly lower even after correction for lean body mass, yet TL values were similar to those in adult cases. These potential differences in uptake values between adults and children did not affect the diagnostic algorithm.