Background

Thyroid carcinoma (TC) is a rare malignancy. Nonetheless, the incidence has grown markedly since the early 1990s. In 2010 the standardized incidence in the US was estimated at 6.0/100,000 in males and 17.3/100,000 in females [1, 2]. In Europe, the standardized incidence ranges from 2.03/100,000 to 5.0/100,000 in males and from 5.65/100,000 to 15.50/100,000 in females [3]. TC has a relatively high rate of local recurrence and lymph node or soft tissue metastases; estimates range between 20 and 30 % [4, 5]. Distant metastases develop in up to 15 % of cases, primarily as pulmonary metastases, followed by bone metastases [6]. Prognosis primarily depends on the site of recurrence and the subsequent treatment, and early diagnosis of recurrence is of utmost importance to determine whether or not a salvage surgery is possible. Therefore, a close postoperative follow-up regime is mandatory [7].

Currently, the most frequently used modalities for post-thyroidectomy follow-up examinations are neck ultrasound (US), measuring thyroglobulin (Tg) blood levels, and radioiodine 131I-whole-body scan (I-WBS) [79]. However, 20–40 % of the patients with recurrent TC or nodal metastases lose their ability to accumulate radioactive iodine due to tumor cell dedifferentiation. Thus, they are not visible on I-WBS. In this case, alternative imaging modalities are needed, such as 18F-fluorodeoxyglucose Positron emission tomography (FDG-PET), 18F-fluorodeoxyglucose-PET/computed tomography (FDG-PET/CT), Magnetic resonance imagining (MRI), or even 18F-fluorodeoxyglucose-PET/MRI (FDG-PET/MRI) [5, 1012].

FDG-PET was shown to be a useful tool in the localization of disease in patients with elevated Tg levels and negative US and I-WBS [13, 14]. It was quickly outperformed by FDG-PET/CT, which combined the advantages of metabolic and morphologic imaging and thus allowed for improved anatomic localization and correlation of focal metabolic activity. FDG-PET/CT is now commonly accepted as the method of choice for post-thyroidectomy patients with increased Tg blood levels and negative I-WBS [11, 12, 1518]. For distant metastases, particularly of the lung and bones, FDG-PET/CT reaches both sensitivity and specificity up to 1.00 [14, 15, 19]. Nonetheless, the sensitivity and specificity of FDG-PET/CT for detecting locally recurrent or metastatic TC is still relatively low and ranges between 0.46 to 1.00 and 0.66 to 1.00, respectively [5, 12, 17]. Furthermore, recurrent, residual, or metastatic tissue does not always accumulate 18FDG, and thus remains undetectable by FDG-PET/CT. Hence, supplementary imaging modalities are occasionally required.

MRI is the primary imaging modality for soft tissue tumors due to its excellent soft tissue contrast, additional functional imaging capabilities, and reduced dental metal artifacts in the head and neck, as compared to FDG-PET/CT [2023]. MRI of the neck is mainly used for planning the surgical approach and postoperative follow-up in head and neck cancer and TC patients. Its specific application for TC has frequently been described in the literature [22, 24, 25]. The sensitivity and specificity of MRI in detecting locally recurrent or metastatic TC ranges from 0.76 to 0.95 and 0.51 to 0.98, respectively [4, 22, 24, 26]. Because of its superior soft tissue contrast and its ability to detect non-iodine-avid and FDG-negative tumor tissue, MRI appears to be a powerful complementary imaging modality to FDG-PET/CT [4, 22, 23, 26, 27]. Additionally, gadolinium-based MRI-contrast media avoids iodine contamination before curative radioiodine therapy [4, 23]. PET/MRI is a new hybrid imaging modality that has been applied to TC. Initial results suggested PET/MRI was superior to PET/CT in detecting iodine-positive lesions [28], but for evaluating the primary disease in the neck area it performed as well as contrast-enhanced PET/CT [29]. However, PET/CT was superior to PET/MRI in assessing pulmonary status [30].

A systematic assessment of the added value of 18F-fluorodeoxyglucose-PET/low dose computed tomography (FDG-PET/ldCT) and MRI for detecting locally recurrent TC or cervical nodal metastases has not yet been reported in the literature. Also, the added value of a consensus reading between a nuclear medicine physician and a radiologist, which we have found to be very helpful in clinical routine, has not been investigated. Therefore, this study aimed to evaluate the impact of combined FDG-PET/ldCT and MRI on the detection of local recurrence and cervical nodal metastases of TC. Furthermore, it attempted to quantify the value of a multidisciplinary consensus reading.

Methods

Study design, ethics, and study group

This study followed all procedures laid out in the ethical standards of the responsible committee on human experimentation and with the revised version of the Helsinki Declaration of 1975. All patients were referred to departments of radiology and nuclear medicine as indicated by clinical need and received sequential MRI and FDG-PET/ldCT scans for postoperative follow-up. No additional radiation dose was applied. Patient records and information were de-identified prior to analysis by the department’s information technology service. Institutional review board approval was waived, given that this study utilized the retrospective analysis of blinded clinical data.

This was a retrospective diagnostic study. Forty-six consecutive TC patients who underwent thyroidectomy and radio iodine therapy with ongoing suspicion of recurrent local or metastatic disease were reviewed by our multidisciplinary tumor board between June 2008 and November 2012. The inclusion criteria included elevated Tg blood level, negative ultrasound and I-WBS (diagnostic rhTSH stimulated I-WBS with administered activity of 370 or 3700 MBq) as well as combined FDG-PET/ldCT and MRI examinations that occurred within 72 h of one another. Tg blood levels above 2 μg/l were defined as clearly pathological. However, in some patients with high risk profile (e.g., lymph node metastases in clinical history) Tg values below 2 μg/l were considered as suspicious. Patients with a medullary subtype of TC were excluded.

Table 1 shows demographic and disease-related information of the study patients.

Table 1 Descriptive demographic and disease-related data of all study patients

Procedures and techniques

PACS-Viewer

Images were analyzed using the TeraRecon Aquarius thin client® viewer (TeraRecon, Inc., CA, USA).

FDG-PET/ldCT

PET/CT of the whole body was acquired on a Gemini® TF 16 PET/CT (Philips, Best, the Netherlands) 60 min after the application of 18F-fluorodeoxyglucose (2–2.5 MBq/kg). A fasting period of more than six hours before application was mandatory for all patients. For attenuation correction, low dose CT without application of contrast media was used.

MRI

Examinations of the cervical region were acquired with Magnetom Sonata® 1.5 T, Magnetom Avanto® 1.5 T, Magnetom Espree® 1.5 T, and Magnetom Skyra® 3 T (all Siemens Healthcare, Erlangen, Germany).

Due to variations between the scan protocols, only axial short tau inversion recovery (STIR)-weighted sequence, axial T1-weighted turbo spin echo (TSE) sequence, and Gadolinium contrast media enhanced (Dotarem®, Guerbet, France) axial T1-weighted TSE sequences with fat suppression were selected for study analysis.

Image analysis and definition of results

The FDG-PET/ldCT studies were independently analyzed by two blinded physicians who both had board certification in nuclear medicine and with 9 (MM) and 25(MS) years of professional experience. The MRI studies were independently analyzed by two blinded physicians who specialized in diagnostic radiology and who had 27 (SSF) and 5 (JMH) years of experience with cross-sectional imaging of the head and neck.

The area of the thyroid bed and cervical lymph node basins in the FDG-PET/ldCT and MRI examinations were analyzed for locally recurrent TC and nodal metastases separately as well as non-split. In order to be able to compare both imaging modalities, all pathology seen in the FDG-PET/ldCT, which was located outside the field of view covered by the MRI, was excluded for analysis. The test results were classified as either positive or negative based on a qualitative visual analysis. If either one or both of the two readers identified a pathologic finding, the result was considered to be positive. Uncertain findings were classified as negative. To explore the possibilities for a synergistic effect, the results of the FDG-PET/ldCT and MRI examinations were combined as follows: If either both FDG-PET/ldCT and MRI or one of each was positive, then the combined result was determined to be positive. If both the FDG-PET/ldCT and MRI findings were negative, then the combined result was classified as negative. The separate analyses were followed by a consensus reading between the FDG-PET/ldCT and MRI observers. Inter-observer agreement was determined by means of Cohen’s kappa coefficient (κ) and classified according to Landis and Koch [31].

Validity

The gold standard was defined as histopathological findings (subgroup a, n = 20) or, if the patient did not receive an operation for any reason, a negative long-term clinical follow-up period of at least 3 years (subgroup b, n = 26). In subgroup a 17 patients were operated as a result of the findings of the FDG-PET/ldCT and MRI examinations, whereas 3 patients underwent surgery in the subsequent clinical course. Due to the high number of patients who did not undergo surgery, we performed a separate subgroup analysis on this group. The gold standard for locally recurrent TC or nodal metastases was determined either as “positive” or “negative”, respectively.

Method comparisons for detection of locally recurrent TC and cervical nodal metastases

The findings of the FDG-PET/ldCT and MRI examinations were independently compared to the gold standard in a cross table analysis. Subsequently, the combined FDG-PET/ldCT and MRI results were compared to the gold standard. Finally, the results of the consensus reading were compared to the gold standard and to the combined FDG-PET/ldCT and MRI results. For each comparison, the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy were calculated.

Clinical relevance

To evaluate the clinical relevance of the detected imaging findings, we separately analyzed the therapeutic consequences of the PET/CT and MRI examinations in clinical routine. According to Wiebel et al. [32] the FDG-PET/ldCT and MRI scans were considered to impact the patient management if the “examinations identified disease that was not previously known and resulted in an intervention”, if the scans “resulted in the decision to not pursue a specific intervention”, or if the scan “changed the extent of a planned surgery because of additional disease identified” [32].

Statistical tests

Data analyses were performed using IBM SPSS Statistics® Version 22 (IBM, NY, USA). Cohen’s kappa coefficient was used to determine inter-observer agreement. For method comparison, the sensitivity, specificity, PPV, and NPV were calculated, and McNemar’s test for paired samples was applied, where the level of significance was set at α = 0.05.

Results

In the detection of both locally recurrent TC and nodal metastases with FDG-PET/ldCT, there was a substantial agreement between reader 1 and reader 2 (κ = 0.71 and κ = 0.63, respectively). In the non-split analysis of detection of locally recurrent TC or nodal metastases we found an almost perfect agreement between reader 1 and reader 2 (κ = 0.83).

In the detection of locally recurrent TC with MRI, there was a substantial agreement between reader 1 and reader 2 (κ = 0.69), whereas in the detection of nodal metastases with MRI there was only a moderate agreement (κ = 0.55). In the non-split analysis of detection of locally recurrent TC or nodal metastases we found a substantial agreement between reader 1 and reader 2 (κ = 0.67).

In the detection of both locally recurrent TC and nodal metastases, there was only slight to fair agreement between the findings from the FDG-PET/ldCT and MRI (κ = 0.21).

The sensitivity, specificity, PPV, NPV, and diagnostic accuracy for detecting locally recurrent TC or lymph node metastases are shown in Tables 2 and 3. The separate analyses for locally recurrent TC and nodal metastases are demonstrated in Additional files 1, 2, 3 and 4: Tables S1–S4.

Table 2 Detection of local recurrence or nodal metastases of thyroid cancer
Table 3 Diagnostic performance of FDG-PET/ldCT, MRI, combined FDG-PET/ldCT and MRI, and the consensus reading

The FDG-PET/ldCT and MRI results had impact on patient management in 23 of 46 patients (50 %): 14 patients (30 %) were operated on, 3 patients (7 %) underwent surgery and radiotherapy, 3 patients (7 %) were irradiated, 2 patients (4 %) underwent radio iodine therapy, and 1 patient (2 %) was treated with tyrosine kinase inhibitor therapy. In 10 patients (22 %) with active disease, an active surveillance strategy was chosen for various reasons, e.g., inoperable situation or poor general condition.

Figure 1 shows the added value of FDG-PET/ldCT and MRI for clearer identification of a locally recurrent TC.

Fig. 1
figure 1

The added value of FDG-PET/ldCT for identifying a locally recurrent thyroid cancer with unspecific pretracheal finding via MRI. a STIR-weighted MR image, b contrast-enhanced T1-weighted MR image, c FDG-PET, and d fused FDG-PET and CT

Discussion

The purpose of this study was to investigate the impact of combined FDG-PET/ldCT and MRI on detecting locally recurrent TC and nodal metastases in high risk patients, with a special focus on the value of a multidisciplinary consensus reading.

The rise in use of diagnostic imaging, especially of FDG-PET(/CT) in TC follow-up, leads to increased radiation exposure and associated health costs [3234]. Unnecessary diagnosis and intervention of low-risk and clinically non-significant disease remain a legitimate concern [34]. However, it is important to note that the imaging results analyzed in this study influenced patient management in over 50 % of patients, of whom 17 (37 %) were operated. The high percentage of high-risk patients with stage 4 disease may explain the higher percentage of changes in patient management than has been recently reported in the literature with approximately 30 % [32]. However, the smaller number of patients in our study may have caused selection bias.

Our data showed moderate inter-observer agreement in the detection of cervical nodal metastases with MRI. In our opinion, this moderate agreement is due to the fact that a reliable differentiation between malign and benign lymph nodes that is solely based on morphologic criteria is often not possible – a problem that is well-known in the literature [4, 35].

With regards to the data validity as compared to the heterogeneous gold standard there was a basic issue: If a recurrent disease was suspected, then the patient usually underwent surgery. Thus, over 80 % of the positive imaging results were confirmed by histopathology. In case of clearly negative FDG-PET/ldCT and MRI results, or if the patient was not suitable for surgery, no histologic sample could be gathered. As a consequence, the positive predictive value was higher and the negative predictive value was lower in the surgery subgroup.

Comparing the individual imaging modalities, FDG-PET/ldCT clearly outperformed MRI in detecting locally recurrent TC by having a higher sensitivity, specificity, PPV, NPV, and diagnostic accuracy. In the identification of cervical nodal metastases, FDG-PET/ldCT had a higher sensitivity and NPV than MRI, and an equal diagnostic accuracy. In non-split analysis FDG-PET/ldCT also clearly outperformed MRI by having a higher sensitivity, specificity, PPV, NPV, and diagnostic accuracy. Our overall FDG-PET/ldCT and MRI results were within the sensitivity and specificity range reported in the literature, where the metabolic PET component was described as “the leading tool for the detection of tumor recurrence, regardless of the anatomical imaging component with which it is combined” [26]. Nonetheless, the primary benefit of MRI lays in the precise morphologic assessment and correlation of metabolic foci, and its use in planning a surgical approach [7]. Additionally, MRI correctly identified locally recurrent TC in one patient with PET-negative disease.

The poor agreement between the FDG-PET/ldCT and MRI results (κ = 0.21) most likely originated from their basic methodic differences. In truth, both imaging modalities identified different true positive results and thus demonstrate a synergistic effect for combining these results. The combination of FDG-PET/ldCT and MRI test results primarily increased sensitivity and NPV. Nonetheless, specificity, PPV, and diagnostic accuracy decreased markedly due to concomitant higher rates of false positives. Thus, their combination is not appropriate for clinical routine, for it would increase the rate of subsequent interventions of potentially clinical non-relevant disease. We found that the multidisciplinary consensus reading was a key element in this dilemma. In mammography screenings, for instance, the value of second imaging and consensus reading has been extensively discussed in the literature, which finally led to its mandatory introduction in clinical practice [36, 37]. With regards to TC, the influence of consensus reading on the detection of locally recurrent tumor or nodal metastases has not yet been sufficiently investigated in the literature. Consensus interpretation in assessing FDG-PET/CT examinations have been reported to impact specificity by considerably reducing the number of false positive findings [11]. Correspondingly, for detecting local recurrence or nodal metastases of TC, the consensus reading outperformed the combined FDG-PET/ldCT and MRI results by increasing specificity, PPV, and diagnostic accuracy, while maintaining sensitivity and NPV. Specifically, 5 false positive MRI results with suspicious morphological findings as well as one false positive FDG-PET/ldCT result with FDG-negative disease could have been corrected by means of the consensus reading. However, it is of note that the consensus reading could improve the sensitivity, NPV, and diagnostic accuracy values of FDG-PET/ldCT only moderately, which in turn stresses the importance of metabolic PET imaging for detection of recurrent disease [26].

PET/MRI is a new emerging imaging modality with highly limited availability in clinical practice. In TC the first results suggest that FDG-PET/MRI is equal to contrast enhanced FDG-PET/CT in assessing cervical status. Nonetheless, FDG-PET/ldCT was shown to be superior to FDG-PET/MRI in assessing pulmonary status [30]. Therefore, Vrachimis et al. [30] suggested combined PET/MRI and low dose CT of the lung as a powerful diagnostic tool in patients with suspected recurrent TC. Our study results may support this approach, insofar as it combines the advantages of functional and morphological imaging with potentially highest possible level on diagnostic performance as well as least possible radiation dose. However, future studies should look deeper into the clinical benefit of PET/MRI in the follow-up care of post-thyroidectomy TC patients.

Limitations

Our study patients were examined with different MRI scanners that had different field strengths, gradients, and examination protocols. Despite this limitation, there was a relatively high inter-observer agreement in the MRI group. Secondly, many MRI examinations were performed without additional diffusion-weighted imaging or functional imaging, which may have decreased the sensitivity and specificity of MRI. Thirdly, PET/CT was performed in a "whole-body" approach, while MRI was only performed in the neck region. This study focused on the neck area in order to be able to compare these two imaging modalities. Of course, a whole-body approach with PET/MRI allowing for a direct modality comparison with PET/CT would have been desirable, but was not feasible in our department. In addition, most of the selected patients were negative for active disease. Given the lower rate of positive findings, the sensitivity and specificity values were possibly subject to selection bias and the excellent sensitivity, specificity, and overall accuracy approaching 100 % may not hold up in larger samples. However, the number of patients with suspected recurrent or metastatic TC, negative US, and negative I-WBS is naturally limited. Nonetheless, the size of our sample (n = 46) is close to the calculated minimum sample size required for paired comparison (n = 54), indicating increased probability for Type I Error (α) of 0.05, a Power (1-β) of 0.9, and an expected sensitivity or specificity in group 1 of 0.95 and in group 2 of 0.8. Finally, the retrospective study design is unable to truly assess the impact of MRI and FDG-PET/ldCT on patient management, as clinical decisions were based upon the original reports of these scans and the review in multidisciplinary tumor board, not the retrospective consensus review.

Conclusions

Our study results clearly demonstrate a synergistic effect for combining FDG-PET/ldCT and MRI in detecting locally recurrent TC and cervical nodal metastases in high-risk patients with negative ultrasound and I-WBS. Imaging results notably influenced patient management. Therefore, both complementary imaging modalities should be used together in this specific clinical setting. Finally, a multidisciplinary consensus reading between a nuclear medicine physician and a radiologist is a crucial element in this diagnostic approach.