Introduction

Thyroid-associated ophthalmopathy (TAO) is an autoimmune disorder that affects orbital soft tissues, such as the extraocular muscles (EOMs), lacrimal glands (LGs), and intraorbital fat (IF) [1]. Patients with TAO commonly experience exophthalmos, eyelid retraction, and diplopia, which reduce their quality of life [2, 3]. The natural course of TAO can be divided into two stages: an active stage, characterized by inflammatory edema, and an inactive stage, characterized mainly by fibrosis and fatty degeneration [4]. Immunosuppressive therapy (e.g., high-dose intravenous glucocorticoids) is the first-line treatment for patients in the active phase, whereas surgical decompression is usually recommended for patients in the inactive phase [5]. Therefore, it is important to distinguish accurately and promptly between the active and inactive phases in patients with TAO.

The semiquantitative clinical activity score (CAS) is widely used to assess TAO activity and to predict the response to immunosuppressive treatment [6]. However, this seven-point scale depends heavily on the examiner’s experience, and it cannot assess the involvement of individual muscles. Magnetic resonance imaging (MRI), especially fat-suppressed T2-weighted imaging (FS-T2WI), has been widely used to evaluate patients with TAO [7]. Previous studies have indicated that the signal intensity ratios (SIRs) of the EOMs, LGs, or IF, each considered alone, can assist in TAO staging. Higashiyama et al reported that the SIRs of IF and EOMs on FS-T2WI correlated significantly and positively with the CAS [8], and Hu et al reported that the SIR of the LG on FS-T2WI is a potential imaging biomarker for staging TAO [9]. However, most previous studies have focused on a single structure, and studies combining information from the EOMs, LGs, and IF for staging TAO remain scarce.

Conventional FS-T2WI relies mainly on inversion recovery or spectral presaturation, both of which are prone to artifacts caused by magnetic field inhomogeneity at the tissue–air interface between the paranasal sinuses and the orbit. Severe artifacts can obscure the EOMs (especially the inferior rectus muscle) and impair staging performance [10]. Dixon MRI is a fat-suppression technique based on chemical shift that directly separates the water and fat signals. Its superiority over conventional inversion recovery and spectral presaturation in overall image quality and uniformity of fat suppression has been well documented [11,12,13]. However, few studies have used Dixon MRI to quantitatively assess and integrate data from the EOMs, IF, and LGs for staging patients with TAO.
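To make the chemical-shift principle concrete, the sketch below shows the idealized two-point Dixon separation: if the in-phase signal is S_IP = W + F and the opposed-phase signal is S_OP = W - F, then water and fat follow as W = (S_IP + S_OP)/2 and F = (S_IP - S_OP)/2. This is a textbook simplification that assumes real-valued signals with no field-inhomogeneity phase error; the helper name `two_point_dixon` is hypothetical, and the vendor reconstruction used on the scanner in this study is not described in the text.

```python
import numpy as np


def two_point_dixon(in_phase, opposed_phase):
    """Idealized two-point Dixon separation (hypothetical helper).

    Assumes S_IP = W + F and S_OP = W - F with no B0 phase error,
    so water and fat are recovered by a simple sum and difference.
    """
    ip = np.asarray(in_phase, dtype=float)
    op = np.asarray(opposed_phase, dtype=float)
    water = (ip + op) / 2.0  # W = (S_IP + S_OP) / 2
    fat = (ip - op) / 2.0    # F = (S_IP - S_OP) / 2
    return water, fat


# Synthetic two-voxel example: one water-dominant voxel and one
# fat-dominant voxel. Values are illustrative only, not study data.
true_water = np.array([80.0, 10.0])
true_fat = np.array([20.0, 90.0])
water, fat = two_point_dixon(true_water + true_fat, true_water - true_fat)
print(water, fat)
```

In practice, the opposed-phase signal also carries a phase term from field inhomogeneity that scanner reconstructions estimate and remove before water and fat are separated; this sketch leaves that step out.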

Therefore, in this study, we explored the combined value of the quantitative parameters of EOMs, IF, and LGs derived from Dixon MR images for staging patients with TAO.

Materials and methods

Patients

This single-center retrospective study was approved by the institutional review board of the First Affiliated Hospital of Nanjing Medical University (Nanjing, China). The requirement for informed consent was waived owing to the retrospective nature of the study. All radiological and clinical data were anonymized before analysis. Patients were enrolled from January 2018 to December 2022 according to the following inclusion criteria: (1) fulfillment of the European Group on Graves’ Orbitopathy (EUGOGO) criteria for the diagnosis of TAO; (2) a pretreatment orbital MRI scan that included Dixon T2WI; (3) no history of steroid treatment, radiotherapy, or surgical decompression; and (4) no other orbital disorders. We identified 215 consecutive patients with TAO in our hospital, 15 of whom were excluded because of insufficient image quality. Finally, 200 patients (121 females; mean age, 46.0 ± 13.9 years) were included and divided into training and validation cohorts at a ratio of 8:2 according to the chronological order of their MR scans. The patient enrollment process is shown in Fig. 1.
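A chronological 8:2 split of this kind can be sketched as follows. The split is made at the patient level, so both eyes of a patient fall into the same cohort; the `Patient` record, its field names, and the synthetic scan dates are hypothetical, and the study’s own tooling is not described.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Patient:
    patient_id: str   # hypothetical identifier field
    scan_date: date   # date of the pretreatment orbital MRI


def chronological_split(patients, train_fraction=0.8):
    """Earliest scans go to training, the most recent ones to validation."""
    ordered = sorted(patients, key=lambda p: p.scan_date)
    n_train = int(round(len(ordered) * train_fraction))
    return ordered[:n_train], ordered[n_train:]


# 200 synthetic patients scanned on consecutive days (illustrative dates only),
# passed in reverse order to show that the function sorts them first.
cohort = [Patient(f"P{i:03d}", date(2018, 1, 1) + timedelta(days=i)) for i in range(200)]
training, validation = chronological_split(list(reversed(cohort)))
print(len(training), len(validation))
```

Unlike a random split, a temporal split evaluates the model on patients scanned after those used to fit it, which is often regarded as a stricter form of internal validation.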

Fig. 1

Flowchart of patient enrollment and scheme for analysis. TAO, thyroid-associated ophthalmopathy; EUGOGO, European Group on Graves’ Orbitopathy

Clinical assessment

Disease activity was assessed for each eye according to the modified seven-point formulation of Mourits’ CAS, which comprises (1) spontaneous retrobulbar pain; (2) pain on attempted upward or downward gaze; (3) redness of the eyelids; (4) redness of the conjunctiva; (5) swelling of the eyelids; (6) inflammation of the caruncle and/or plica; and (7) conjunctival edema [14]. Each item scores one point. Eyes with a CAS ≥ 3 were assigned to the active group, and all other eyes to the inactive group.
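The per-eye scoring rule can be written as a short function: each of the seven items contributes one point, and a total of 3 or more is labeled active. The item keys below are hypothetical shorthand for the seven criteria listed above.

```python
CAS_ITEMS = (
    "spontaneous_retrobulbar_pain",
    "pain_on_up_or_down_gaze",
    "eyelid_redness",
    "conjunctival_redness",
    "eyelid_swelling",
    "caruncle_or_plica_inflammation",
    "conjunctival_edema",
)


def clinical_activity_score(findings):
    """Return the 0-7 CAS for one eye: one point per positive item."""
    unknown = set(findings) - set(CAS_ITEMS)
    if unknown:
        raise ValueError(f"unrecognized CAS items: {sorted(unknown)}")
    return sum(1 for item in CAS_ITEMS if findings.get(item, False))


def activity_label(cas, threshold=3):
    """Classify an eye as active when CAS >= threshold (3 in this study)."""
    return "active" if cas >= threshold else "inactive"


# Hypothetical findings for one eye; items not listed count as absent.
right_eye = {
    "spontaneous_retrobulbar_pain": True,
    "eyelid_redness": True,
    "conjunctival_edema": True,
    "eyelid_swelling": False,
}
score = clinical_activity_score(right_eye)
print(score, activity_label(score))
```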

Image acquisition

All patients were examined using a 3.0-T MRI system (Magnetom Skyra; Siemens Healthcare, Erlangen, Germany) with a 20-channel head coil. The detailed parameters of two-point Dixon T2WI were as follows: repetition time/echo time, 4000/87 ms; field of view, 180 mm; matrix, 179 × 256; section thickness, 3.5 mm; number of excitations, 2; number of sections, 18; and acquisition time (min:s), 2:18.

Image analysis

All quantitative parameters of the EOMs, LGs, and IF were measured on a per-eye basis, as follows:

  1. SIRs of the EOMs, LGs, and IF relative to the ipsilateral temporal muscle: on the coronal Dixon water images, three consecutive sections posterior to the globe showing the largest cross-sectional area of the muscle bellies were selected. Polygonal regions of interest (ROIs) were drawn on the superior, inferior, medial, and lateral rectus muscles using ITK-SNAP software (Fig. 2). Additional polygonal ROIs were drawn on two consecutive sections showing the largest cross-sections of the LGs and IF (Fig. 2). The maximum, mean, and minimum signal intensities (SImax/mean/min) of the EOMs, IF, and LGs were extracted using PyRadiomics. The SI of the ipsilateral temporal muscle was measured on the same coronal water images with a round ROI of 5–10 mm² (Fig. 2). The SIRs of the EOMs (EOM-SIR), LGs (LG-SIR), and IF (IF-SIR) were calculated as $\mathrm{SIR}_{\min/\mathrm{mean}/\max} = \mathrm{SI}_{\min/\mathrm{mean}/\max} / \mathrm{SI}_{\text{ipsilateral temporal muscle}}$.

    Fig. 2

    Schematic diagrams showing how the quantitative parameters of the EOMs, LGs, and IF were measured on Dixon MRI. T2 Dixon water images (a, d, g), QFFIs (b, e, h), and QWFIs (c, f, i) of a 54-year-old woman with active TAO. (a–c) Quantitative measurements of the SIR, FF, and WF of the EOMs. (a) A circular ROI (red, 5–10 mm²) was placed in the ipsilateral temporal muscle. (d–f) Quantitative measurements of the SIR, FF, and WF of the IF. (g–i) Quantitative measurements of the SIR, FF, and WF of the LG. TAO, thyroid-associated ophthalmopathy; QFFI, quantitative fat fraction image; QWFI, quantitative water fraction image; SIR, signal intensity ratio; FF, fat fraction; WF, water fraction; EOM, extraocular muscle; IF, intraorbital fat; LG, lacrimal gland

  2. WF and FF of the EOMs, LGs, and IF: quantitative water fraction images (QWFI) and quantitative fat fraction images (QFFI) were calculated voxel-wise from the Dixon water and fat images in MATLAB as $\mathrm{QWFI} = \mathrm{SI}_{\text{water}} / (\mathrm{SI}_{\text{water}} + \mathrm{SI}_{\text{fat}})$ and $\mathrm{QFFI} = \mathrm{SI}_{\text{fat}} / (\mathrm{SI}_{\text{water}} + \mathrm{SI}_{\text{fat}})$. The polygonal ROIs used for the SI measurements were copied onto the QWFI and QFFI (Fig. 2), and the WF and FF of the EOMs (EOM-WF/FFmin/mean/max), LGs (LG-WF/FFmin/mean/max), and IF (IF-WF/FFmin/mean/max) were extracted using PyRadiomics.
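The two measurement steps above can be sketched with NumPy. The study used MATLAB for the fraction maps and PyRadiomics for the ROI statistics; here plain boolean masks stand in for both, and the helper names, the zero-denominator rule, and the use of the mean temporal-muscle intensity as the SIR reference are assumptions rather than details given in the text. When an ROI mask spans several sections, voxels are pooled across them (also an assumption).

```python
import numpy as np


def fraction_maps(water, fat):
    """Voxel-wise QWFI = W / (W + F) and QFFI = F / (W + F).

    Voxels where W + F == 0 (e.g., background) are set to 0 in both maps;
    this handling is an assumption, not a detail from the study.
    """
    water = np.asarray(water, dtype=float)
    fat = np.asarray(fat, dtype=float)
    total = water + fat
    valid = total > 0
    qwfi = np.zeros_like(total)
    qffi = np.zeros_like(total)
    qwfi[valid] = water[valid] / total[valid]
    qffi[valid] = fat[valid] / total[valid]
    return qwfi, qffi


def roi_stats(image, mask):
    """Minimum, mean, and maximum of the voxels inside a boolean ROI mask."""
    values = np.asarray(image, dtype=float)[np.asarray(mask, dtype=bool)]
    if values.size == 0:
        raise ValueError("ROI mask selects no voxels")
    return {"min": float(values.min()), "mean": float(values.mean()), "max": float(values.max())}


def signal_intensity_ratios(water, roi_mask, temporal_mask):
    """SIRmin/mean/max: ROI statistics divided by the mean temporal-muscle SI (assumed reference)."""
    reference = roi_stats(water, temporal_mask)["mean"]
    return {key: value / reference for key, value in roi_stats(water, roi_mask).items()}


# Toy 2 x 2 "section" (illustrative values, not study data).
water = np.array([[100.0, 200.0], [50.0, 0.0]])
fat = np.array([[100.0, 0.0], [150.0, 0.0]])
roi = np.array([[True, True], [False, False]])        # hypothetical EOM ROI
temporal = np.array([[False, False], [True, False]])  # hypothetical temporal-muscle ROI

qwfi, qffi = fraction_maps(water, fat)
print(roi_stats(qwfi, roi))  # WF min/mean/max inside the ROI
print(roi_stats(qffi, roi))  # FF min/mean/max inside the ROI
print(signal_intensity_ratios(water, roi, temporal))
```

Because QWFI and QFFI sum to 1 in every non-background voxel, the WF and FF statistics of one ROI carry largely complementary information.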

Two radiologists (with 2 and 5 years of experience in neuroradiology), blinded to the study design and clinical information, manually and independently delineated the ROIs. Their measurements were used to assess interobserver agreement, and the mean of the two readers’ values was used in all subsequent statistical analyses.

Statistical analyses

The Kolmogorov‒Smirnov test was used to assess whether continuous variables were normally distributed. Normally distributed data are reported as the mean ± standard deviation; other data are reported as the median and interquartile range. Continuous variables were compared between the active and inactive groups, and between the training and validation cohorts, using independent-samples t tests (normally distributed data) or Mann‒Whitney U tests (non-normally distributed data). Categorical variables were compared using the chi-square test. Parameters that differed significantly were entered into a binary logistic regression analysis to identify independent predictors of the active stage, and the goodness of fit of the logistic regression model was assessed using the Hosmer–Lemeshow test. Diagnostic models based on the identified independent predictors were then constructed using logistic regression. Receiver operating characteristic (ROC) curve analysis and the DeLong test were used to evaluate and compare the diagnostic performance of the models in staging TAO in both the training and validation cohorts. Interobserver agreement of the quantitative measurements was assessed using the intraclass correlation coefficient (ICC). ICC values typically range from 0 to 1.00, with values closer to 1.00 indicating better reproducibility, and were categorized as follows: ≤ 0.40, poor; 0.41–0.60, moderate; 0.61–0.80, good; and ≥ 0.81, excellent [15]. All statistical analyses were performed using SPSS software (version 25.0; SPSS Inc., Chicago, IL, USA) and MedCalc software (version 18.2.1; MedCalc, Ostend, Belgium). A two-sided p value < 0.05 was considered statistically significant.
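The text does not state which ICC form was used. As an illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-measure ICC (Shrout and Fleiss ICC(2,1)) from its ANOVA mean squares and maps it onto the agreement bands listed above. Treating each band’s upper bound as inclusive is an assumption, since the bands are given at two-decimal precision and leave small gaps between them.

```python
import numpy as np


def icc_2_1(ratings):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array of shape (n_subjects, k_raters), e.g. one row per eye and
    one column per radiologist.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_total = ((x - grand) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def icc_category(icc):
    """Agreement bands from the text, with each upper bound treated as inclusive."""
    if icc <= 0.40:
        return "poor"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "excellent"


# Toy data: the second reader measures every eye exactly 1 unit higher.
offset_readings = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
icc = icc_2_1(offset_readings)
print(round(icc, 3), icc_category(icc))
```

With a constant offset between readers, a consistency-type ICC would be perfect, but the absolute-agreement form penalizes the systematic difference and returns 2/3, which falls in the "good" band.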

Results

Clinical characteristics

Among the 200 enrolled patients (400 eyes), 211 eyes had active disease and 189 had inactive disease. The training cohort comprised 160 patients (169 active and 151 inactive eyes), and the validation cohort comprised 40 patients (42 active and 38 inactive eyes). There were no significant differences in demographic or clinical characteristics between the training and validation cohorts (age: 46.1 ± 14.0 vs 45.5 ± 13.4 years, p = 0.755; sex [male/female]: 60/100 vs 19/21, p = 0.102; CAS: 2.5 ± 1.4 vs 2.4 ± 1.1, p = 0.583) (Table 1).

Table 1 Comparison of patient characteristics between the training and validation cohorts

Comparisons of Dixon MRI-based quantitative parameters

Interreader reproducibility was good to excellent for all Dixon MRI-based quantitative parameters (ICC, 0.710–0.961). None of these parameters differed significantly between the training and validation cohorts (Table 1). In the training cohort, eyes with active TAO showed significantly higher EOM-SIRmax (p < 0.001), EOM-SIRmean (p < 0.001), EOM-SIRmin (p < 0.001), IF-SIRmax (p < 0.001), IF-SIRmean (p < 0.001), LG-SIRmax (p < 0.001), LG-SIRmean (p = 0.004), EOM-WFmean (p < 0.001), EOM-WFmin (p < 0.001), IF-WFmax (p = 0.005), IF-WFmean (p < 0.001), and LG-WFmean (p = 0.030) values than eyes with inactive TAO (Table 2). Conversely, eyes with active TAO had significantly lower EOM-FFmax (p < 0.001), EOM-FFmean (p < 0.001), IF-FFmean (p < 0.001), IF-FFmin (p = 0.011), and LG-FFmean (p = 0.030) values than eyes with inactive TAO (Table 2).

Table 2 Comparison of Dixon MRI-based quantitative parameters between the active and inactive TAO groups in the training cohort

Logistic regression analysis

Binary logistic regression analysis indicated that the EOM-SIRmean (odds ratio [OR] = 18.187, β = 2.901, p < 0.001), LG-SIRmean (OR = 0.261, β = −1.341, p = 0.001), and LG-FFmean (OR = 0.015, β = −4.230, p = 0.003) values were independent predictors of active TAO. Representative patients with active and inactive TAO are presented in Fig. 3.
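Each reported odds ratio is the exponential of its coefficient, OR = exp(β), for a one-unit increase in the predictor. The check below recomputes the ORs from the reported β values (they agree to within the rounding of β) and shows why the scale matters for LG-FFmean: a fat fraction lies between 0 and 1, so a one-unit change spans its entire range, and an OR per 0.1 increase is easier to interpret. The 0.1 step is an illustrative choice, not one used in the study.

```python
import math

# (beta, reported OR) for each independent predictor, as given in the text.
REPORTED = {
    "EOM-SIRmean": (2.901, 18.187),
    "LG-SIRmean": (-1.341, 0.261),
    "LG-FFmean": (-4.230, 0.015),
}


def odds_ratio(beta, delta=1.0):
    """Odds ratio for an increase of `delta` units in a predictor: exp(beta * delta)."""
    return math.exp(beta * delta)


for name, (beta, reported) in REPORTED.items():
    print(f"{name}: exp(beta) = {odds_ratio(beta):.3f}, reported OR = {reported}")

# Per 0.1 increase in lacrimal-gland fat fraction (illustrative step size).
print(f"LG-FFmean per 0.1: OR = {odds_ratio(-4.230, 0.1):.3f}")
```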

Fig. 3

Representative cases of patients with active and inactive TAO. (a–c) A 48-year-old man with active TAO and a bilateral CAS of 5. (d–f) A 50-year-old woman with inactive TAO and a bilateral CAS of 1. The EOM-SIRmean, LG-SIRmean, and LG-FFmean values in the left/right orbit were 2.865/3.407, 3.661/3.543, and 0.026/0.019, respectively, for the patient with active TAO (a–c) and 2.330/2.082, 2.183/2.002, and 0.392/0.487, respectively, for the patient with inactive TAO (d–f)

ROC curve analysis

Based on the logistic regression results, we established two staging models: model 1 (EOM-SIRmean alone) and model 2 (EOM-SIRmean + LG-SIRmean + LG-FFmean). In the training cohort, model 2 achieved the better staging performance, with an area under the curve (AUC) of 0.820, a sensitivity of 84.02%, a specificity of 66.89%, a positive predictive value (PPV) of 74.00%, and a negative predictive value (NPV) of 78.90%. The AUC of model 2 was significantly higher than that of model 1 (AUC, 0.793; sensitivity, 57.40%; specificity, 88.74%; PPV, 85.10%; NPV, 65.00%) (AUC, 0.820 vs 0.793, p = 0.016) (Fig. 4).
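The AUC and the paired DeLong comparison reported here can be sketched in NumPy. The AUC equals the Mann–Whitney probability that a randomly chosen active eye scores higher than a randomly chosen inactive eye (ties count one half), and DeLong’s method estimates the variance of the difference between two correlated AUCs from per-case placement values. The study used MedCalc; this is an independent illustration of DeLong’s method, not MedCalc’s code, it treats every case as independent, and the toy scores are synthetic.

```python
import math

import numpy as np


def placement_values(pos_scores, neg_scores):
    """Mann-Whitney kernel: 1 if pos > neg, 0.5 for ties, 0 otherwise."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    kernel = (pos > neg) + 0.5 * (pos == neg)  # shape (n_pos, n_neg)
    # Per-positive and per-negative placements; their grand mean is the AUC.
    return kernel.mean(axis=1), kernel.mean(axis=0), float(kernel.mean())


def delong_paired(scores_a, scores_b, labels):
    """Two-sided DeLong test for two AUCs computed on the same cases."""
    y = np.asarray(labels, dtype=bool)
    v10, v01, aucs = [], [], []
    for scores in (scores_a, scores_b):
        s = np.asarray(scores, dtype=float)
        p10, p01, auc = placement_values(s[y], s[~y])
        v10.append(p10)
        v01.append(p01)
        aucs.append(auc)
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    s10 = np.cov(np.vstack(v10))  # 2 x 2 covariance over positive cases
    s01 = np.cov(np.vstack(v01))  # 2 x 2 covariance over negative cases
    c = np.array([1.0, -1.0])     # contrast: AUC_a - AUC_b
    var = c @ s10 @ c / n_pos + c @ s01 @ c / n_neg
    diff = aucs[0] - aucs[1]
    z = diff / math.sqrt(var) if var > 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return aucs[0], aucs[1], z, p


# Synthetic example: 1 = active eye, 0 = inactive eye.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
model_b = [0.9, 0.2, 0.7, 0.3, 0.8, 0.1]
auc_a, auc_b, z, p = delong_paired(model_a, model_b, labels)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, z = {z:.3f}, p = {p:.3f}")
```

In the toy example, model A separates the classes perfectly (AUC 1.0) and model B does not (AUC 2/3), yet with only six cases p is about 0.22, which illustrates how sample size limits paired AUC comparisons.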

Fig. 4

Receiver operating characteristic curves of significant parameters for discriminating active from inactive TAO patients in the training (a) and validation (b) cohorts

In the validation cohort, model 2 (AUC, 0.751; sensitivity, 71.43%; specificity, 78.95%; PPV, 78.90%; NPV, 71.40%) also had a higher AUC than model 1 (AUC, 0.733; sensitivity, 64.29%; specificity, 76.32%; PPV, 75.00%; NPV, 65.90%), although the difference was not significant (AUC, 0.751 vs 0.733, p = 0.341) (Fig. 4).
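Only percentages are reported, but given the eye counts in each cohort (169 active and 151 inactive in training; 42 and 38 in validation), each set of four metrics is reproduced, after rounding, by a single confusion matrix. The counts below were back-calculated from those percentages and are not stated in the text; the sketch recomputes sensitivity, specificity, PPV, and NPV from them.

```python
from typing import NamedTuple


class Confusion(NamedTuple):
    tp: int  # active eyes classified as active
    fn: int  # active eyes classified as inactive
    tn: int  # inactive eyes classified as inactive
    fp: int  # inactive eyes classified as active


def metrics(c):
    """Sensitivity, specificity, PPV, and NPV in percent."""
    return {
        "sensitivity": 100 * c.tp / (c.tp + c.fn),
        "specificity": 100 * c.tn / (c.tn + c.fp),
        "ppv": 100 * c.tp / (c.tp + c.fp),
        "npv": 100 * c.tn / (c.tn + c.fn),
    }


# Counts back-calculated from the reported percentages (not given in the text).
IMPLIED = {
    ("training", "model 1"): Confusion(tp=97, fn=72, tn=134, fp=17),
    ("training", "model 2"): Confusion(tp=142, fn=27, tn=101, fp=50),
    ("validation", "model 1"): Confusion(tp=27, fn=15, tn=29, fp=9),
    ("validation", "model 2"): Confusion(tp=30, fn=12, tn=30, fp=8),
}

for (cohort, model), counts in IMPLIED.items():
    print(cohort, model, {k: round(v, 2) for k, v in metrics(counts).items()})
```

The matrices also make the trade-off visible: in the training cohort, model 2 gains sensitivity at the cost of specificity relative to model 1, so its higher AUC does not mean that it is better at every operating point.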

Discussion

Our study yielded three main findings. First, many of the Dixon MRI-based quantitative parameters of the EOMs, LGs, and IF differed significantly between patients with active and inactive TAO, indicating that all three structures are potential targets for staging TAO. Second, the EOM-SIRmean, LG-SIRmean, and LG-FFmean values were independent predictors of active TAO. Third, compared with a single EOM-based parameter, a combined model integrating the EOM-SIRmean, LG-SIRmean, and LG-FFmean values improved the performance of staging patients with TAO in the training cohort.

EOM involvement is a well-recognized feature of TAO [16, 17]. In this study, the SIRmin/mean/max values of the EOMs were significantly higher in eyes with active TAO than in those with inactive TAO, consistent with previous studies [18, 19]. In addition, Dixon MRI showed that eyes with active TAO had higher water-related metrics (EOM-WFmean and EOM-WFmin) and lower fat-related metrics (EOM-FFmax and EOM-FFmean) than eyes with inactive TAO. Previous studies have indicated that the active phase of TAO is dominated by inflammatory responses, whereas the inactive phase is dominated by fibrosis, fatty infiltration, and collagen deposition [4, 20]. These mechanisms may explain the higher water-related metrics in active TAO and the higher fat-related metrics in inactive TAO.

Increased orbital fat is another major characteristic of TAO [21]. Potgieser et al reported that a greater orbital fat volume is associated with a longer duration of TAO [22]; however, they did not analyze changes in the signal intensity of orbital fat. In our study, the SIRmean/max, FFmean/min, and WFmean/max values of orbital fat differed significantly between active and inactive TAO. Previous studies have shown that orbital fat in TAO is histologically characterized by lymphocytic infiltration and by edema due to the accumulation of hydrophilic interstitial glycosaminoglycans [23]. We speculate that this accumulation underlies the higher SIR and WF values in patients with active TAO.

The LGs are another potential target organ, and changes in the LGs of patients with TAO have attracted increasing attention [24]. Gagliardo et al reported significantly greater LG herniation on MRI, in both the right and left orbits, in patients with active TAO than in those with inactive TAO [25]. Using T2 mapping, Jiang et al found that the T2 values of the LGs differed significantly between active and inactive TAO and that, combined with clinical indicators, T2 mapping could effectively stage patients with TAO [26]. Using diffusion tensor imaging, Chen et al reported that the LGs in active TAO had significantly lower fractional anisotropy and a higher apparent diffusion coefficient than those in inactive TAO [27]. In our study, as with the EOMs, the LGs of eyes with active TAO had higher SIRmean/max and WFmean values and lower FFmean values. The pathological changes described above for the EOMs and IF may also explain these findings. In addition, two LG-based parameters (LG-SIRmean and LG-FFmean) were independently associated with TAO activity. These results support LG involvement in the TAO process and warrant further study.

In the binary logistic regression analysis, the EOM-SIRmean, LG-SIRmean, and LG-FFmean values were independent predictors of active TAO. No IF-related metric was an independent predictor, possibly because of the sample size and composition of our study population. We therefore constructed a predictive model that added the LG-SIRmean and LG-FFmean to the EOM-SIRmean for staging patients with TAO. In the ROC analysis, the combined model had a higher AUC than the EOM-SIRmean alone in both cohorts; the difference was significant in the training cohort (p = 0.016) but not in the validation cohort (p = 0.341). These results suggest that information from the EOMs and other target organs, such as the LGs, should be integrated when staging TAO. Further multicenter studies with larger sample sizes are needed to confirm our results and to establish a more robust model for staging patients with TAO in clinical practice.

Our study has several limitations. First, this was a retrospective single-center study; studies with larger populations and external validation are needed to confirm our findings. Second, the exact pathological state of the orbital tissues remains unclear because histological samples are difficult to obtain from patients with TAO, especially those with active disease; future studies should examine the correlations between imaging metrics and histological changes [28]. Third, this study focused only on the Dixon sequence, and other functional MR sequences (e.g., diffusion or mapping sequences) were not acquired in the same examination. Integrating information from additional functional sequences, for example with machine learning methods, may further improve staging performance.

In conclusion, our study showed that quantitative parameters of the EOMs, LGs, and IF derived from Dixon MR images are useful for differentiating active from inactive TAO, and that integrating parameters from the EOMs and LGs may further improve the staging of patients with TAO.