Background

Thyroid-associated ophthalmopathy (TAO) is an autoimmune disorder. One of the most vexing problems in endocrinology, it is associated with Graves' disease and can seriously reduce patients' quality of life. Almost all patients with Graves' disease have this condition, and the majority of TAO patients have thyroid involvement [1]. TAO usually has a distinctive appearance, including exophthalmos, periorbital edema, and eyelid congestion. The etiology of TAO is not completely understood. Known risk factors include stress, infectious agents, iodine, cigarette smoking, and genes affecting immune function such as HLA-DR3, CTLA4, PTPN22, CD40, IL-2RA, FCRL3, PPARγ, and IL-23R, as well as genes encoding thyroid-specific proteins such as TG [2, 3]. Peribulbar injection of anti-inflammatory drugs or orbital decompression surgery is recommended for some cases with sight-threatening ocular findings [4, 5]. However, current therapeutic regimens cannot fully restore normal visual function and eye appearance.

According to changes in thyroid function, TAO can be classified into three subtypes: hyperthyroidism, euthyroidism, and hypothyroidism. When thyroid function is impaired, spasticity disorders can cause symptoms such as dry eyes, corneal rupture, and periorbital edema. These symptoms can occur in both hyperthyroidism and hypothyroidism, but most are related to excess thyroid hormone. By contrast, patients with hypothyroidism are more prone to periorbital edema, whereas patients with hyperthyroidism (as well as those exposed to an overdose of thyroid hormone) generally have a staring appearance and retracted eyelids. When thyroid hormone levels increase, patients with Graves' disease (an autoimmune hyperthyroid disease) frequently experience cramps [6]. This may be related to the effect of excess thyroid hormone on the sympathetic nervous system, especially catecholamine (epinephrine or norepinephrine) excitation; it may also be caused directly by anti-thyrotropin receptor antibodies (TRAb) [7]. A lack of thyroid hormone can cause a variety of blood and lymphatic circulation disorders; in particular, fluid retention in the skin causes edema. Moreover, patients with hyperthyroidism may progress to hypothyroidism, which is frequently accompanied by signs of TAO, especially periorbital edema. If thyroid hormone levels return to normal after treatment, these symptoms may regress. However, TAO patients with normal thyroid function may still show exophthalmos owing to a previous history of hyperthyroidism that stretched their eyelid muscle fibers. An open question, therefore, is whether eye appearance characteristics have diagnostic value in distinguishing the subtypes of TAO.

According to the EUGOGO standard, TAO can be graded into three severity levels (mild, moderate, and severe) and two stages (active and inactive). Disease severity is the key determinant of the indication for therapy, and a clinical challenge is to recognize the active, or inflammatory, stage [8]. As with thyroid function subtypes, eye appearance and ophthalmic images can differ substantially between cohorts with different severity levels and stages. For example, orbital positron emission tomography/computed tomography (PET/CT) findings can provide references for detecting and grading TAO [9]. In addition, numerous case reports have described the appearance of patients with severe or active-stage TAO [10,11,12].

Collectively, in routine clinical practice, ophthalmic findings and demographic information are abundant, and many ophthalmic images can be acquired easily, non-invasively, and promptly. These data may aid the auxiliary diagnosis of TAO subtype (type of thyroid function change), stage, and severity level. Although various biomarkers for TAO identification exist [13, 14], it is worth exploring the clinical significance of demographic characteristics and ophthalmic image features for TAO diagnosis and stage/severity evaluation before biochemical indicators are measured. We therefore conducted a retrospective study based on more than 1000 medical records (953 patients) and generated several regression models as a basis for machine-learning-assisted diagnosis of TAO.

Results

General information of enrolled subjects

In total, we analyzed the first medical records of 953 cases, including 320 males and 633 females. This sex distribution suggests that females are more likely than males to develop TAO, consistent with previous data [7, 15]. The mean age was 41.75 ± 13.75 years (range, 12 to 82). The distribution of thyroid function types, stages, and severity levels is presented in Table 1. The majority of TAO patients had hyperthyroidism, and most were in the inactive stage and at the moderate severity level.

Table 1 General information of enrolled subjects (n = 953)

Subtype, stage, and severity-associated factors

First, we analyzed factors associated with thyroid function type. As Table 2 shows, the distribution of thyroid function types differed by sex: a higher proportion of males than females had normal thyroid function (p < 0.01). Hypopsia was associated with thyroid function type (interestingly, only hypopsia of the left eye was significant), with the euthyroidism group having a lower percentage of left-eye hypopsia (p < 0.01). Right (but not left) eyelid congestion was also correlated with thyroid function type; again, the euthyroidism group had a significantly lower rate of right eyelid congestion. Furthermore, patients with normal thyroid function had better best corrected visual acuity in the left eye (but not the right eye), and they had significantly less extraocular muscle thickening than the hyperthyroidism and hypothyroidism groups (p < 0.01 for all measured muscles). These parameters were comparable between the two groups with abnormal thyroid function (hyperthyroidism and hypothyroidism).

Table 2 The thyroid function type-associated factors

Next, TAO stage-associated factors are presented in Table 3. Males had a higher proportion of active-stage cases than females, and older age was correlated with the active stage. In addition, patients in different stages differed in eyeball pain (both sides), hypopsia (both sides), eyelid congestion (both sides), conjunctival congestion (both sides), corneal ulcer (both sides), ocular motility disorder (both sides), best corrected visual acuity (both sides), and extraocular muscle thickening (all muscles except the upper and lower extraocular muscles of the right eye). The inactive stage was associated with milder hypopsia, eyelid congestion, conjunctival congestion, corneal ulcer, and ocular motility disorder, and with better best corrected visual acuity. However, patients in the inactive stage had more extraocular muscle thickening than those in the active stage.

Table 3 The TAO stage-associated factors

Finally, the severity-associated factors are listed in Table 4. As with TAO stage, severity level was significantly associated with eyeball pain (both sides), hypopsia (both sides), eyelid congestion (both sides), conjunctival congestion (both sides), corneal ulcer (right eye), ocular motility disorder (both sides), best corrected visual acuity (both sides), and extraocular muscle thickening (all muscles).

Table 4 The severity-associated factors

Together, these features are of value in TAO recognition and evaluation.

Logistic regression in prediction of subtype, stage and severity

Based on the above association analysis, all significant factors were collected, and cases missing a record of subtype, stage, or severity were removed. This yielded a dataset of 922 cases with definite thyroid function type, TAO stage, and severity level. In a preliminary comparison of candidate models by AUC or accuracy, we selected logistic regression because it performed satisfactorily and its formula is simple and can be written out and verified in SPSS. Three models were first established and then fine-tuned in PyCaret to maximize AUC or accuracy. The three optimized models are presented in Fig. 1. First, the three-class logistic regression for subtype had a good AUC, especially for the micro-average ROC curve (AUC = 0.94) (Fig. 1A), with an average precision of 0.87 across recall levels (Fig. 1B). The confusion matrix of predicted and true types is presented in Fig. 1C. In the binary logistic regression for TAO stage, the overall AUC was 0.84 and the micro-average AUC was 0.88 (Fig. 1D). In the binary precision–recall curve, the average precision was 0.92 across recall levels (Fig. 1E). As the confusion matrix shows, this model had an overall accuracy of 0.8 (Fig. 1F), with a recommended discrimination threshold of 0.27 (Fig. 1G). The three-class logistic regression for severity had an AUC of 0.94 for the micro-average ROC curve (Fig. 1H) and an average precision of 0.86 (Fig. 1I). Its accuracy was 0.82, as shown by the confusion matrix (Fig. 1J). Together, these tuned models showed satisfactory efficacy in determining TAO subtype and evaluating stage and severity.
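
The micro-average ROC AUC and the confusion-matrix accuracy reported above can be illustrated with a minimal pure-Python sketch. It assumes the usual definition of micro-averaging, in which the one-vs-rest problem is flattened so that every (case, class) pair becomes a single binary decision. The function names and toy probabilities are hypothetical and are not taken from the study data.

```python
def binary_auc(scores, labels):
    """AUC as the Mann-Whitney probability that a random positive outscores a random negative (ties count 1/2)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def micro_average_auc(prob_rows, true_classes, classes):
    """Flatten one-vs-rest: each (case, class) pair contributes one score and one 0/1 label."""
    scores, labels = [], []
    for row, truth in zip(prob_rows, true_classes):
        for k, cls in enumerate(classes):
            scores.append(row[k])
            labels.append(1 if truth == cls else 0)
    return binary_auc(scores, labels)


def confusion_matrix(true_classes, pred_classes, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_classes, pred_classes):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of cases on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Hypothetical toy predictions for the three thyroid function subtypes.
classes = ["hyperthyroidism", "euthyroidism", "hypothyroidism"]
probs = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
truth = ["hyperthyroidism", "euthyroidism", "hypothyroidism", "euthyroidism"]
# Predicted class = the class with the highest probability in each row.
preds = [classes[row.index(max(row))] for row in probs]

cm = confusion_matrix(truth, preds, classes)
print(cm)                                        # [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
print(accuracy(cm))                              # 0.75
print(micro_average_auc(probs, truth, classes))  # 0.96875
```

Because micro-averaging pools all case-class pairs, frequent classes (here, hyperthyroidism dominates the cohort) weigh more heavily in the pooled AUC than they would in a macro average.
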
Moreover, when hyperthyroidism and hypothyroidism were combined into one group (abnormal thyroid function), we used the significant factors identified in Table 2 to establish a logistic regression model predicting abnormal thyroid function (Table 5), in which female sex, older age, evoked eyeball pain in the left eye, and eyelid congestion were risk factors deserving attention.
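
Because a model like the one in Table 5 is an ordinary binary logistic regression, a prediction for a new patient is the logistic function of a linear combination of the predictors, and each coefficient β converts to an odds ratio exp(β). The sketch below shows only that arithmetic. The coefficient values are hypothetical placeholders (the fitted values in Table 5 are not reproduced here), and the 0/1 coding of the predictors is an assumption.

```python
import math

# HYPOTHETICAL coefficients for illustration only; they are NOT the fitted values in Table 5.
COEFFICIENTS = {
    "intercept": -2.0,
    "female": 0.5,                    # 1 = female, 0 = male (assumed coding)
    "age_years": 0.02,                # per year of age
    "left_evoked_eyeball_pain": 0.8,  # 1 = present, 0 = absent (assumed coding)
    "eyelid_congestion": 0.6,         # 1 = present, 0 = absent (assumed coding)
}


def abnormal_thyroid_probability(patient, coef=COEFFICIENTS):
    """P(abnormal thyroid function) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    logit = coef["intercept"] + sum(coef[name] * value for name, value in patient.items())
    return 1.0 / (1.0 + math.exp(-logit))


def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(beta)


patient = {"female": 1, "age_years": 50, "left_evoked_eyeball_pain": 1, "eyelid_congestion": 1}
print(round(abnormal_thyroid_probability(patient), 3))  # logit = 0.9 -> 0.711
print(round(odds_ratio(COEFFICIENTS["female"]), 3))     # exp(0.5) -> 1.649
```

With positive coefficients, as sketched here, each listed risk factor raises the predicted probability; the real direction and size of each effect should be read from Table 5.
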

Fig. 1

Three logistic regression models for predicting subtype, stage, and severity. The three models were first established and then fine-tuned in PyCaret to maximize AUC or accuracy. A The ROC curve of the thyroid function subtype model (recognition of hyperthyroidism, euthyroidism, and hypothyroidism). B The precision–recall curve of the subtype model, with an average precision of 0.87 across recall levels. C The confusion matrix of predicted and true types for this model. D The ROC curve of the binary logistic regression model for TAO stage; the overall AUC was 0.84 and the micro-average AUC was 0.88. E The precision–recall curve of the staging model, with an average precision of 0.92 across recall levels. F The confusion matrix of predicted and true types, showing an overall accuracy of 0.8. G The discrimination threshold plot of the staging model. H The ROC curve of the three-class logistic regression model for severity, with a micro-average AUC of 0.94. I The precision–recall curve of the severity model, with an average precision of 0.86. J The confusion matrix of predicted and true cases, with an accuracy of 0.82.
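
Panels B, E, and I summarize precision-recall curves by their average precision, and panel G reports a discrimination threshold for the binary staging model. A minimal binary sketch of both quantities follows. Average precision is computed as the recall-weighted sum of precision, AP = Σ (R_n − R_{n−1}) P_n, over descending score thresholds. The threshold search assumes the cut-off is chosen to maximize F1, a common criterion that the source does not state. Function names, probabilities, and labels are hypothetical, and which stage is treated as the positive class is left generic.

```python
def average_precision(probs, labels):
    """AP = sum over thresholds of (recall gain) * precision, scanning scores from high to low.
    Assumes at least one positive label."""
    pairs = sorted(zip(probs, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(pairs):
        score = pairs[i][0]
        # Cases sharing the same score enter together (one threshold).
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def f1_at_threshold(probs, labels, threshold):
    """F1 when cases with probability >= threshold are called positive."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(probs, labels, steps=100):
    """Grid-search thresholds 0.01 ... 0.99; ties resolve to the lowest threshold."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda t: f1_at_threshold(probs, labels, t))


print(round(average_precision([0.9, 0.8, 0.7], [1, 0, 1]), 3))  # 0.833
print(best_threshold([0.1, 0.2, 0.3, 0.4, 0.9], [0, 0, 1, 1, 1]))  # 0.21
```

In the toy threshold example any cut-off in (0.2, 0.3] separates the classes perfectly; the grid search simply returns the lowest such value.
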

Table 5 Logistic regression in prediction of abnormal thyroid function

Discussion

Broadly, the pathogenesis of TAO involves three main phenomena: inflammation of the periorbital soft tissues, overproduction of glycosaminoglycans by orbital fibroblasts, and hyperplasia of adipose tissue. Proliferating orbital and perimysial fibroblasts produce collagen and glycosaminoglycans in the extracellular matrix, and as a consequence the extraocular muscles swell dramatically [16, 17]. Therefore, many known features can serve as warning signs of TAO. Imaging studies have used CT, magnetic resonance imaging (MRI), ultrasonography (US), and color Doppler imaging (CDI) to diagnose TAO [18]. Evaluating the extraocular muscles with diffusion-weighted imaging can help detect TAO development [19]. Conjunctival and episcleral inflammation overlying the extraocular muscles may be a presenting sign of TAO [20]. Regarding severity, several demographic factors have been reported. A British cohort study showed that lower social grade and higher social deprivation, but not ethnicity, were independently and significantly associated with more severe TAO [21]. Turkish investigators reported male sex as an independent risk factor for TAO severity [22]. Moreover, the ratio of orbital fat to total orbit area is a useful diagnostic index in mild-to-moderate TAO [23]. Although these studies reported clinical characteristics consistent with ours, very few effective models (focusing on subtypes, stages, and severity levels) have been built from such features.

In the present study, we found that TAO subtype, stage, and severity can be predicted from demographic factors (age and sex), symptoms from patient complaints (such as eyeball pain and hypopsia), and eye-image features including eyelid congestion, conjunctival congestion, corneal ulcer, ocular motility disorder, best corrected visual acuity, and extraocular muscle thickening. Our findings are largely consistent with the consensus on TAO-related changes, and to our knowledge this is the first work to combine all of these associated features and establish three models for TAO diagnosis and grading.

Our results include some interesting findings that have not been reported previously. For example, there are laterality differences in the features associated with the three TAO outcomes. Hypopsia and best corrected visual acuity in the left eye (but not the right eye) were associated with thyroid function type. This may reflect limited sample size, as the p values for the right eye only slightly exceeded 0.05 for both hypopsia and best corrected visual acuity. Similarly, among stage-associated factors, upper extraocular muscle thickening in the left eye was less pronounced in the active group (p = 0.044), whereas this difference was not significant in the right eye (p = 0.098). However, the p values for both eyes were close to 0.05, implying that upper extraocular muscle thickening is a weaker indicator than thickening of the other muscles. Moreover, corneal ulcer in the right eye was associated with more severe TAO, but this trend was not observed in the left eye. In summary, there may indeed be an asymmetry in how ophthalmic symptoms or image features indicate TAO development, but this asymmetry needs to be confirmed in multi-center studies.

This study has some limitations. Owing to a lack of follow-up data, we mainly analyzed, retrospectively, the value of ophthalmic images at diagnosis. Almost all patients in this cohort received orbital decompression, but no mid- or long-term follow-up was conducted, so the prognostic roles of these image features remain unclear. In addition, the relationships between detailed thyroid function indices (such as T3, T4, TSH, TSHR, TRAb, and TSI) and symptoms/image features were not examined (e.g., no linear regression analysis targeting these blood indices was performed), which limits the significance of the selected features. We also found some right/left differences in associations with thyroid function, activity, and severity of TAO; so far, the influence of laterality on these predictions remains difficult to explain, and this intriguing finding merits further confirmation and exploration. Furthermore, the overall AUC and accuracy of our models are still not above 0.9. We also tried other models with slightly higher AUC or accuracy, such as the Random Forest Classifier, Gradient Boosting Classifier, and CatBoost Classifier, but none provided an ideal prediction, suggesting that features beyond these ophthalmic images are needed. More non-invasive indicators remain to be discovered for auxiliary diagnosis and early evaluation. Finally, microscopic and proteomic testing could confirm our conclusions; however, we have not yet collected sufficient microscopic and proteomic data, and future studies could add such data as support.

Conclusions

TAO subtype, stage, and severity can be predicted from demographic factors (age and sex), symptoms from patient complaints such as eyeball pain and hypopsia, and image features including eyelid congestion, conjunctival congestion, corneal ulcer, ocular motility disorder, best corrected visual acuity, and extraocular muscle thickening. These non-invasive indices are worth collecting and applying promptly in clinical practice for TAO detection.

Methods

Study population

A total of 953 patients diagnosed with TAO and admitted to our hospital from 2013 to 2018 were included. The inclusion criteria were as follows: (1) basic demographic information (such as sex and age) and ophthalmic symptoms (such as eyeball pain, hypopsia, eyelid congestion, ocular motility disorder, upper eyelid lag, etc.) were recorded; (2) at least one type of ophthalmic image was documented, such as an eye appearance photograph or a CT image. When a patient visited more than once, data from the first visit were used. The best corrected visual acuity of both eyes was recorded if available.
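
The first-visit rule amounts to keeping the earliest record per patient. The sketch below assumes each record is a dictionary with hypothetical `patient_id` and `visit_date` keys (and a hypothetical `bcva_left` field); the study's actual record structure is not described.

```python
from datetime import date


def first_visits(records):
    """Return one record per patient: the one with the earliest visit_date."""
    earliest = {}
    for record in records:
        pid = record["patient_id"]
        if pid not in earliest or record["visit_date"] < earliest[pid]["visit_date"]:
            earliest[pid] = record
    return list(earliest.values())


records = [
    {"patient_id": "A", "visit_date": date(2016, 5, 2), "bcva_left": 0.6},
    {"patient_id": "A", "visit_date": date(2014, 3, 9), "bcva_left": 0.8},
    {"patient_id": "B", "visit_date": date(2015, 1, 20), "bcva_left": None},  # BCVA not available
]
for r in first_visits(records):
    print(r["patient_id"], r["visit_date"])
# A 2014-03-09
# B 2015-01-20
```
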

With the help of an artificial-intelligence image recognition system, two experienced ophthalmologists extracted the following features of both eyes from the morphological images: eyelid congestion, conjunctival congestion, corneal ulcer, and extraocular muscle thickening (medial, lateral, upper, and lower). The outputs for these features were first generated by the recognition system and then independently validated by the two ophthalmologists. When their opinions differed, they reached agreement through discussion.

For thyroid function, patients were divided into three types according to the diagnosed subtype: hyperthyroidism, euthyroidism, and hypothyroidism. The combined hyperthyroidism and hypothyroidism cohort was further regarded as the abnormal thyroid function group. TAO stage was labeled as active or inactive. Specifically, according to the Clinical Activity Score (CAS) recommended by EUGOGO, a 7-point scale (1 point per item) was used to evaluate stage: (1) spontaneous eyeball pain; (2) pain on or behind the eyeball induced by eye movement; (3) eyelid hyperemia; (4) conjunctival congestion; (5) eyelid edema; (6) bulbar conjunctival edema; (7) inflammation of the caruncle or plica. A CAS ≥ 3 points indicates the active stage. Severity was assessed according to the EUGOGO standard: mild (eyelid retraction < 2 mm, mild soft tissue involvement, exophthalmos < 3 mm, temporary or no diplopia, and corneal exposure symptoms responsive to lubricants), moderate (moderate-to-severe: eyelid retraction ≥ 2 mm, moderate or severe soft tissue involvement, exophthalmos ≥ 3 mm, intermittent or continuous diplopia, mild corneal exposure), and severe (sight-threatening: dysthyroid optic neuropathy and/or corneal damage).
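
The staging and grading rules above are simple enough to state as code. The following sketch encodes the 7-item CAS with the ≥ 3 cut-off and a simplified version of the severity rule as described here. The item names, argument names, and category strings are hypothetical; corneal exposure is omitted; and treating any single moderate-to-severe feature as sufficient is an assumption about how the criteria were combined.

```python
CAS_ITEMS = (
    "spontaneous_eyeball_pain",
    "pain_on_eye_movement",
    "eyelid_hyperemia",
    "conjunctival_congestion",
    "eyelid_edema",
    "bulbar_conjunctival_edema",
    "caruncle_or_plica_inflammation",
)


def clinical_activity_score(findings):
    """One point per CAS item marked True; items not supplied count as 0."""
    unknown = set(findings) - set(CAS_ITEMS)
    if unknown:
        raise ValueError(f"unknown CAS items: {sorted(unknown)}")
    return sum(1 for item in CAS_ITEMS if findings.get(item, False))


def tao_stage(findings):
    """Active stage if CAS >= 3, otherwise inactive."""
    return "active" if clinical_activity_score(findings) >= 3 else "inactive"


def eugogo_severity(sight_threatening, lid_retraction_mm, exophthalmos_mm,
                    soft_tissue, diplopia):
    """Simplified grading (hypothetical codes):
    soft_tissue in {'none', 'mild', 'moderate', 'severe'};
    diplopia in {'none', 'temporary', 'intermittent', 'continuous'}."""
    if sight_threatening:  # sight-threatening disease overrides everything else
        return "severe"
    if (lid_retraction_mm >= 2 or exophthalmos_mm >= 3
            or soft_tissue in ("moderate", "severe")
            or diplopia in ("intermittent", "continuous")):
        return "moderate"
    return "mild"


print(tao_stage({"eyelid_edema": True, "conjunctival_congestion": True,
                 "pain_on_eye_movement": True}))       # active (CAS = 3)
print(eugogo_severity(False, 1.0, 2.0, "mild", "temporary"))  # mild
```
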

Statistical analysis

Categorical data were described as percentages and compared with the χ2 test; numeric variables were expressed as mean ± standard deviation (SD) and compared with the t-test (two groups) or one-way ANOVA (three groups). All comparisons were two-sided, and p < 0.05 was regarded as statistically significant. We focused on three diagnostic outcomes: thyroid function type, TAO stage, and severity level. Data were analyzed with SPSS (version 22.0) and the PyCaret Python library (pycaret.org). First, associations between the diagnostic outcomes and demographic factors, symptoms, and image features were examined in SPSS. The associated factors and each target (outcome label) were then analyzed in PyCaret. Patients with an unclear thyroid function type, TAO stage, or severity level were excluded. PyCaret was used for a preliminary comparison of candidate models (such as random forest classifier, AdaBoost classifier, linear-kernel SVM, decision tree classifier, K-nearest neighbors classifier, and logistic regression) by the area under the receiver operating characteristic (ROC) curve (AUC) or accuracy. Overall, logistic regression had satisfactory performance; in addition, its results can be verified in SPSS and its formula can be written out explicitly. We therefore used logistic regression to show the predictive roles of the selected features. Three models were first established and then fine-tuned to maximize AUC or accuracy. Finally, SPSS was used to build a separate logistic regression model predicting whether a case belonged to the abnormal thyroid function group, based on several thyroid function-associated factors.
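
SPSS performs these tests internally; the sketch below restates the underlying arithmetic in pure Python for readers who want to check a table by hand. It computes the Pearson χ2 statistic (without Yates' continuity correction) with an exact p-value for 1 or an even number of degrees of freedom (for example, a sex × thyroid-function-type table has 2), and the one-way ANOVA F statistic; the ANOVA p-value would need the F distribution and is omitted. The counts and values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution (df = 1 or even df only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df >= 2 and df % 2 == 0:
        half, term, total = x / 2, 1.0, 1.0
        for i in range(1, df // 2):  # exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("odd df > 1 is not covered by this sketch")


def chi_square_test(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical 2 x 2 table (rows: male/female; columns: euthyroid/abnormal).
stat, df, p = chi_square_test([[10, 20], [20, 10]])
print(round(stat, 3), df, round(p, 4))                     # 6.667 1 0.0098
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

Tables with a zero row or column total would make an expected count zero and are not handled by this sketch.
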