Introduction

Well before the outbreak of the COVID-19 pandemic, laboratory observations in patients with SARS-CoV-1 infection had revealed an association of coronavirus disease with distorted thyroid gland structure and function [1, 2]. The thyroid is rich in angiotensin-converting enzyme-2 (ACE-2), which some coronaviruses use as a vehicle for entering cells [3]. It therefore came as no surprise when Chen and colleagues reported that SARS-CoV-2 infection was also associated with deranged thyroid function tests [4]. This was subsequently confirmed by numerous investigators in Asian [5,6,7,8,9] and Western [10,11,12,13,14] populations. Although there have been reports of thyroiditis, often atypical, suggesting direct virus-induced or immunologically mediated damage to the gland [15, 16], in most cases the pattern of hormone abnormalities is compatible with the non-thyroidal illness syndrome (NTIS), also known as euthyroid sick syndrome (ESS), characterized by decreased levels of triiodothyronine (T3), normal or decreased levels of thyroxine (T4) and normal or decreased levels of thyrotropin (TSH) [5, 12, 13]. The changes are usually transient, and hormone levels return to normal after convalescence [12, 13]. The pathophysiological mechanisms underlying NTIS are not fully understood. It can probably occur in any serious illness, acute or chronic, and is associated with morbidity and mortality [17]. In the setting of COVID-19, NTIS appears to reflect the effect of the “cytokine storm” on the hypothalamus-pituitary-thyroid axis at central (TSH suppression) and peripheral (deiodinase inhibition) levels [18]. The question of whether low TSH is causatively related to worse COVID-19 prognosis remains open [19]. A number of studies have shown that decreased levels of TSH and/or free T3 (FT3) or total T3 (TT3) are associated with unfavorable outcome in COVID-19 patients [6, 8, 9, 20].
However, the prognostic utility (i.e., the ability to discriminate between patients with good or bad prognosis) of thyroid function tests in COVID-19 has not been established. The objectives of the present study were: (a) to examine whether low serum TSH is a risk factor for adverse outcome in Greek patients with COVID-19, and (b) to evaluate the prognostic value of serum TSH and compare it with that of a number of biomarkers arbitrarily selected among those commonly used for assessment of disease severity.

Methods

This was a retrospective observational study involving statistical analysis of routinely collected patient data which were fully anonymized prior to analysis. The study was approved by our institution’s research and ethics committees (Decision No.: 6011/22-3-2022) and was performed in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments.

Patients

We reviewed the records of all patients (n = 213) admitted to the COVID-19 unit of the Department of Internal Medicine of a tertiary public general hospital in Athens, Greece, from February to December 2021. In all cases, the diagnosis of SARS-CoV-2 infection was confirmed by polymerase chain reaction. The main admission criterion was hypoxic respiratory failure requiring supplemental oxygen. Serum levels of TSH were routinely measured at admission as screening for thyroid disease. Serum T4 and T3 measurement was at the discretion of the treating physician. Patients on thyroid hormone replacement treatment or with a known history of thyroid disease, pregnant patients, and patients taking amiodarone, somatostatin analogs, or dopamine agonists were excluded from the analysis. Also excluded were those with missing results of one or more of the laboratory indices under investigation (C-reactive protein [CRP], ferritin, D-dimers, albumin). Patients who had taken low-dose glucocorticoids (the equivalent of <30 mg prednisone daily) before admission were not excluded (see Online Resource 1). The final study cohort consisted of 128 patients with a median age of 64 years (range 21–91), 68.5% of whom were male.

Laboratory assays

Serum TSH, albumin, CRP, ferritin, and D-dimers were measured as part of the routine laboratory work-up in all patients at the time of admission. TSH, free T4 (FT4), FT3, and T3 were measured on a Cobas e601 analyzer (Roche Diagnostics) using an electrochemiluminescence immunoassay. Albumin and CRP were measured by an immunoturbidimetric assay on a Cobas c501 analyzer (Roche Diagnostics). Ferritin was measured on an ADVIA Centaur analyzer (Siemens Healthcare Diagnostics) using a direct chemiluminescence 2-point sandwich immunoassay. D-dimers were measured with an immunoturbidimetric assay on a BCS XP Coagulation System (Siemens Healthcare Diagnostics).

Statistical analysis

IBM SPSS version 27.0 was used for data analysis, including correlations, comparisons of means and proportions with parametric and nonparametric methods as appropriate, multivariate logistic regression, and receiver operating characteristic (ROC) curve analysis. Patient outcomes were classified as “favorable” (discharge without intubation during hospitalization) and “adverse” (intubation or in-hospital death of any cause). The optimal dichotomous cut-off values of the continuous numeric variables being evaluated as prognostic factors (age, TSH, CRP, ferritin, D-dimers, albumin) were calculated from ROC curve analysis using the Youden’s index maximization rule.
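As an illustration of the cut-off rule, the following minimal sketch scans every observed value of a marker as a candidate threshold and keeps the one that maximizes Youden's index, J = sensitivity + specificity − 1. It is not the SPSS procedure used in the study; the function name `youden_cutoff`, the `low_is_risk` flag, and the eight data points are hypothetical and serve only to show the arithmetic.

```python
def youden_cutoff(values, labels, low_is_risk=False):
    """Return (cutoff, J, sensitivity, specificity) maximizing J = sens + spec - 1.

    values: marker measurements; labels: 1 = adverse, 0 = favorable.
    With low_is_risk=True a case is flagged positive when value <= cutoff
    (appropriate for markers such as TSH or albumin, where low is bad).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(values)):
        if low_is_risk:
            predicted = [v <= cutoff for v in values]
        else:
            predicted = [v >= cutoff for v in values]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        tn = sum(1 for p, y in zip(predicted, labels) if not p and y == 0)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        # Keep the first (lowest) cutoff on ties.
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical toy data, NOT taken from the study cohort.
tsh = [0.2, 0.3, 0.4, 0.5, 0.9, 1.1, 1.4, 2.0]
outcome = [1, 1, 1, 0, 1, 0, 0, 0]
cutoff, j, sens, spec = youden_cutoff(tsh, outcome, low_is_risk=True)
print(cutoff, j, sens, spec)
```

Ties between candidate thresholds are resolved here in favor of the lowest value; other implementations may break ties differently.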

Artificial Neural Network (ANN) analysis

The multilayer perceptron neural network module in IBM SPSS (version 27.0) was used to build, train, and test the relationship between the outcome and the independent covariates (sex, age, TSH, CRP, ferritin, D-dimers, albumin) constituting the input layer. The model had one hidden layer with three units (neurons), excluding the bias unit. The activation function in the hidden layer was the hyperbolic tangent. The output layer of the network had one dependent variable with two units (adverse outcome, favorable outcome). In the output layer, softmax was used as the activation function and cross-entropy as the error function.
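The architecture described above can be sketched as a single forward pass: seven inputs, three hyperbolic-tangent hidden units, two softmax output units, and a cross-entropy error. This is only a structural illustration under stated assumptions. The weights are random rather than trained, the input vector is a made-up set of standardized covariates, and the helper names (`init_layer`, `affine`, `forward`) are hypothetical; the training procedure used by SPSS is not reproduced.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_OUTPUTS = 7, 3, 2  # layer sizes stated in the text


def init_layer(n_in, n_out, rng):
    # One row of weights per unit; the last entry of each row is the bias weight.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def affine(weights, inputs):
    # Weighted sum of the inputs plus the bias term for every unit in the layer.
    return [sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1] for row in weights]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w_hidden, w_out):
    hidden = [math.tanh(a) for a in affine(w_hidden, x)]
    return softmax(affine(w_out, hidden))


def cross_entropy(probs, target_index):
    # Error for one case whose true class is target_index.
    return -math.log(probs[target_index])


rng = random.Random(0)
w_hidden = init_layer(N_INPUTS, N_HIDDEN, rng)
w_out = init_layer(N_HIDDEN, N_OUTPUTS, rng)

# Hypothetical standardized covariates: sex, age, TSH, CRP, ferritin, D-dimers, albumin.
x = [1.0, 0.2, -0.8, 0.5, 0.1, -0.3, -1.2]
probs = forward(x, w_hidden, w_out)   # [P(adverse), P(favorable)]
loss = cross_entropy(probs, 0)        # error if the true outcome were "adverse"
print(probs, loss)
```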

Machine learning classification algorithms

The Waikato Environment for Knowledge Analysis (WEKA) is an open-source data mining software developed at the University of Waikato, New Zealand [21]. It includes a number of popular classification algorithms which we used to build, train, and test models of COVID-19 outcome prediction based on combinations of the demographic (age, sex) and laboratory (TSH, CRP, ferritin, D-dimers, albumin) variables of our dataset. Besides overall accuracy (the percentage of correctly classified cases), specific quantitative indices considered for the selection of the best classifier were:

  a. True Positive Rate (TPR) for each class.

  b. False Positive Rate (FPR) for each class.

  c. Precision. A proportion calculated as the number of cases that are truly of a class, divided by the number of cases classified as that class. Equivalent to Positive Predictive Value.

  d. Recall. The fraction of cases truly belonging to a class that are correctly classified as that class. Equal to TPR.

  e. F-measure. A combined measure reflecting the balance between precision and recall, calculated as: 2 * Precision * Recall/(Precision + Recall).

  f. AUC. The area under the ROC (Receiver Operating Characteristic) curve.
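The indices listed above can be computed from the counts of a two-class confusion matrix, as in the minimal sketch below. The counts and scores are invented for illustration and are not results from this study. The function names are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formulation, which may differ in detail from the way WEKA derives it.

```python
def class_metrics(tp, fp, fn, tn):
    """Per-class indices from confusion-matrix counts for the class treated as positive."""
    tpr = tp / (tp + fn)            # true positive rate (recall)
    fpr = fp / (fp + tn)            # false positive rate
    precision = tp / (tp + fp)      # positive predictive value
    f_measure = 2 * precision * tpr / (precision + tpr)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"TPR": tpr, "FPR": fpr, "Precision": precision,
            "Recall": tpr, "F": f_measure, "Accuracy": accuracy}


def auc(scores, labels):
    """Probability that a random positive case scores higher than a random negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
m = class_metrics(tp=8, fp=2, fn=2, tn=14)
a = auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
print(m, a)
```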

Results

The basic demographic and clinical features of the patients are shown in Online Resource 1. Mean serum TSH in the study cohort was 0.93 mIU/L. TSH levels were significantly lower in male patients (0.84 mIU/L versus 1.13 mIU/L in females, P = 0.045). In the subsets of patients who had FT4 (N = 75), FT3 (N = 38), or T3 (N = 30) measurements, there was no correlation between TSH and thyroid hormone (FT4, FT3, T3) levels. For a detailed bivariate correlations matrix, see Online Resource 2. Patients of both sexes with adverse outcomes had significantly lower TSH levels compared to those with favorable outcomes (Fig. 1). Levels of CRP, ferritin, D-dimers, and albumin also differed between the two outcome groups, with varying degrees of statistical significance (Table 1). There was a weak positive correlation of TSH with serum albumin (Spearman’s Rho = 0.188, P = 0.034) but no significant correlation of TSH with the other laboratory indices of disease severity (Online Resource 2).

Fig. 1 Boxplots of serum TSH levels of patients by sex and outcome. Differences between favorable and adverse outcome groups were significant in both sexes by the two-tailed Mann-Whitney U test. Created with SPSS v.27.0

Table 1 Comparison of variable levels by outcome (two-tailed Mann-Whitney U test)

ROC curve analysis

ROC curve analysis for adverse outcome prediction gave an AUC of 69.5% for TSH, compared to 86.9% for albumin (Fig. 2). The optimal cut-off value of serum TSH for discrimination between adverse and favorable outcomes was 0.5 mIU/L. Serum TSH below the optimal cut-off value was associated with an odds ratio of 4.13 (95% C.I.: 1.41–12.05) for adverse outcome. Detailed predictive performance characteristics of serum TSH and its comparison with other variables are shown in Table 2. TSH had the highest specificity in adverse outcome prediction (78.3%) but a low sensitivity (55.6%) and low positive predictive value (PPV = 58.1%). Serum albumin had a high sensitivity (93.3%) but low specificity (67.5%) and PPV (60.9%).

Fig. 2 TSH and Albumin ROC curves for prediction of adverse outcome. The respective AUCs were 69.5% (95% C.I.: 60–79%) and 86.9% (95% C.I.: 80.4–93.4%). Created with SPSS v.27.0

Table 2 Analysis of ROC curves for predicting adverse outcome. Performance characteristics of individual variables at the optimal cut-off points

Binary logistic regression analysis

Multivariate binary logistic regression with sex, age, TSH, albumin, CRP, ferritin, and D-dimers as independent covariates and disease outcome (adverse versus favorable) as the dependent variable showed that only albumin (P < 0.001) and TSH (P = 0.006) were significantly associated with outcome (Table 3). The detailed SPSS logistic regression output can be found in Online Resource 3a, b. The precision of this model was 82.9% for predicted adverse outcomes and 87.1% for predicted favorable outcomes. The probability equation of a simpler logistic regression model using only the statistically significant variables (albumin and TSH) was:

$$P = \frac{1}{1 + e^{\left(3.3 \ast ALBUMIN + 1.2 \ast TSH - 11.4\right)}}$$

where P > 0.5 predicts an adverse outcome with a precision of 74.4%, while P < 0.5 predicts a favorable outcome with 84.7% precision.
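The equation can be evaluated directly, as in the short sketch below. The coefficients are those printed above; the function name `adverse_probability`, the two example patients, and the units (albumin assumed to be in g/dL and TSH in mIU/L) are assumptions made for illustration, since the units of the model inputs are not stated in this section.

```python
import math


def adverse_probability(albumin, tsh):
    """Probability of adverse outcome from the two-variable logistic model.

    Assumed units (not stated in the text): albumin in g/dL, TSH in mIU/L.
    """
    return 1.0 / (1.0 + math.exp(3.3 * albumin + 1.2 * tsh - 11.4))


# Hypothetical patients, for illustration only.
p_low = adverse_probability(albumin=3.0, tsh=0.4)   # low albumin, low TSH
p_high = adverse_probability(albumin=4.0, tsh=1.5)  # higher albumin and TSH

for label, p in (("low albumin/TSH", p_low), ("higher albumin/TSH", p_high)):
    predicted = "adverse" if p > 0.5 else "favorable"
    print(f"{label}: P = {p:.3f} -> predicted {predicted}")
```

Because both coefficients are positive inside the exponent, lower albumin and lower TSH each raise P, which matches the direction of the associations reported above.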

Table 3 Results of binary logistic regression analysis for adverse outcome prediction

Multilayer perceptron analysis

The multilayer perceptron model was trained with 69.5% of the cases (N = 89), while the remaining 30.5% (N = 39) were set aside for testing the performance of the model in outcome prediction. No data were excluded. The overall percentage of correct predictions was 83.1% (85.7% in the “favorable outcome” class, 78.8% in the “adverse outcome” class) in the training sample and 87.2% (92.6% in the “favorable outcome” class, 75.0% in the “adverse outcome” class) in the testing sample. The AUC of the ROC curve reflecting the overall predictive performance of the model was 90.7% for both outcomes (adverse-favorable). The absolute and normalized (%) predictive importance of the individual independent variables included in the model is shown in Fig. 3.

Fig. 3 Multilayer Perceptron Neural Network analysis: Comparative predictive importance of independent variables. Created with SPSS v.27.0

WEKA classifier analysis

The best classification performance was obtained by a Naïve Bayes classifier using albumin and TSH as independent attributes. It was trained on 80% of the cohort cases and had an overall precision of 96.2%. Predictive accuracy was high in both the “favorable outcome” and “adverse outcome” classes. Detailed performance indices are shown in Table 4. The inclusion of additional attributes (sex, age, CRP, ferritin, D-dimers) did not improve model performance. Removal of the TSH attribute resulted in a 16.6% reduction in overall predictive accuracy, with corresponding reductions in all the specific performance indices of the model.
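For readers unfamiliar with the technique, the sketch below shows how a Naïve Bayes classifier with two numeric attributes reaches a decision. It assumes that each attribute is modeled with a per-class normal distribution; the WEKA settings actually used are not described in the text, so this is an assumption. The function names and the six (albumin, TSH) training pairs are hypothetical and are not study data.

```python
import math


def fit_gaussian_nb(rows, labels):
    """Estimate the class prior and per-attribute (mean, variance) for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for col in zip(*members):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / (len(col) - 1)
            stats.append((mean, max(var, 1e-9)))  # guard against zero variance
        model[cls] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one case."""
    def log_posterior(cls):
        prior, stats = model[cls]
        total = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            # Attributes are treated as conditionally independent given the class.
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


# Hypothetical (albumin, TSH) pairs, for illustration only.
rows = [(2.8, 0.3), (3.0, 0.4), (3.1, 0.5), (4.0, 1.2), (4.2, 1.5), (3.9, 0.9)]
labels = ["adverse", "adverse", "adverse", "favorable", "favorable", "favorable"]
model = fit_gaussian_nb(rows, labels)
prediction = predict(model, (2.9, 0.35))
print(prediction)
```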

Table 4 Detailed accuracy of Naïve Bayes classifier by class

Discussion

In the original report on deranged thyroid function in COVID-19, Chen, Zhou, and Xu found that 56% of 50 Chinese patients with moderate to critical severity COVID-19 had significantly lower TSH levels compared to healthy controls and non-COVID-19 pneumonia patients of similar severity [4]. In 34% of their patients, low TSH was the only abnormal thyroid function test. The degree of decrease in TSH levels was positively correlated with COVID-19 severity. In a retrospective study of 150 Chinese COVID-19 patients with NTIS (defined as decreased serum T3), Gong et al. found that low TSH was independently associated with a hazard ratio of 2.78 for 90-day mortality [6]. Similar results have been reported by Ahn et al., who retrospectively analyzed 119 Korean COVID-19 patients and found that the degree of decline in TSH was significantly associated with disease severity [5]. Studies in Western populations have yielded similar results. In the United Kingdom, Khoo et al. observed that patients admitted with COVID-19 had reduced serum TSH compared to their baseline levels, a phenomenon not seen in control patients admitted for non-COVID illnesses. Among patients with COVID-19, those who needed admission to the Intensive Therapy Unit (ITU) had significantly lower median TSH [13]. In a prospective Italian study of 506 patients admitted with COVID-19 of mild clinical severity, Sparano et al. found that admission levels of TSH in patients with unfavorable outcomes (death or transfer to ITU) were significantly lower than those of patients with favorable outcomes (0.56 versus 0.94 mIU/L) [14]. In another prospective Italian study, Campi et al. found that 39% of COVID-19 patients treated in ITU or sub-ITU had decreases in TSH levels inversely correlated with CRP and Interleukin-6 (IL-6). However, only FT3 levels were predictive of mortality [12]. In contrast to the above reports, a retrospective Danish study of 116 COVID-19 patients by Clausen et al. did not find an association between serum TSH and clinical outcome. They did detect weak negative associations of TSH levels with a number of cytokines (IL-8, IL-10, IL-15, IP-10, GM-CSF) but, interestingly, not with IL-6, a cardinal marker of COVID-19 severity. In their study, low serum TSH (<0.4 mIU/L) was not associated with increased mortality [22]. Similarly, Beltrão et al. did not find an association between serum TSH and mortality or disease severity in a prospective, observational study of 245 Brazilian patients with COVID-19 [11]. The specific nature of the association of COVID-19 with low TSH has also been challenged: in a prospective study of critically and non-critically ill Greek patients, Vassiliadi et al. found that serum TSH levels of COVID-19 patients were not significantly different from those of control non-COVID patients, suggesting the absence of a specific link between low TSH and SARS-CoV-2 infection [23].

Our results lend further support to the argument that low serum TSH is associated with adverse outcome of COVID-19, independently of other laboratory indices of disease severity. The cause of low TSH in our cohort can only be speculated upon. Thyrotoxicosis due to thyroiditis is unlikely, as there was no inverse correlation between TSH and thyroid hormones. The discordance between TSH and thyroid hormone levels suggests the presence of a deranged hypothalamus-pituitary-thyroid axis as part of NTIS. However, a direct effect of the virus on the pituitary or hypothalamus cannot be excluded since SARS-CoV-2 receptors are known to exist in these locations [24, 25]. Although exogenous glucocorticoids can suppress TSH secretion, they are unlikely to be responsible for the low TSH levels seen in our patients since very few of them had received steroids prior to admission (see Online Resource 1).

Early identification of hospitalized COVID-19 patients who are likely to have an adverse outcome is important for optimization of treatment, rational use of resources, and advancing our understanding of disease pathophysiology. Be it a specific feature of SARS-CoV-2 infection or merely a manifestation of NTIS, low serum TSH in COVID-19 appears to be independently associated with adverse prognosis. However, the prognostic accuracy of low TSH levels in COVID-19 inpatients has not been studied in detail. In the present study, we compared the prognostic utility of serum TSH to that of a number of widely available biomarkers routinely used for disease monitoring and risk assessment in COVID-19. Among them, CRP and ferritin reflect the intensity of the inflammatory reaction, while D-dimers are a measure of the vascular occlusive complications characteristically associated with this disease. Serum albumin is a negative acute phase reactant and also a powerful predictor of outcome in hospitalized patients irrespective of underlying illness [26]. Using ROC curve analysis, we showed that, in common with other prognostic markers, including CRP, ferritin, and D-dimers, TSH has limited prognostic accuracy, with an AUC of only 69.5%. Low serum albumin, which has been reported to be associated with increased disease severity and mortality in COVID-19 [27, 28], was the most accurate predictor of adverse outcome in our study cohort, with an AUC of 86.9%. However, its low specificity (67.5% versus 78.3% for TSH) resulted in a poor positive predictive value of just 60.9%. We then proceeded to test the predictive performance of combinations of prognostic variables using a number of supervised machine-learning algorithms included in the WEKA platform. A Naïve Bayes classifier obtained the best results (AUC of 99.2%) with serum albumin and TSH levels as independent attributes.
This was in agreement with the Perceptron ANN analysis, which indicated that albumin and TSH were the most important predictors of outcome, and also with the multivariate binary logistic analysis from which albumin and TSH emerged as the only statistically significant, independent predictive variables.

To our knowledge, this is the first report on the prognostic utility of TSH that includes a head-to-head comparison with commonly used prognostic laboratory markers in COVID-19. The main limitations of our study are its retrospective design and the small sample size, which did not permit subgroup analysis to look for a possible modifying effect of clinical variables such as comorbidities. It must also be emphasized that the prediction model described in this study has not yet been externally validated and that the findings in our cohort of hospitalized Greek COVID-19 patients cannot be generalized to different populations, e.g., primary care patients or those of different ethnic origin.

Conclusion

This study presents additional evidence in favor of the association between low serum TSH levels and adverse outcome of hospitalized patients with COVID-19. However, it also shows that, as is the case with other biomarkers of disease severity, measurement of serum TSH alone has limited prognostic utility. Integration of TSH levels into multivariable machine learning classification algorithms shows promise for outcome prediction with high accuracy and should be the subject of further studies.