Introduction

Thyroid nodules are common; moreover, since the introduction of neck ultrasound and image-guided fine-needle aspiration (FNA), thyroid cancer is no longer rare either [1]. Fortunately, it still has a good prognosis, especially if discovered early [2].

Ultrasound (US) is considered the primary imaging modality for evaluating thyroid nodules; however, it is highly subjective, as judgment depends on the operator’s ability to describe specific features [3]. It has therefore undergone several technical improvements [4] and now holds a central place in several risk stratification systems (RSS) for detecting thyroid malignancy [5].

Improving the diagnostic workup of thyroid nodules is crucial for informing decisions about fine-needle aspiration (FNA) and for avoiding unnecessary treatment [4, 6].

Given the nonspecific US characteristics of thyroid nodules, the decision to operate depends mainly on cytological evaluation; therefore, US evaluation is not emphasized in the diagnostic workup of thyroid nodules [7].

FNA provides cytological information that can differentiate benign from malignant tumors; however, the interpretation of FNA depends on the quality of the sample(s) and the pathologist's training [8].

The Thyroid Imaging Reporting and Data System classification (TI-RADS), inspired by the breast BI-RADS classification, could be used to classify nodules according to ultrasound criteria and determine the risk of malignancy [9]. Also, US elastography (USE) has been introduced in the clinical workup of thyroid nodules [10].

Recently, with the emergence of artificial intelligence (AI), US evaluation has been shown to provide enough information to predict malignancy with a diagnostic accuracy close to that of cytological evaluation [11, 12].

Aim of the work

This study aimed to evaluate the diagnostic accuracy of US evaluation in predicting malignant thyroid nodules and to evaluate the role of the elastography score, strain ratio (SR), and the TI-RADS scoring system as non-invasive scoring systems for differentiating malignant from benign thyroid nodules.

Material and methods

Patients

A total of 1269 patients referred to our institute were consecutively enrolled in this study between February 2018 and April 2021. The reasons for referral were either symptomatic or incidentally discovered thyroid nodules. The same expert operator, who was blinded to any clinical or cytopathological findings, evaluated all patients. The institution's Research Ethics Committee approved the study (MED018), and informed written consent was obtained from participating patients.

Thyroid ultrasound

Ultrasound examination of the neck was done using a Hitachi (Avius) machine (Hitachi Medical Corporation, Tokyo, Japan), with the patient lying in the supine position and the neck slightly extended by placing a pillow under the patient's shoulders. The scanning protocol in our study included scanning of the thyroid gland in both transverse and longitudinal planes by brightness mode (B-mode), color-coded Doppler imaging (CCDI), power Doppler imaging (PDI), and real-time elastography with SR.

Image analysis

Ultrasound and power Doppler images were analyzed. The ultrasound operator was blinded to all clinical and pathological data to avoid bias.

The thyroid nodules were evaluated according to their site (isthmus, right, or left lobe); number (multinodular goiter [MNG] or solitary thyroid nodule); echogenicity (hyperechoic, isoechoic, hypoechoic, markedly hypoechoic, heterogeneous, cystic, or complex cystic); borders (regular or irregular); the presence and type of calcification (egg-shell, microcalcifications, or punctate); antero-posterior/transverse diameter ratio (“taller-than-wide,” ≥ 1, and “wider-than-tall,” < 1); the presence of breakdown; and a surrounding halo.

Each nodule was categorized according to the European TI-RADS from 1 to 5. TI-RADS 1 = normal thyroid gland. TI-RADS 2 = simple cyst, spongiform cyst, isolated macrocalcification, or diffuse hypoechoic enlarged thyroid gland. TI-RADS 3 = isoechoic or hyperechoic nodule with no highly suspicious US features. TI-RADS 4 = moderately hypoechoic nodule with no highly suspicious US features. TI-RADS 5 = at least one of the highly suspicious US features and/or adenopathy: irregular shape, irregular margins, microcalcification, and marked hypoechogenicity (and solid) [13].
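The categorization rules above can be sketched as a small function. This is only an illustration of the listed criteria, not the procedure or software used in the study: the function name `eu_tirads_category`, its parameters, and the pattern labels are hypothetical, and isolated macrocalcification, diffuse gland enlargement, and adenopathy are left out for brevity.

```python
from typing import Optional

# Pattern labels are hypothetical names chosen for this sketch.
BENIGN_PATTERNS = {"simple cyst", "spongiform"}
LOW_RISK_PATTERNS = {"isoechoic", "hyperechoic"}


def eu_tirads_category(
    has_nodule: bool,
    pattern: Optional[str] = None,
    irregular_shape: bool = False,
    irregular_margins: bool = False,
    microcalcifications: bool = False,
) -> int:
    """Map the features listed in the text to a TI-RADS category (1-5)."""
    if not has_nodule:
        return 1  # normal thyroid gland
    # Any single highly suspicious feature is enough for category 5.
    if (irregular_shape or irregular_margins or microcalcifications
            or pattern == "markedly hypoechoic"):
        return 5
    if pattern in BENIGN_PATTERNS:
        return 2
    if pattern in LOW_RISK_PATTERNS:
        return 3
    if pattern == "hypoechoic":
        return 4  # moderately hypoechoic, no highly suspicious feature
    raise ValueError(f"pattern not covered by this sketch: {pattern!r}")
```

For example, `eu_tirads_category(True, "isoechoic", microcalcifications=True)` returns 5, because a single highly suspicious feature overrides the echogenicity-based category.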

Strain elastograms of nodules were qualitatively evaluated with a stepwise scoring system. According to the prevalent color in the nodule, they were evaluated based on the breast strain USE scale of Itoh et al., which includes four different patterns. Scores 1 and 2 are considered benign, while scores 3 and 4 are classified as suspicious for malignancy [14].

The semi-quantitative elastography score was represented by the SR method. Two areas were manually selected: area A, representing the region of interest, and area B, representing the normal area. The SR was then calculated by dividing area B by area A. For masses with a homogeneous elasticity pattern, area A was chosen from any region, but in those with heterogeneous patterns, area A was chosen to cover as much of the heterogeneous area as possible. Additionally, multiple SR measurements were taken, and the median of these measurements was recorded and used for statistical analysis. Subsequently, the best cut-off value was calculated and used to determine the diagnostic value.
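The SR calculation described above can be illustrated as follows. This is a minimal sketch that assumes each measurement yields one strain value for area B (normal tissue) and one for area A (region of interest); the function names and the sample values are hypothetical and are not study data.

```python
from statistics import median
from typing import Iterable, Tuple


def strain_ratio(strain_b: float, strain_a: float) -> float:
    """SR for one measurement: area B (normal tissue) divided by area A (region of interest)."""
    if strain_a <= 0:
        raise ValueError("strain of area A must be positive")
    return strain_b / strain_a


def median_strain_ratio(measurements: Iterable[Tuple[float, float]]) -> float:
    """Median SR over repeated (strain_b, strain_a) measurements of the same nodule."""
    return median(strain_ratio(b, a) for b, a in measurements)


# Three hypothetical repeated measurements of one nodule.
example = [(4.0, 2.0), (9.0, 3.0), (5.0, 2.0)]
sr = median_strain_ratio(example)  # individual ratios are 2.0, 3.0, and 2.5
```

Using the median rather than the mean keeps a single outlying measurement from dominating the recorded SR.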

The final diagnosis was achieved by cytological evaluation after FNAC and/or histological examination of postoperative specimens (including all malignant cases) and follow-up of all patients for at least 1 year.

Statistical analysis

Data management and analysis were performed using the Statistical Package for Social Sciences (SPSS) version 25. Numerical data were summarized using means and standard deviations or medians and/or ranges, as appropriate. Categorical data were summarized as numbers and percentages, which were also used to estimate frequencies. Numerical data were explored for normality using the Kolmogorov–Smirnov test and the Shapiro–Wilk test. The chi-square test or Fisher exact test was used to measure the association between categorical variables and to compare two independent percentages. Comparisons between two groups were carried out using the Student t test for normally distributed numeric variables and the Mann–Whitney U test for non-normally distributed numeric variables. Logistic regression was performed to give the adjusted odds ratio and the magnitude of the effect of different factors. All tests were two-tailed, and a probability (p value) ≤ 0.05 was considered significant.

Receiver operating characteristic (ROC) curve analysis was carried out to determine the best cut-off point, sensitivity, specificity, and area under the curve (AUC). The accuracy of a test depends on how well it separates the group being tested into malignant and benign, and it is measured by the area under the ROC curve. An area of 1 represents a perfect test; an area of 0.5 represents a worthless test. A rough guide for classifying the accuracy of a diagnostic test is the traditional academic point system:

  • 0.90–1.0 = excellent (A).

  • 0.80–0.90 = good (B).

  • 0.70–0.80 = fair (C).

  • 0.60–0.70 = poor (D).

  • 0.50–0.60 = fail (F).
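The ROC quantities and the grading scale above can be sketched in a few lines. The text does not state how the "best" cut-off was chosen, so maximizing Youden's index (sensitivity + specificity - 1) is an assumption here; assigning a boundary value such as 0.90 to the higher grade and treating anything below 0.60 as a fail are also assumptions. All function names and score values are hypothetical and are not study data.

```python
from typing import Sequence


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Rank-based AUC: chance that a malignant score exceeds a benign one (ties count 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Cut-off c maximizing sensitivity + specificity - 1 for the rule 'score > c'."""
    best_c, best_j = None, float("-inf")
    for c in sorted(set(pos) | set(neg)):
        sensitivity = sum(s > c for s in pos) / len(pos)
        specificity = sum(s <= c for s in neg) / len(neg)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c


def auc_grade(auc: float) -> str:
    """Grade an AUC on the academic point scale listed above."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    for lower, label in ((0.90, "excellent (A)"), (0.80, "good (B)"),
                         (0.70, "fair (C)"), (0.60, "poor (D)")):
        if auc >= lower:
            return label
    return "fail (F)"


# Hypothetical scores for malignant and benign nodules.
malignant_scores = [2.5, 3.0, 1.9, 4.2]
benign_scores = [1.0, 1.5, 1.7, 2.0, 1.2]
auc = roc_auc(malignant_scores, benign_scores)
cutoff = youden_cutoff(malignant_scores, benign_scores)
```

With these toy values, 19 of the 20 malignant/benign pairs are ordered correctly, so the AUC is 0.95 and falls in the excellent band.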

Results

A series of 1269 patients (1088 females and 181 males) with a mean age (SD) of 44 (10) years was referred for evaluation of thyroid nodules; the nodules were located in the right lobe in 650 patients, in the left lobe in 579, and in the isthmus in 40. A total of 1133 patients had MNG, while 136 patients had a solitary thyroid nodule (Table 1).

Table 1 Sonographic data of the included patients

Regarding echogenicity, most nodules were hypoechoic (706 patients), followed by hyperechoic (243), isoechoic (207), complex cystic (48), heterogeneous (33), markedly hypoechoic (17), and cystic (12). A total of 381 of the 1269 patients had calcifications: microcalcifications (Fig. 1) in 265 patients, patches of calcification in 76, egg-shell calcifications in 17, punctate calcifications in 16, and partial egg-shell calcifications in 7. Cystic breakdown inside the nodule was found in 495 cases, while 856 cases had a halo sign. The majority of cases had an antero-posterior/transverse (A/P) ratio < 1 (1250), while only 19 cases had an A/P ratio ≥ 1. Most nodules had a regular border (1222 cases), and only 47 had an irregular border (Fig. 2) (Table 1).

Fig. 1 Malignant thyroid nodule with extensive microcalcifications

Fig. 2 Malignant thyroid nodule with irregular ill-defined borders and microcalcifications

Regarding real-time elastography, most cases had a score of 2 (993 cases), while score 1 was found in 103 cases, score 3 in 137 cases, and score 4 in 36 cases. According to the TI-RADS classification, 15 had TI-RADS category 2, 368 had TI-RADS category 3, 511 had TI-RADS category 4, and 354 had TI-RADS category 5 (Table 1).

By US evaluation, the diagnosis in our patients was 91 malignant nodules and 1178 benign nodules, while the final diagnosis was 1197 benign nodules and 72 malignant nodules. Most malignancies were papillary carcinoma (59 cases), followed by follicular neoplasm (7 cases), medullary carcinoma (4 cases), and lymphoma (2 cases) (Table 1). The majority of malignancies were in female patients (53 cases), compared with 19 cases in male patients (Table 2).

Table 2 The final diagnosis by age and gender

Regarding the characteristics of the malignant nodules (n = 72) in our studied patients, most had a hypoechoic echogenicity (48 cases, 6.8%), irregular border (38 cases, 80.9%), A/P ratio < 1 (58 cases, 4.6%), microcalcifications (33 cases, 12.5%), absent areas of breakdown (65 cases, 8.4%), and no surrounding halo (47 cases, 11.4%). Also, many malignant nodules had elastography score 4 (30 cases, 83.3%) (Fig. 3) and TI-RADS category 5 (62 cases, 17.6%) (Table 3). The percentages give the proportion of all nodules with the given feature that were malignant, not the proportion of the 72 malignant nodules.
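The percentages in this paragraph appear to be computed within each feature category: the number of malignant nodules with a feature divided by the number of all nodules with that feature, as reported earlier in the Results. The short check below reproduces several of them from the counts in the text; the names `counts` and `malignancy_rate` are hypothetical helpers introduced for illustration.

```python
# (malignant nodules with the feature, all nodules with the feature), taken from the Results text.
counts = {
    "hypoechoic": (48, 706),
    "irregular border": (38, 47),
    "A/P ratio < 1": (58, 1250),
    "microcalcifications": (33, 265),
    "elastography score 4": (30, 36),
}


def malignancy_rate(malignant: int, total: int) -> float:
    """Percentage of nodules with a given feature that were malignant, to one decimal place."""
    return round(100 * malignant / total, 1)


rates = {feature: malignancy_rate(m, n) for feature, (m, n) in counts.items()}
```

Read this way, 83.3% for elastography score 4 means that 30 of the 36 score-4 nodules were malignant, which is a different statement from the share of malignant nodules that scored 4.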

Fig. 3 Malignant thyroid nodule with grade 4 elasticity score

Table 3 The characteristics of benign and malignant nodules

Multivariate analysis was used to predict malignancy in thyroid nodules and to measure the independent effect of different factors on the occurrence of malignancy in the thyroid gland. Factors with a significance level less than 0.100 were entered into a stepwise logistic regression analysis. The regression coefficient shows the effect of each variable after controlling for the effect of the other variables in the model. We found that nodules with an A-P/T ratio > 1 had 21 times higher odds of being malignant than those with an A-P/T ratio < 1. Patients with solitary thyroid nodules had 4.5 times higher odds of malignancy than those with MNG. Nodules with an absent halo had 4 times higher odds of malignancy.

Microcalcifications in thyroid nodules increased the odds of malignancy 9-fold compared with nodules without calcifications. For every unit increase in SR, the odds of malignancy increased by 20% (Table 4).
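Because logistic regression reports effects as odds ratios, a per-unit effect compounds multiplicatively rather than additively. The sketch below assumes that the reported 20% increase per unit of SR corresponds to an odds ratio of 1.2; the function names and the baseline probability are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def log_coefficient(odds_ratio: float) -> float:
    """Logistic regression coefficient (log-odds) corresponding to an odds ratio."""
    return math.log(odds_ratio)


def odds_multiplier(odds_ratio_per_unit: float, units: float) -> float:
    """Factor by which the odds change after an increase of `units` in the predictor."""
    return odds_ratio_per_unit ** units


def odds_from_probability(p: float) -> float:
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    return odds / (1.0 + odds)


# Assumed odds ratio of 1.2 per unit of SR; a 3-unit increase multiplies the odds by 1.2 ** 3.
factor = odds_multiplier(1.2, 3)

# Hypothetical baseline probability of malignancy, used only to show the conversion.
baseline_p = 0.05
new_p = probability_from_odds(odds_from_probability(baseline_p) * factor)
```

A 3-unit increase therefore raises the odds by about 73% (1.2 cubed is 1.728), not by 60%, and the change in probability depends on the baseline.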

Table 4 Multivariate analysis (logistic regression for prediction of malignancy in thyroid nodule)

ROC curve analysis was used to determine the cut-off point of the SR to differentiate between benign and malignant thyroid nodules (Figs. 4 and 5); it showed a sensitivity of 87%, specificity of 80%, 21% positive predictive value (PPV), and 99% negative predictive value (NPV). An AUC of 0.92 and a cut-off value of > 1.8 with p value < 0.001 were determined (Table 5).
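The low PPV alongside a high NPV follows from the low prevalence of malignancy in this cohort (72 of 1269 nodules). The sketch below derives the predictive values from the reported sensitivity, specificity, and prevalence using Bayes' rule; the function name `predictive_values` is a hypothetical helper, and the calculation is a consistency check rather than the authors' own computation.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Reported values for the SR cut-off, with prevalence taken from the final diagnosis.
ppv, npv = predictive_values(sensitivity=0.87, specificity=0.80, prevalence=72 / 1269)
```

With a prevalence of about 5.7%, even a specificity of 80% produces far more false positives than true positives, which is why the PPV is close to 21% while the NPV is close to 99%.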

Fig. 4 Malignant thyroid nodule with high strain ratio and microcalcifications

Fig. 5 Receiver operating characteristic (ROC) curve analysis to determine the cut-off point of the strain ratio to discriminate between benign and malignant thyroid nodules

Table 5 ROC curve to determine the cut-off point of the strain ratio that discriminates between malignant and benign thyroid nodules

The accuracy of ultrasonography in detecting malignant thyroid nodules showed a sensitivity of 89%, specificity of 98%, 70% PPV, and 99.3% NPV, with an overall accuracy of 97.2% (Table 6).
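These figures can be checked against a 2 × 2 table. The text reports 91 US-malignant and 1178 US-benign calls against 72 malignant and 1197 benign final diagnoses; the cell counts used below (64 true positives, 27 false positives, 8 false negatives, 1170 true negatives) are not stated in the text and are inferred here as the split consistent with those totals and the reported sensitivity. The function name is a hypothetical helper.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard diagnostic accuracy measures from a 2 x 2 table."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }


# Inferred cell counts (assumption): 64 + 27 = 91 US-malignant, 8 + 1170 = 1178 US-benign.
us_metrics = diagnostic_metrics(tp=64, fp=27, fn=8, tn=1170)
```

Under this inferred table, the metrics round to the values reported above: sensitivity 89%, specificity 98%, PPV 70%, NPV 99.3%, and accuracy 97.2%.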

Table 6 Comparing the different diagnostic sonographic criteria, real-time elastography, and TI-RADS score in prediction of malignant thyroid nodules

Comparison of the different diagnostic sonographic criteria, real-time elastography, and TI-RADS score in the prediction of malignant thyroid nodules revealed that combined elastography and TI-RADS scores had a sensitivity of 77.8% and specificity of 91.1%, with an overall accuracy of 90.4%, if both were high, and a sensitivity of 98.6% and specificity of 32.7% if either of them was high (Table 6).
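The two combination rules trade sensitivity against specificity in opposite directions, which the following sketch makes explicit. The elastography threshold of 3 follows the Methods (scores 3 and 4 are suspicious); the TI-RADS threshold of 4 is an assumption, since the text does not define what counts as a "high" TI-RADS score. The function name and parameters are hypothetical.

```python
def combined_call(elasto_score: int, tirads: int, rule: str,
                  elasto_cutoff: int = 3, tirads_cutoff: int = 4) -> bool:
    """Return True when the combined test is positive under the chosen rule.

    rule="both":   positive only if both scores are high (fewer false positives).
    rule="either": positive if at least one score is high (fewer false negatives).
    """
    elasto_high = elasto_score >= elasto_cutoff
    tirads_high = tirads >= tirads_cutoff  # assumed threshold, not stated in the text
    if rule == "both":
        return elasto_high and tirads_high
    if rule == "either":
        return elasto_high or tirads_high
    raise ValueError("rule must be 'both' or 'either'")
```

A nodule with elastography score 2 and TI-RADS 5 is negative under the "both" rule but positive under the "either" rule, which is why the "either" rule reaches higher sensitivity at the cost of specificity.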

Four of the 72 (5.56%) malignant thyroid nodules were not recommended for FNA on the basis of TI-RADS and size alone: 2 nodules were TI-RADS 4 and smaller than 15 mm in longitudinal diameter, and another 2 were TI-RADS 5 and smaller than 10 mm. However, they had an elasticity score of 3, so FNA was performed, and papillary carcinoma was confirmed after surgical excision. This emphasizes the importance of adding the elasticity score to the TI-RADS score when evaluating thyroid nodules, which could save 5.56% of our malignant cases from a missed diagnosis. Only one case (1.39%) of papillary carcinoma was TI-RADS 3 and had an elasticity score of 2.
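The decision logic described in this paragraph can be sketched as a rule that first applies the TI-RADS size thresholds and then lets a suspicious elasticity score override a negative recommendation. Only the two thresholds mentioned in the text are encoded (15 mm for TI-RADS 4 and 10 mm for TI-RADS 5); whether a nodule exactly at the threshold qualifies is an assumption, and the function and constant names are hypothetical.

```python
from typing import Optional

# Size thresholds (mm) taken from the text; other categories are not covered by this sketch.
FNA_SIZE_THRESHOLD_MM = {4: 15.0, 5: 10.0}


def fna_recommended(tirads: int, longitudinal_diameter_mm: float,
                    elasticity_score: Optional[int] = None) -> bool:
    """Recommend FNA by TI-RADS category and size, with an elasticity-score override."""
    threshold = FNA_SIZE_THRESHOLD_MM.get(tirads)
    if threshold is None:
        raise ValueError("only TI-RADS 4 and 5 are covered by this sketch")
    if longitudinal_diameter_mm >= threshold:  # assumption: the threshold itself qualifies
        return True
    # Override: a suspicious elasticity score (3 or 4) prompts FNA despite the small size.
    return elasticity_score is not None and elasticity_score >= 3
```

For a 12 mm TI-RADS 4 nodule, the size rule alone returns False, whereas adding an elasticity score of 3 returns True, mirroring the four cases described above.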

Discussion

Finding a thyroid nodule still poses a diagnostic dilemma. Given the rapidly rising incidence of thyroid cancer worldwide, accurate examination of suspicious thyroid nodules is urgently needed to avoid overtreatment of benign nodules [15, 16].

In our study, we evaluated the diagnostic accuracy of US features, elastography score, SR, and the TI-RADS scoring system in the prediction of malignant thyroid nodules in 1269 patients. Several sonographic features have been consistently associated with thyroid malignancy, such as hypoechogenicity, taller-than-wide dimensions in the transverse plane, lobulation or spiculation, and microcalcifications [1, 17].

In the present study, the diagnosis based on ultrasonographic criteria was 91 malignant nodules and 1178 benign nodules, compared with a final diagnosis of 1197 benign nodules and 72 malignant nodules. Most of the malignant nodules (n = 72) were characterized by a hypoechoic texture (48 cases, 6.8%), irregular border (38 cases, 80.9%), A-P/T ratio < 1 (58 cases, 4.6%), microcalcifications (33/56), and absence of a surrounding halo (47 cases, 11.4%). Also, many malignant nodules had elastography score 4 (30 cases, 83.3%) and TI-RADS category 5 (62 cases, 17.6%). We found that an A-P/T ratio ≥ 1 had a sensitivity of 19.4%, specificity of 99.6%, PPV of 73.7%, NPV of 95.4%, and overall accuracy of 95%. Irregular border, calcifications, and absence of halo had sensitivities of 52.8%, 77.8%, and 65.3%; specificities of 99.2%, 72.8%, and 69.4%; PPVs of 80.8%, 14.7%, and 11.4%; NPVs of 97.2%, 98.2%, and 97.1%; and overall accuracies of 96.6%, 73%, and 69.2%, respectively (Table 6).

Remonti et al. reported that the specificity of microcalcifications, irregular margins, non-oval shape, and marked hypoechogenicity in discriminating benign from malignant nodules was 87.8, 83.1, 96.6, and 62.3%, respectively [18]. This is consistent with other studies of the ultrasonographic characteristics of malignancy, such as Kim et al., who concluded that high specificity was associated with microcalcifications, irregular margins, non-oval shape, and marked hypoechogenicity (85.8, 83, 92.5, and 94.3%, respectively) [19]. It is noteworthy that these two studies also reported low sensitivities (39.5, 50.5, 26.7, and 62.7% in the former, and 59.2, 55.1, 32.7, and 26.5% in the latter).

By multivariate regression analysis, we found that nodules with an A-P/T ratio > 1 had 21 times higher odds of being malignant than those with an A-P/T ratio < 1. Patients with solitary thyroid nodules had 4.5 times higher odds of malignancy than those with MNG. Nodules with an absent halo had 4 times higher odds of malignancy. Furthermore, the presence of microcalcifications in thyroid nodules increased the odds of malignancy 9-fold compared with nodules without calcifications.

Fine needle aspiration (FNA) is frequently required in clinical practice to assess whether a nodule is malignant, benign, or requires surgery for a definitive diagnosis. There is a variety of ultrasound grading systems that have been developed to stratify the risk of malignancy and identify nodules requiring FNA [1].

Regarding the SR, we found that it could be an excellent discriminator between benign and malignant nodules (P < 0.001). The SR had a sensitivity of 87%, specificity of 80%, PPV of 21%, and NPV of 99% for discriminating between benign and malignant thyroid nodules at a cut-off value of > 1.8. Furthermore, we found that the odds of malignancy increased by 20% for every unit increase in SR.

A study conducted by Cantisani et al. showed that an SR greater than 2.31 could predict malignancy with a sensitivity of 86% and a specificity of 82% [20]. Meanwhile, in another study, Ding et al. reported a cut-off point of 2.73 with a sensitivity of 89.3% and a specificity of 73.2% [21].

Another study published by Cantisani et al. on 97 patients showed that a strain ratio greater than 2 had a sensitivity of 97.3%, specificity of 91.7%, positive predictive value of 87.8%, and negative predictive value of 98.2% [22]. On the other hand, a study by Vorlander et al. conducted on a larger number of patients (309) showed an NPV of 100% for a ratio of 3.2 and a PPV of 42% for a ratio of 6.7 [23].

Also, many published articles reported that SR has greater diagnostic accuracy in the assessment of indeterminate thyroid nodules as well [24].

In our current study, of the 72 malignant thyroid nodules, 30 had elastography score 4 (83.3% of all score-4 nodules) and 62 were TI-RADS category 5 (86.11% of the malignant nodules). Also, 16 malignant nodules (1.6% of all score-2 nodules) had elastography score 2. This raised our suspicion that nodules with elastography score 2 should be considered borderline: they may still carry a possibility of malignancy, should be assessed in combination with other sonographic criteria and the TI-RADS classification, and may justify FNA rather than follow-up alone.

Our results agree with many other studies such as Sachdev et al., which reported that strain elastography had high sensitivity and specificity for differentiating malignant from benign thyroid nodules, and most benign thyroid nodules scored 1 and 2, while the majority of malignant thyroid nodules had elasticity scores of 3 and 4 [25].

Chandramohan et al. evaluated the accuracy of the TI-RADS classification in the assessment of thyroid nodules and reported that the PPV for malignancy was high for TI-RADS category 5 and 4 nodules, which supports the TI-RADS classification as a simple and practical method of assessing thyroid nodules in clinical practice [26].

As reported in the literature, the overall malignancy rate of thyroid nodules detected by ultrasound ranges between 8% and 22.7% [27].

In our study, the ultrasonography accuracy in detecting malignant thyroid nodules revealed a sensitivity of 89%, specificity of 98%, 70% PPV, and 99.3% NPV, with an overall accuracy of 97.2%. At the same time, using combined elastography and TI-RADS scores, if both are high, achieved a sensitivity of 77.8%, specificity of 91.1%, 34.6% PPV, and 98.6% NPV with an overall accuracy of 90.4%.

Four of the 72 (5.56%) malignant thyroid nodules were not recommended for FNA on the basis of TI-RADS and size alone: 2 nodules were TI-RADS 4 and smaller than 15 mm in longitudinal diameter, and another 2 were TI-RADS 5 and smaller than 10 mm. However, they had an elasticity score of 3, so FNA was performed, and papillary carcinoma was confirmed after surgical excision. This emphasizes the importance of adding the elasticity score to the TI-RADS score when evaluating thyroid nodules, which could save 5.56% of our malignant cases from a missed diagnosis. Only one case (1.39%) of papillary carcinoma was TI-RADS 3 and had an elasticity score of 2.

The main strengths of our study are its prospective design and the large number of patients recruited from one referral center and examined by a single expert operator who was blinded to all other data, which limited bias and interobserver variability. However, there were some limitations: different pathologists were involved, and strain elastograms are very subjective, which can lead to misinterpretations and pitfalls; this was partially overcome by the semi-quantitative SR.

Conclusion

We concluded that thyroid US features, including the TI-RADS score, could have a high diagnostic value without cytology when performed by an expert operator, which is valuable for avoiding invasive FNAC and reducing the number of unnecessary diagnostic thyroidectomies. Evaluating additional risk stratification tools, such as the elastography score and SR, adds further significant value in predicting malignant thyroid nodules.

Recommendations

Elastography is a simple, non-invasive, and relatively inexpensive technique that should be integrated into the ultrasound classification of nodules in addition to TI-RADS scoring. Elasticity scores deserve special attention: score 1 is 100% benign; scores 3 and 4 are strong indicators of malignancy; and score 2 is borderline, with a strong recommendation for FNA.

Ongoing educational efforts and advocacy are needed to apply consistent and uniform guidelines among radiologists and referring clinicians to reduce unnecessary FNAs and reduce costs to the healthcare system.

Multiple multicenter studies and periodic evaluation by international experts' consensus panels are necessary to establish the role of US features, elastography score, SR, and the TI-RADS scoring system in predicting malignant thyroid nodules.

The application of artificial intelligence (AI) is expected to advance multiple facets of thyroid imaging and could reduce FNA of benign nodules.