Introduction

Since Fujimoto et al. [1] first reported, in 1967, the relationship between abnormal thyroid tissue and increased echogenicity on low-resolution B-mode ultrasound, the detection rate of thyroid nodules by ultrasonography has increased in an epidemic-like trend. Although physical examination detects thyroid nodules in only approximately 5 to 7% of the adult population, ultrasonography detects them in more than 60% of adults [2]. More than 90% of thyroid nodules are benign lesions of no clinical significance, but approximately 4 to 6.5% may be clinically significant because they may represent thyroid cancer [2, 3]. Following the first description of fine-needle aspiration (FNA) techniques by Martin and Ellis in 1930 [4] and the recommendation of ultrasound guidance by Walfish et al. in 1977 [5], ultrasonography-guided FNA has become the standard tool for the appropriate clinical management of thyroid nodules. Thyroid FNA combined with advanced ultrasonographic imaging is a safe, inexpensive, and minimally invasive method for identifying thyroid nodules that require surgery [6]. In this context, The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) has been the worldwide accepted reporting system for thyroid FNAs since its first edition, which followed the National Cancer Institute “Thyroid FNA State of the Science Conference” held in Bethesda, MD, in October 2007 [7]. TBSRTC was developed to standardize the diagnostic terminology of thyroid cytopathology. It consists of six categories, each associated with a particular risk of malignancy (ROM), and by giving the pathologist and the surgeon or endocrinologist a common language, it helps ensure that patients receive appropriate clinical management [6, 8]. However, significant interobserver variability exists in the indeterminate categories because of the subjective nature of morphological interpretation.
Of these categories, Bethesda category III, atypia of undetermined significance (AUS), is undoubtedly one of the most problematic. The upper limit recommended for AUS in the 2nd edition of TBSRTC, no more than 10% of thyroid FNAs [9], remains unchanged in the recently published 3rd edition, which also suggests an AUS:malignant ratio ≤ 3.0 as a laboratory quality-control measure [6]. In the 3rd edition, the AUS category has been simplified by removing the term “follicular lesion of undetermined significance (FLUS),” and the possible scenarios presented in the 2nd edition have been organized into two subcategories, “AUS-nuclear” and “AUS-other” [6, 9]. However, AUS rates ranging from 1 to 22% among laboratories have been reported in the literature [10], and studies suggest that AUS is the category with the lowest agreement between cytopathologists [11,12,13,14]. As many believe, the heterogeneity of this category, the subjectivity of interpretation, and the lack of a set of objective criteria lead to overuse of the category and reduced agreement between cytopathologists [10]. In addition, the ROM is difficult to determine because only a small proportion of AUS cases have surgical follow-up. The TBSRTC 3rd edition reports a mean expected ROM of 22% (13–30%) for AUS [6], whereas the reported risk of histologically proven malignancy in the literature varies widely, between 17 and 83% [15, 16].

Many authors have emphasized the need to subcategorize AUS because of its wide range of malignancy risk and heterogeneous nature, and they have reported higher malignancy rates in AUS with nuclear or cytological atypia than in AUS with architectural atypia [8, 16,17,18,19,20,21,22,23,24,25,26,27]. Finally, the recent 3rd edition of TBSRTC recommends a two-tiered subcategorization of AUS into AUS-nuclear and AUS-other [6]. The AUS-nuclear subcategory is defined by focal nuclear atypia, diffuse but mild nuclear atypia, nuclear and architectural atypia, atypical cyst-lining cells, and histiocytoid cells, whereas the AUS-other subcategory consists of architectural atypia (defined by three different scenarios), oncocytic atypia (two scenarios), atypia not otherwise specified (three scenarios), and atypical lymphoid cells [6]. This raises several new questions, which are the subjects of the present study: Why does nuclear atypia not carry the same ROM in every case? Are there nuclear features that predict a higher ROM? Can scoring nuclear features provide a more objective approach to predicting malignancy? Does the ROM differ according to the criteria that define the subcategories?

This study therefore aims to investigate the impact of the AUS subcategories, of the criteria defining these subcategories, and of nuclear and architectural features on the risk of malignancy in AUS. The authors also aim to investigate the reproducibility and usefulness of a nuclear scoring system for a more objective evaluation of nuclear features in thyroid FNA cytology.

Material and Methods

Study Design

The records of 6940 thyroid FNAs evaluated in the Department of Pathology between January 2017 and July 2021 were reviewed. The study design was approved by the local Ethics Committee of the University Hospital (Protocol Code: TÜTF-GOBAEK 2023/76). A total of 1224 (17.6%) cases diagnosed as AUS/FLUS according to the TBSRTC 2nd edition [9] were reevaluated. To rule out reactive changes caused by previous aspirations, only cases diagnosed as AUS/FLUS on the first aspiration of the nodule were included. To ensure histopathological correlation, 323 cases with resection specimens (hemithyroidectomy/total thyroidectomy) were evaluated. Correspondence between the aspirated target nodule and the nodule observed in the resection specimen was established using ultrasonographic findings, macroscopic examination of the resection specimen, and histomorphological changes due to previous FNA. Sixty-eight cases in which this correlation could not be established (e.g., nodules of similar size located close to each other in cases with multiple nodules, nodules whose ultrasonography report could not be obtained, and nodules of similar size whose exact location was not specified in the macroscopic description of the resection) were excluded. After diagnostic revision, 15 of the cases that met the criteria were reassigned to other TBSRTC categories (nondiagnostic, 1; benign, 13; follicular neoplasm, 1). As a result, 260 FNAs of 260 nodules and 240 resection specimens of 240 patients were included in the study (Fig. 1).

Fig. 1

Study design of the present study

Histopathological and Cytological Evaluation

Cytological evaluation was performed on May-Grünwald Giemsa (MGG)- and Papanicolaou (PAP)-stained conventional smears and on PAP-stained liquid-based cytology (LBC) preparations. The evaluation was blinded to all information except age, sex, nodule size, and location. The cellularity and colloid content of the aspirates were evaluated. The presence of cytological atypia, the nuclear score (NS), and the extent of the NS were determined. Architectural patterns and their extent were recorded. The presence of atypical cyst-lining cells, histiocytoid cells, oncocytic cells, psammoma bodies, histiocytes, lymphocytes, and atypical lymphoid cells was also evaluated.

Hematoxylin and eosin (H&E)-stained slides of formalin-fixed, paraffin-embedded tissue were used for histopathological evaluation, and the histopathological diagnoses were revised according to the 5th edition (Beta version) of the World Health Organization (WHO) Classification of thyroid tumors [28]. The cases were divided into four groups according to their histology: non-neoplastic disease (NND), benign neoplasm (BN), low-risk neoplasm (LRN), and malignant neoplasm (MN) (Table 1, Fig. 1).

Table 1 Clinicopathological features of the study group

NIFTP non-invasive follicular thyroid neoplasm with papillary-like nuclear features, WDT-UMP well-differentiated tumor of uncertain malignant potential, FT-UMP follicular tumor of uncertain malignant potential.

Evaluation of Nuclear Features

To determine the degree of nuclear atypia, nuclear features were reviewed and scored with a nuclear scoring system inspired by Maletta et al. [29] and based on the guidelines proposed by Nikiforov et al. [30] for the histopathological diagnosis of NIFTP. The evaluated nuclear features were grouped under three headings: (1) size and shape (nuclear enlargement, overlapping and crowding, molding, elongation), (2) nuclear membrane irregularities (irregular contours, grooves, pseudoinclusions), and (3) chromatin features (chromatin clearing, chromatin margination). Each heading was scored as 0 or 1 according to the absence or presence of its features, and the sum gave an NS ranging from 0 to 3 (Fig. 2). Nuclear atypia was considered absent when the NS was 0 or 1 and present when the NS was 2 or 3; NS 2 was regarded as mild and NS 3 as marked nuclear atypia. A nuclear feature was considered focal when it was present in roughly less than 50% of the sample or formed a focus distinct from the rest of the sample, and diffuse when it was seen in roughly more than 50% of the sample.
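The scoring rule described above can be expressed as a few small functions. This is a minimal sketch, assuming each FNA is recorded as a set of observed feature labels; the label strings and the helper names `nuclear_score`, `atypia_grade`, and `extent` are hypothetical illustrations, not part of the authors' protocol. The text leaves a feature covering exactly 50% of the sample unspecified, and the sketch treats that case as diffuse by assumption.

```python
from __future__ import annotations

# Hypothetical feature labels, grouped under the three NS headings.
SIZE_SHAPE = {"enlargement", "overlapping_crowding", "molding", "elongation"}
MEMBRANE = {"irregular_contours", "grooves", "pseudoinclusions"}
CHROMATIN = {"chromatin_clearing", "chromatin_margination"}


def nuclear_score(features: set[str]) -> int:
    """Return the NS (0-3): one point per heading with any of its features present."""
    return sum(1 for heading in (SIZE_SHAPE, MEMBRANE, CHROMATIN) if features & heading)


def atypia_grade(ns: int) -> str:
    """NS 0-1: nuclear atypia absent; NS 2: mild; NS 3: marked."""
    if ns not in (0, 1, 2, 3):
        raise ValueError("NS must be between 0 and 3")
    if ns <= 1:
        return "absent"
    return "mild" if ns == 2 else "marked"


def extent(fraction_of_sample: float, distinct_focus: bool = False) -> str:
    """Focal if roughly < 50% of the sample or a distinct focus, otherwise diffuse.

    Assumption: exactly 50% is not specified in the text and is treated as diffuse.
    """
    if distinct_focus or fraction_of_sample < 0.5:
        return "focal"
    return "diffuse"


if __name__ == "__main__":
    ns = nuclear_score({"enlargement", "grooves", "chromatin_clearing"})
    print(ns, atypia_grade(ns), extent(0.3))  # prints: 3 marked focal
```

Scoring per heading rather than per feature keeps the total bounded at 3, so several size and shape changes alone (for example, enlargement plus molding) still yield NS 1 and do not qualify as nuclear atypia.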

Fig. 2

Nuclear scoring. a Nuclear score 1 (liquid-based preparation, Papanicolaou × 200). b, c, e, f Nuclear score 3 (liquid-based preparation, Papanicolaou × 400). d Nuclear score 2 (liquid-based preparation, Papanicolaou × 400)

Evaluation of Architectural Features

Architectural features were evaluated under the following headings: microfollicular pattern, streaming pattern, trabecular pattern, and three-dimensional groups. The presence of one or more of these patterns, at least focally, was considered “architectural atypia.” The term “streaming pattern” was used for a mixed population of cells, arranged as small groups and individual cells in some areas and forming microfollicles and trabeculae in others, all in continuity with one another, particularly in conventional smears. This descriptor emerged from our observations in daily practice in evaluating thyroid FNAs. It was named the “streaming pattern” because it creates a stream-like appearance in a certain area of the preparation (usually along the midline of the long axis of the slide) (Fig. 3) (prepared in https://www.sketchbook.com/). The extent of an architectural pattern was considered focal when it involved a single slide or an area of a slide that appeared different from the rest, and diffuse otherwise (roughly when seen in more than 50% of the aspirate).

Fig. 3

Schematic drawing of the streaming pattern

Determination of Subcategories/Subgroups

The present study was designed when the 2nd edition of TBSRTC [9] was current, was carried out during the transition between the two editions, and was finalized after the 3rd edition [6] was published. The initial subcategories, created on the basis of the 2nd edition, were therefore revised to take into account the subcategories and criteria of the 3rd edition [6, 9], so the subcategorization used in this study covers the criteria of both editions (Table 2). By combining the descriptions of AUS/FLUS in the 2nd edition [9] with the AUS subcategories and related criteria of the 3rd edition [6], 6 AUS subcategories and 11 subgroups addressing the possible scenarios within these subcategories were created: AUS-nuclear atypia (AUS-N), AUS-architectural atypia (AUS-A), AUS-nuclear and architectural atypia (AUS-N&A), AUS-oncocytic atypia (AUS-O), AUS-not otherwise specified (AUS-NOS), and AUS-lymphoid cells, rule out lymphoma (AUS-L). The subgroups of AUS-N were focal nuclear atypia (AUS-N1) (Fig. 4), diffuse but mild nuclear atypia (AUS-N2) (Fig. 5), atypical cyst-lining cells (AUS-N3) (Fig. 6a, b), and histiocytoid cells (AUS-N4) (Fig. 6c, d). Coexistence of architectural and nuclear atypia was classified as AUS-N&A (Fig. 7a, b). The AUS-A subcategory was divided into two subgroups: preparations with diffuse architectural atypia and low cellularity (AUS-A1) (Fig. 7c) and preparations with focal architectural atypia and moderate to marked cellularity (AUS-A2) (Fig. 7d). The AUS-O subcategory was divided into two subgroups: AUS-O1 (preparations with diffuse oncocytic cells and low cellularity) (Fig. 8a) and AUS-O2 (preparations with diffuse oncocytic cells and intermediate to high cellularity) (Fig. 8b).
The AUS-NOS subcategory was divided into three subgroups: AUS-NOS1 (preparations with isolated nuclear enlargement and prominent nucleoli) (Fig. 8c), AUS-NOS2 (preparations with isolated psammomatous calcification), and AUS-NOS3 (preparations with changes possibly due to preparation artifacts, and other changes) (Table 2). The AUS-L subcategory included atypical lymphoid cells suspicious for lymphoma (Fig. 8d).
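Part of the subcategory logic above, the nuclear and architectural axis, can be sketched as a decision function. This is a hypothetical illustration under stated assumptions: the function names, argument names, and category labels are invented for the example; it covers only AUS-N1, AUS-N2, AUS-N&A, AUS-A1, and AUS-A2; and it returns `None` for every scenario the text does not define in these terms (including AUS-N3/N4, AUS-O, AUS-NOS, and AUS-L, which depend on cell types not modelled here).

```python
from __future__ import annotations
from typing import Optional

# The four architectural headings; any one present at least focally = architectural atypia.
ARCH_PATTERNS = {"microfollicular", "streaming", "trabecular", "three_dimensional"}


def has_architectural_atypia(patterns: set[str]) -> bool:
    return bool(patterns & ARCH_PATTERNS)


def assign_subgroup(ns: int, ns_extent: Optional[str], patterns: set[str],
                    arch_extent: Optional[str], cellularity: str) -> Optional[str]:
    """Hypothetical mapping onto AUS-N1, AUS-N2, AUS-N&A, AUS-A1, or AUS-A2.

    Returns None for scenarios this sketch does not model.
    """
    nuclear = ns >= 2                      # NS 2-3 means nuclear atypia is present
    arch = has_architectural_atypia(patterns)
    if nuclear and arch:
        return "AUS-N&A"                   # coexisting nuclear and architectural atypia
    if nuclear:
        if ns_extent == "focal":
            return "AUS-N1"                # focal nuclear atypia
        if ns_extent == "diffuse" and ns == 2:
            return "AUS-N2"                # diffuse but mild nuclear atypia
        return None                        # not defined in these terms in the text
    if arch:
        if arch_extent == "diffuse" and cellularity == "low":
            return "AUS-A1"                # diffuse architectural atypia, low cellularity
        if arch_extent == "focal" and cellularity in {"moderate", "marked"}:
            return "AUS-A2"                # focal architectural atypia, moderate-marked cellularity
    return None


if __name__ == "__main__":
    print(assign_subgroup(3, "focal", set(), None, "moderate"))  # prints: AUS-N1
```

Checking the combined nuclear and architectural case first mirrors the text, where coexistence defines its own subcategory rather than being folded into AUS-N or AUS-A.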

Table 2 Subcategories and subgroups of atypia of undetermined significance
Fig. 4

AUS-N1: focal marked nuclear atypia in a background of benign follicular cells (conventional smear, Papanicolaou, × 200; inset, × 400)

Fig. 5

AUS-N2: diffuse nuclear atypia in a paucicellular aspirate (liquid-based preparation, Papanicolaou, × 200; inset, × 400)

Fig. 6

a, b AUS-N3: atypical cyst-lining cells (conventional smear, MGG, × 400). c, d AUS-N4: histiocytoid cells (conventional smear, MGG, × 400)

Fig. 7

a, b AUS-N&A: nuclear and architectural atypia (a conventional smear, MGG, × 400; b conventional smear, Papanicolaou, × 400). c, d AUS-A: architectural atypia (c conventional smear, MGG, × 400; d conventional smear, Papanicolaou, × 400)

Fig. 8

a AUS-O1: extensive oncocytic cells with low cellularity (conventional smear, Papanicolaou, × 200; inset, conventional smear, MGG, × 200). b AUS-O2: extensive oncocytic cells with intermediate cellularity (conventional smear, Papanicolaou, × 200). c AUS-NOS1: isolated nuclear enlargement and prominent nucleoli (conventional smear, Papanicolaou, × 400). d AUS-L: atypical lymphoid cells (conventional smear, MGG, × 40; inset, conventional smear, MGG, × 400)

Determination of ROM

The ROM was calculated as a percentage by dividing the number of histopathologically confirmed malignant cases by the number of AUS cases with surgical follow-up. The ROM was calculated separately for each subcategory, subgroup, NS, nuclear feature, and architectural feature.
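The ROM definition above amounts to a per-group ratio of malignant to resected cases. The following minimal sketch assumes each resected AUS nodule is recorded as a (label, histology group) pair using the four histological groups of this study; only MN counts as malignant, so NIFTP, which the study classes as an LRN, does not raise the ROM. The function name and the toy records are hypothetical and are not study data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Histological groups used in the study: NND, BN, LRN (including NIFTP), MN.
# Only MN is counted as malignant in the numerator.
MALIGNANT_GROUPS = {"MN"}


def rom_by_label(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """records: (label, histology group) for each resected AUS nodule.

    Returns ROM (%) per label = malignant / resected * 100.
    """
    resected: Dict[str, int] = defaultdict(int)
    malignant: Dict[str, int] = defaultdict(int)
    for label, histology in records:
        resected[label] += 1
        if histology in MALIGNANT_GROUPS:
            malignant[label] += 1
    return {label: 100.0 * malignant[label] / n for label, n in resected.items()}


if __name__ == "__main__":
    toy = [("AUS-N", "MN"), ("AUS-N", "BN"), ("AUS-N", "LRN"), ("AUS-N", "MN"),
           ("AUS-A", "BN"), ("AUS-A", "NND")]
    print(rom_by_label(toy))  # prints: {'AUS-N': 50.0, 'AUS-A': 0.0}
```

The same function works for any grouping (subcategory, subgroup, NS, or a single nuclear or architectural feature) as long as the label is chosen accordingly.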

Statistical Analysis

Results are presented as mean ± standard deviation or as numbers and percentages. Categorical variables were compared using the chi-square test (Pearson, with continuity correction, or Fisher's exact test). Multivariate logistic regression analysis was used to examine the effect of nuclear features on histological malignancy, and odds ratios with 95% confidence intervals were calculated. ROC analysis was used to examine the discriminative power of nuclear features for malignancy, and the AUC, cutoff, sensitivity, and specificity were calculated. A p value < 0.05 was considered statistically significant. Data were analyzed with the SPSS 20.0 statistical package (IBM SPSS Statistics for Windows, version 20.0. Armonk, NY, IBM Corp.).
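The authors ran their analyses in SPSS. As a hedged, self-contained illustration of one of the listed options, the sketch below implements a two-sided Fisher's exact test for a 2 × 2 table (feature present/absent × malignant/not malignant) using only the Python standard library. It follows the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed one; in practice, `scipy.stats.fisher_exact` provides the same test. The function name and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: feature present / absent; columns: malignant / not malignant.
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell == x) with all row and column totals fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    print(fisher_exact_two_sided(3, 1, 1, 3))  # approximately 0.4857 (= 34/70)
```

The exact test is the usual choice when expected cell counts are small, which is common for the smaller AUS subgroups; the chi-square variants listed in the text are preferred for larger tables.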

Results

The clinicopathological features of the study group are presented in Table 1. The mean age of the study population was 50.1 ± 11.9 years, and the male-to-female ratio was 1.00:3.56. Nodule sizes ranged from 0.5 to 7.5 cm (mean, 1.5 cm). The rate of AUS was 17.6%, and the overall ROM for AUS was 30.7%. According to the histopathological diagnosis after resection, 13 (5%) cases were diagnosed as NND, 138 (53.1%) as BN, 29 (11.2%) as LRN, and 80 (30.7%) as MN (Fig. 1).

Comparisons of Nuclear Features and NS with Histopathological Diagnosis and ROM

Comparisons of Nuclear Features with Histopathological Diagnosis and ROM

The most frequently observed nuclear feature was nuclear enlargement, seen in 239 (91.9%) cases. The other nuclear features, in descending order of frequency, were nuclear crowding, nuclear membrane irregularities, nuclear elongation, chromatin clearing, molding, chromatin margination, and nuclear grooves. No intranuclear pseudoinclusions were observed (Fig. 9). Comparisons of nuclear features with histopathological diagnosis are presented in Table 3. Nuclear overlapping, chromatin clearing, and chromatin margination were significantly more common in LRNs and MNs (p < 0.001), whereas nuclear molding, nuclear contour irregularity, and nuclear grooves were significantly more frequent in MNs (p < 0.001). The ROM was significantly higher in FNAs with nuclear overlapping (35.5%, p < 0.001), nuclear molding (56.9%, p < 0.001), nuclear contour irregularity (42.1%, p < 0.001), nuclear grooves (74.1%, p < 0.001), chromatin clearing (49.4%, p < 0.001), and chromatin margination (57.7%, p = 0.004) than in FNAs without these features. The ROM was also higher in FNAs with nuclear enlargement, a difference that approached significance (32.6%, p = 0.051) (Table 3).

Fig. 9

Frequency of nuclear and architectural features in the study group

Table 3 Relationship of nuclear features with histopathological diagnosis and risk of malignancy

Impact of Nuclear Features on ROM

The effect of nuclear features on histopathological malignancy was evaluated by logistic regression analysis. Multivariate analysis showed that nuclear grooves [6.3 (2.5–16.0), p < 0.001], nuclear overlapping/crowding [6.2 (1.4–27.1), p = 0.016], nuclear molding [3.9 (2.1–7.1), p < 0.001], irregular nuclear contours [2.8 (1.5–5.1), p = 0.001], chromatin clearing [2.6 (1.4–4.8), p = 0.002], and chromatin margination [2.5 (1.1–6.1), p = 0.035] were significant independent predictors of histological malignancy (Table 4; values are odds ratios with 95% confidence intervals). ROC analysis further showed that the presence of > 2 nuclear features may indicate a likelihood of histological malignancy [AUC = 0.760 (standard error = 0.0299); p < 0.0001] (Table 5, Fig. 10).

Table 4 Impact of nuclear features on histological malignancy by multivariate logistic regression analysis
Table 5 Discriminative power of nuclear features on malignancy
Fig. 10

ROC analysis for likelihood of malignancy according to the number of nuclear features
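The ROC result above rests on two quantities: the AUC for the number of nuclear features present, and sensitivity and specificity at the "> 2 features" cutoff. The minimal sketch below computes both with the standard library, using the Mann–Whitney form of the empirical AUC (the probability that a randomly chosen malignant case has more features than a randomly chosen non-malignant case, with ties counted as one half). The function names and feature counts are hypothetical; the study's own values were computed in SPSS.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(pos: Sequence[int], neg: Sequence[int]) -> float:
    """Empirical ROC AUC = P(pos > neg) + 0.5 * P(pos == neg) over all pairs."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(pos: Sequence[int], neg: Sequence[int], cutoff: int) -> Tuple[float, float]:
    """Call a case test-positive when its feature count is > cutoff."""
    sensitivity = sum(1 for s in pos if s > cutoff) / len(pos)
    specificity = sum(1 for s in neg if s <= cutoff) / len(neg)
    return sensitivity, specificity


if __name__ == "__main__":
    malignant = [3, 4, 2, 5]   # hypothetical feature counts, malignant on histology
    other = [1, 2, 3, 0]       # hypothetical feature counts, not malignant
    print(auc_mann_whitney(malignant, other))      # prints: 0.875
    print(sens_spec(malignant, other, cutoff=2))   # prints: (0.75, 0.75)
```

Because the feature count takes only a few integer values, the ROC curve has few distinct operating points, and the tie correction matters: many malignant and non-malignant cases share the same count.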

Comparisons of NS with Histopathological Diagnosis and ROM

Comparisons of NS by histopathological diagnosis showed that NS1 (p = 0.001), whether extensive (p = 0.031) or focal (p = 0.005), was significantly more common in BNs than in the other diagnostic categories. NS1 was also more common in NND (p = 0.001). Focal NS2 was significantly more frequent in BNs (p = 0.034). NS3, whether focal or extensive, was detected at significantly higher rates in MNs (p < 0.001). The ROM was significantly higher in FNAs with NS3 (64.2%, p < 0.001), with focal NS3 (61.1%, p < 0.001), and with extensive NS3 (70.6%, p = 0.001). In contrast, a significantly lower ROM was detected in FNAs with NS0 (0.0%, p = 0.020), NS1 (16.7%, p = 0.001), focal NS1 (12.2%, p = 0.009), and focal NS2 (17.3%, p = 0.029) (Table 6).

Table 6 Relationship of nuclear score with histopathological diagnosis and risk of malignancy

Comparisons of Architectural Features with Histopathological Diagnosis and ROM

The most common architectural feature was the microfollicular pattern, detected in 154 (59.2%) cases. A trabecular pattern was seen in 145 (55.8%) cases, three-dimensional groups in 140 (53.8%), and a streaming pattern in 44 (16.9%). Microfollicular (p = 0.026) and trabecular (p = 0.048) patterns were significantly more common in BNs and LRNs than in the other diagnostic categories, whereas three-dimensional groups were significantly more frequent in LRNs and MNs (p = 0.018). The ROM was significantly lower in FNAs with a microfollicular pattern (26.0%, p = 0.043) (Table 7).

Table 7 Relationship of architectural features with histopathological diagnosis and risk of malignancy

Comparisons of AUS Subcategories and Subgroups with Histopathological Diagnosis and ROM

Comparisons of AUS subcategories and subgroups with histopathological diagnosis are presented in Table 8. The AUS-N subcategory (48.2%, p = 0.014) and the AUS-N1 subgroup (65.2%, p < 0.001) were significantly more common in MNs. FNAs in the AUS-A subcategory (71.9%, p = 0.002) and the AUS-A2 subgroup (77.8%, p = 0.015) were significantly more frequent in BNs. The AUS-N&A subcategory was significantly more common in LRNs (16.0%) and MNs (38.3%) (p = 0.006). FNAs in the AUS-O subcategory and the AUS-O2 subgroup were significantly associated with a histological diagnosis of NND (p < 0.001), and the AUS-NOS subcategory was significantly associated with histological diagnoses of NND and BN (p = 0.016). The ROM was significantly higher in FNAs classified as AUS-N (48.2%, p = 0.001), AUS-N1 (65.2%, p < 0.001), and AUS-N&A (38.3%, p = 0.048) than in FNAs without these features, and significantly lower in FNAs classified as AUS-A (17.2%, p = 0.011), AUS-O (5.6%, p = 0.033), and AUS-NOS (11.5%, p = 0.044). Within the AUS-N&A subcategory, the ROM was compared between NS2 and NS3 and was significantly higher in AUS-N&A with NS3 than in AUS-N&A with NS2 (p = 0.006) (Table 9).

Table 8 Relationship of AUS subcategories with histopathological diagnosis and risk of malignancy
Table 9 Relationship between risk of malignancy and nuclear scores in subgroup AUS-N&A

Pairwise Comparisons Between Subcategories of AUS According to the ROM and Identification of New Risk Groups

Pairwise comparisons of ROM between AUS subcategories are presented in Table 10. There was no statistically significant difference in ROM between the AUS-N and AUS-N&A subcategories (p = 0.308). The ROM of AUS-N was significantly higher than that of AUS-A (p = 0.001), AUS-NOS (p = 0.003), and AUS-O (p = 0.003). Similarly, AUS-N&A had a higher ROM than AUS-A (p = 0.008), AUS-NOS (p = 0.019), and AUS-O (p = 0.015).

Table 10 Pairwise comparison of malignancy risks in AUS subcategories

Because the ROM of the AUS-N and AUS-N&A subcategories was significantly higher than that of the other subcategories (AUS-A, AUS-O, and AUS-NOS), AUS-N and AUS-N&A were combined into one group and the other subcategories into a separate group, and the ROM of the two groups was compared. The ROM was significantly higher in the group comprising AUS-N and AUS-N&A than in the group comprising the other subcategories (42.0% vs 13.9%, p < 0.001). The former was therefore named the “high-risk group” and the latter the “low-risk group” (Table 11). Finally, ROM values were compared between the AUS and “suspicious for malignancy” (SFM) categories. The ROM for SFM was obtained from archive records without reevaluation of the aspirates and was calculated in 103 nodules of 103 patients with surgical follow-up. The ROM for SFM (95.1%) was significantly higher than that for AUS overall, the AUS-nuclear subcategory, the AUS-other subcategory, and the AUS-NS3 subgroup (Table 12).
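The high-risk and low-risk groups described above are pooled by summing malignant and resected counts across member subcategories, not by averaging their percentages. The minimal sketch below shows that pooling, assuming per-subcategory counts are available as (malignant, resected) pairs; the function name and counts are hypothetical, and AUS-L is left out because the text does not assign it to either group.

```python
from typing import Dict, Iterable, Tuple

# Grouping described in the text; AUS-L is not assigned to either group there.
HIGH_RISK = ("AUS-N", "AUS-N&A")
LOW_RISK = ("AUS-A", "AUS-O", "AUS-NOS")


def pooled_rom(counts: Dict[str, Tuple[int, int]], members: Iterable[str]) -> float:
    """counts maps subcategory -> (malignant, resected); returns pooled ROM in %.

    Counts are summed before dividing, so larger subcategories weigh more
    than in a simple average of per-subcategory percentages.
    """
    members = tuple(members)  # allow any iterable to be traversed twice
    malignant = sum(counts[s][0] for s in members if s in counts)
    resected = sum(counts[s][1] for s in members if s in counts)
    if resected == 0:
        raise ValueError("no resected cases in this group")
    return 100.0 * malignant / resected


if __name__ == "__main__":
    toy = {"AUS-N": (10, 20), "AUS-N&A": (5, 20), "AUS-A": (2, 20),
           "AUS-O": (0, 10), "AUS-NOS": (1, 10)}
    print(pooled_rom(toy, HIGH_RISK), pooled_rom(toy, LOW_RISK))  # prints: 37.5 7.5
```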

Table 11 Comparison of malignancy risks in high-risk AUS group and low-risk AUS group
Table 12 Pairwise comparison of malignancy risks between AUS vs SFM, AUS-nuclear vs SFM, AUS-other vs SFM, and AUS-NS3 vs SFM

Discussion

After ultrasound was introduced into the clinical evaluation of thyroid disease in the late 1960s, the frequency of detected thyroid nodules increased with the widespread use of ultrasonography [1, 2]. Ultrasound-guided FNA cytology enables the most appropriate clinical management of thyroid nodules. TBSRTC, with its six categories and recommendations for appropriate clinical management in each, provides a standardized reporting format for thyroid FNAs and has been updated to reflect significant developments since its first edition [6]. However, indeterminate categories persist in the 3rd edition of the reporting system, including what is probably the most problematic category, “Atypia of Undetermined Significance (Bethesda III)” [6]. In this context, the present study investigated the impact on ROM of AUS subcategories and subgroups designed according to the criteria of the 2nd [9] and 3rd [6] editions of TBSRTC, following a three-stage study plan. In the first stage, nuclear and architectural features were evaluated and compared with histopathological diagnostic groups defined according to the diagnostic categories and tumor types of the 5th edition (2022 Beta version) of the WHO Classification of Endocrine and Neuroendocrine Tumours [28], and the ROM was compared across all features. In the second stage, a nuclear scoring scheme inspired by the study of Nikiforov et al. [30] was developed. Finally, in the third stage, subcategories and subgroups were created on the basis of NS, other cellular features, and architectural features, and were compared in terms of histopathological diagnosis and ROM.
The main findings of the present study can be summarized as follows: (i) the AUS-nuclear subcategory as an umbrella term, particularly the subgroups with focal marked nuclear atypia (focal NS 3) and with combined nuclear and architectural atypia, constitutes a high-risk group for malignancy; (ii) nuclear grooves, nuclear overlapping, molding, nuclear contour irregularity, chromatin clearing, and chromatin margination are significant independent predictors of malignancy; and (iii) a three-tiered nuclear scoring scheme may provide a more objective and reproducible method for evaluating thyroid FNAs, as it does for the assessment of histological nuclear features.

Although the 3rd edition of TBSRTC [6] continues to recommend an upper limit of 10% for AUS among all thyroid FNAs, reported rates range between 1% [10] and 20% [23] (Table 13). The AUS rate in the current study was 17.6%, higher than the limit proposed in the 3rd edition [6]; this value reflects the general data of our department and does not include the reevaluations made in the present study. A probable cause of this higher rate is the subjective application of the defined objective criteria in the evaluation of thyroid FNAs. In addition, the ROM has been updated in each edition according to accumulated data and the effects of changing terminology [6]. Nevertheless, the ROM values reported in the literature during the TBSRTC era have often been higher than the recommended values (Table 13). The recent edition of TBSRTC reports the ROM for each category as a mean and an expected range, and the ROM for AUS as 22% (13–30%) [6]. The current gold-standard calculation of ROM divides the number of histopathologically confirmed malignant cases by the number of AUS cases with surgical follow-up; however, this method is controversial because approximately half of the thyroid nodules in the AUS category do not have surgical follow-up. Most of the ROM data in the literature come from studies performed before the introduction of NIFTP and range between 17.0% [15] and 83.1% [16]. Liu et al. [8] excluded NIFTP from the malignant diagnostic category and reported an ROM of 54.3%. In the present study, histological diagnostic categories were defined according to the 5th edition (Beta version) of the WHO Classification of Endocrine and Neuroendocrine Tumours [28], and NIFTP was categorized as an LRN and was not counted as malignant in the calculation of ROM.
The ROM in the present study was 30.7%, slightly higher than the upper limit of the expected range in the 3rd edition of TBSRTC [6] but lower than the rate reported by Liu et al. [8]. The higher ROM in the present study may be due to the selection of high-risk nodules for surgery, since patients diagnosed with AUS were evaluated with clinical and radiological findings in a multidisciplinary setting. Thus, the wide range of reported AUS rates and ROM values, which mostly represent real-life data, may be related to various factors, including inadequate clinicopathological communication, insufficient application of objective criteria in the evaluation of FNAs, the tendency of observers to stay in the safe zone, and the limited experience of some investigators.

Table 13 Review of the previous studies investigating the effect of AUS subcategories on risk of malignancy

The present study investigated the impact of nuclear and architectural features on the risk of malignancy in FNAs diagnosed as AUS. All patients in the study group had surgical follow-up, and the FNA target was matched with the lesion in the surgical specimen. Only the first FNA of each nodule was included, to prevent regenerative changes related to previous FNA from affecting the evaluation. Microfollicular and trabecular patterns were significantly more frequent in FNAs of BNs and LRNs, whereas three-dimensional groups were significantly more common in FNAs of LRNs and MNs. Three-dimensional groups also showed the highest ROM, with a difference that approached significance. Guleria et al. [31] and Kaymaz et al. [32] showed that cases containing three-dimensional groups had higher ROM values. Therefore, FNAs with three-dimensional groups may warrant closer scrutiny for additional evidence of malignancy. In addition, the streaming pattern, detected in 44 cases in our study, is described here for the first time in the literature. Although a partially similar pattern is mentioned in the TBSRTC 3rd edition (architectural atypia, criterion 3) [6], we do not think it fully corresponds to the pattern described here. The streaming pattern was more frequent in the initial FNAs of BNs, although the difference was not statistically significant. However, because all cases in our study belonged to the AUS category, our findings may not clearly reflect the nature of the streaming pattern, and a larger study including all TBSRTC categories may clarify the impact of this pattern on thyroid cytology.

Comparison of nuclear features with ROM showed that the risk of malignancy in FNAs with nuclear grooves, chromatin margination, nuclear molding, chromatin clearing, nuclear contour irregularity, and nuclear overlapping (in descending order of ROM) was significantly higher than in FNAs without these features. In multivariate analysis, nuclear grooves, nuclear overlapping, molding, nuclear contour irregularity, chromatin clearing, and chromatin margination (in descending order of odds ratio) were significant independent predictors of histological malignancy. However, nuclear overlapping, chromatin clearing, and chromatin margination were also more common in histological LRNs. Nuclear elongation and nuclear enlargement were not associated with histological malignancy. Kato et al. [33] reported that the presence of nuclear grooves may indicate histological malignancy in indeterminate thyroid cytology, and that the presence of four or more atypical nuclear features, or the coexistence of nuclear grooves and inclusions, may be associated with malignancy. Kaymaz et al. [32] reported that membrane irregularities such as pseudoinclusions, nuclear contour irregularity, and nuclear grooves were associated with malignancy, and that nuclear elongation and overlapping were also predictive of malignancy. Consistent with these reports, FNAs with nuclear grooves showed the highest ROM in our study, and in multivariate analysis nuclear grooves were the independent predictor of malignancy with the highest odds ratio. In addition, ROC analysis in this study showed that the presence of more than two of the six atypical nuclear features was significantly associated with a likelihood of malignancy.
The previous results reported in terms of the relationship between malignancy with nuclear contour irregularity, nuclear overlapping, and nuclear grooves were similar with the results of the present study regarding these parameters. However, according to the definition in TBSRTC [6, 9], pseudoinclusion should by definition not be present in the evaluation of nuclear atypia for AUS except atypical cyst-lining cells. In the current study, there was no FNA revealing intranuclear pseudoinclusion in the study group. As a result, the presence of nuclear features investigated in the current study may be valuable criteria for marked nuclear atypia. Also, the presence of nuclear grooves or the presence of more than two of the mentioned atypical nuclear features may represent likelihood of malignancy.
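
The "more than two of six atypical nuclear features" rule is a count-and-threshold classifier, and an ROC cutoff of this kind can be found by scanning candidate thresholds. The Python sketch below illustrates both steps under stated assumptions: the six feature labels are assumed to be the six independent predictors listed above, Youden's J (sensitivity + specificity - 1) is our assumed cutoff criterion because the text does not say how the cutoff was selected, and the example data are hypothetical, not study data.

```python
from typing import Iterable, Sequence

# Assumed labels: the six features reported above as independent predictors.
ATYPICAL_FEATURES = frozenset({
    "nuclear_grooves",
    "nuclear_overlapping",
    "nuclear_molding",
    "nuclear_contour_irregularity",
    "chromatin_clearing",
    "chromatin_margination",
})


def count_atypical_features(observed: Iterable[str]) -> int:
    """Count how many distinct listed features were recorded for one FNA."""
    return len(ATYPICAL_FEATURES.intersection(observed))


def flags_malignancy(observed: Iterable[str], cutoff: int = 2) -> bool:
    """Apply the 'more than `cutoff` features' rule (default: more than two)."""
    return count_atypical_features(observed) > cutoff


def youden_best_cutoff(counts: Sequence[int], malignant: Sequence[bool]) -> int:
    """Return the cutoff c that maximises Youden's J for the rule 'count > c'.

    J = sensitivity + specificity - 1; on ties the smallest cutoff is kept.
    """
    positives = sum(malignant)
    negatives = len(malignant) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both malignant and non-malignant cases")
    best_c, best_j = 0, float("-inf")
    for c in range(len(ATYPICAL_FEATURES)):  # candidate cutoffs 0..5
        tp = sum(1 for n, m in zip(counts, malignant) if m and n > c)
        tn = sum(1 for n, m in zip(counts, malignant) if not m and n <= c)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c


if __name__ == "__main__":
    # Hypothetical per-FNA feature counts and outcomes (not study data).
    counts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
    malignant = [False] * 5 + [True] * 5
    print(youden_best_cutoff(counts, malignant))  # 2, i.e. "more than two"
```

On these invented data the scan happens to select the "count > 2" rule; on real data the selected cutoff depends on the case mix.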

The difficulty of using the AUS diagnostic category, and of using its subcategories, is evidenced by the wide range of ROM reported in previous studies (Table 13). These differences probably result from changes over time in how the variables and entities are defined. However, the striking point is the interobserver variability in the evaluation of cytological atypia despite widely reported definitions and criteria. Although cytopathologists use the same features to define cytological or nuclear atypia (such as nuclear enlargement, chromatin clearing, and nuclear membrane irregularity), their evaluation involves considerable subjectivity, and standardization is quite weak. In the recent history of endocrine pathology, nuclear scoring has contributed significantly to the evaluation of encapsulated follicular-patterned thyroid tumors with papillary-like nuclear features [30]. Accordingly, the current study developed a nuclear scoring schema inspired by the study of Nikiforov et al. [30] to reduce this subjectivity. Nuclear features were evaluated under three headings, (1) size and shape, (2) nuclear membrane irregularities, and (3) chromatin features, resulting in an NS ranging from 0 to 3. Among the NS groups, NS3 had the highest ROM (64.2%, n = 34). The ROM of focal/diffuse NS3 exceeded the overall ROM of the present study (30.7%) and was significantly higher than that of the other NS groups. Thus, NS3, whether focal or diffuse, deserves particular attention in cytological evaluation. NS2 was significantly more common in FNAs of BNs. Guleria et al. [31] and Kaymaz et al. [32] used nuclear scoring models similar to that of the current study and showed that NS2-3 was more frequently associated with malignancy. 
However, these studies did not present the ROM of each NS separately or evaluate focal/diffuse status. Altınboğa et al. [34] also scored aspirates with AUS by NS, but their scoring system differed from ours: it was based on the percentages of nuclear features and assigned scores from 0 to 10 points. For this reason, our NS-ROM findings could not be compared with their results. Other studies have examined the cytological features of NIFTP using similar scoring systems [29, 35, 36]; from the data of these studies, which evaluated cytological features in the TBSRTC indeterminate categories, it can be inferred that cases with NS2-3 have a higher ROM than cases with NS0-1. Because a scoring schema transforms the observed findings into numerical data, evaluating nuclear features in this way may provide a more objective and reliable method for the categorization and subcategorization of thyroid FNAs.
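
The NS schema described above can be expressed as a small scoring function in which each of the three headings contributes one point if at least one of its features is present, giving a score from 0 to 3. The Python sketch below is hypothetical: the feature names and, in particular, their assignment to headings are our assumptions, modelled loosely on the Nikiforov et al. scheme rather than taken from this study's criteria, and the focal/diffuse extent is simply supplied by the observer.

```python
from typing import AbstractSet

# Hypothetical assignment of features to the three NS headings, modelled
# loosely on the Nikiforov et al. score; the study's own lists may differ.
SIZE_AND_SHAPE = frozenset({
    "nuclear_enlargement", "nuclear_elongation",
    "nuclear_overlapping", "nuclear_molding",
})
MEMBRANE_IRREGULARITIES = frozenset({
    "nuclear_contour_irregularity", "nuclear_grooves",
})
CHROMATIN_FEATURES = frozenset({
    "chromatin_clearing", "chromatin_margination",
})
HEADINGS = (SIZE_AND_SHAPE, MEMBRANE_IRREGULARITIES, CHROMATIN_FEATURES)


def nuclear_score(features: AbstractSet[str]) -> int:
    """Return NS 0-3: one point for each heading with any feature present."""
    return sum(1 for heading in HEADINGS if heading & features)


def describe(features: AbstractSet[str], diffuse: bool) -> str:
    """Format the score with an observer-supplied focal/diffuse extent."""
    extent = "diffuse" if diffuse else "focal"
    return f"NS{nuclear_score(features)} ({extent})"


if __name__ == "__main__":
    fna = {"nuclear_enlargement", "nuclear_grooves", "chromatin_clearing"}
    print(describe(fna, diffuse=False))  # NS3 (focal)
```

Scoring per heading rather than per feature means that several features under one heading (for example, grooves plus contour irregularity) still add only one point.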

The TBSRTC categories and the recommendations reported for each category standardize the reporting of thyroid cytology and the clinical management of thyroid nodules. However, although the criteria for each category are defined in detail, the interpretation of what is seen through the microscope varies among pathologists. That most studies published since the first edition have reported AUS rates and ROM values above those recommended by TBSRTC reflects this variability. In the current study, which was carried out during the transition between the TBSRTC 2nd edition [9] and the TBSRTC 3rd edition [6], 6 subcategories and 11 subgroups [AUS-N (AUS-N1-4), AUS-A (A1-2), AUS-N&A, AUS-O (O1-2), AUS-NOS (NOS1-3), and AUS-L] were created to support a more objective evaluation. They were based on the definitions and criteria reported for each category in the TBSRTC 2nd edition [9], updated according to the terms reported in the TBSRTC 3rd edition [6], and supplemented with the nuclear scoring schema. Thus, the subcategory and subgroup definitions in the present study encompass the criteria of both the earlier and the later TBSRTC editions (Table 2) [6, 9].

Data accumulated from published studies reporting ROM for AUS after the 1st edition [7] led to the AUS-nuclear and AUS-other subcategories of the 3rd edition [6]. The subcategories reported in selected studies on this subject, and the malignancy rate of each as determined in surgical resection specimens, are summarized in Table 13. These studies use different terms that may cover the AUS-nuclear subcategory, and subcategories of different numbers and definitions that may correspond to the AUS-other subcategory. In subcategories that may correspond to the AUS-nuclear subcategory, ROM ranges from 28% [21] to 100% [26]. Some of these studies evaluated architectural atypia separately, whereas others examined it within other subcategories; reported ROM values for architectural atypia range from 6.9% [21] to 73.3% [37].

In this study, the AUS-nuclear (AUS-N) subcategory covers the cytological atypia criteria of the TBSRTC 2nd edition [9]; that is, it includes the definitions of the TBSRTC 3rd edition AUS-nuclear atypia subcategory [6] other than the combined nuclear and architectural atypia criteria. The ROM of the AUS-N subcategory was significantly higher than that of the other subcategories (the AUS-L subcategory was disregarded because of its small number of cases). Combining the results of the current study with the data accumulated from previous studies, the superiority of the AUS-N subcategory over the other subcategories in predicting histological malignancy is clear, and the subcategorization introduced in the TBSRTC 3rd edition [6] can be expected to benefit the clinical management of thyroid nodules diagnosed as AUS. This raises a question: does every form of nuclear atypia predict malignancy equally well? To address this question, the AUS-N subcategory was examined in four separate subgroups. The AUS-N1 subgroup had a higher ROM than the other AUS-N subgroups and all other subcategories and subgroups in the study. In other words, FNAs with focal NS3 features carry a high ROM, which was 65.2% in this study. This value is close to the range reported in the TBSRTC 3rd edition [6] for “Suspicious for malignancy, Bethesda Category V” [74% (67–83%)]. It may therefore be asked whether FNAs in the NS3/AUS-N1 subgroup should be upgraded to SFM rather than reported as AUS, which would reduce the use of the AUS diagnostic category. To explore this, ROM values were compared between the AUS and SFM categories in the current study (the ROM for the SFM category was obtained from archive records without re-evaluation of the aspirates). The ROM of SFM (95.1%) was significantly higher than that of AUS (overall), the AUS-nuclear subcategory, the AUS-other subcategory, and the AUS-NS3 subgroup. 
The high ROM of SFM in our department approximates the rate recommended for Bethesda category VI, Malignant [97% (97–100%)] [6]. The lack of a minimum quantitative threshold for a diagnosis of malignancy in FNAs, together with the protective effect of the term “suspicious,” may explain this high ROM. The quality (presence or absence of intranuclear pseudoinclusions) and quantity (extent of intranuclear pseudoinclusions required for malignancy) of nuclear features, and how pathologists perceive them, may lead different pathologists to place the same features in different TBSRTC categories. In the present study, a few intranuclear pseudoinclusions and/or extensive nuclear grooves in follicular cells, together with widespread other atypical nuclear features (approximately >70% of aspirates), were regarded as SFM rather than AUS; nevertheless, the ROM for SFM obtained from archive records may reflect interobserver variability in the indeterminate TBSRTC categories. Larger studies comparing the subgroups of the AUS-N subcategory with the SFM category using a nuclear scoring schema may therefore be beneficial.

Because ROM is similar in aspirates with mild nuclear atypia regardless of the presence or absence of concomitant architectural atypia, aspirates with both are placed in the AUS-nuclear subcategory in the TBSRTC 3rd edition [6]. In the current study, the AUS-N&A subcategory had the third highest ROM, after the AUS-N1 subgroup and the AUS-N subcategory. The rate of LRN was also high in the surgical specimens of the AUS-N&A subcategory. Therefore, we examined the NS distribution of the AUS-N&A subcategory. AUS-N&A aspirates with NS3 features had a significantly higher ROM (60.0%) than those evaluated as NS2 (28.1%) (p = 0.006). In addition, the ROM for extensive NS2 features (40.6%) was higher than that for focal NS2 features (15.6%), approaching significance (p = 0.052). These findings emphasize the importance of nuclear scoring in thyroid FNA samples. The AUS-A subcategory and the AUS-A2 subgroup were significantly associated with BNs on histological examination. Considering the association of three-dimensional groups with high ROM, emphasized in some previous studies [31, 32] and in the current study, architectural atypia, like nuclear atypia, may help define the nature of a lesion. Detailed definitions of architectural features, together with nuclear scoring and greater awareness of three-dimensional groups, may therefore help in the correct evaluation of thyroid cytology.
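
Comparisons such as NS3 versus NS2 within AUS-N&A reduce to a 2×2 table of malignant versus other outcomes in two groups, with ROM computed as malignant nodules divided by resected nodules. This section does not state which test produced the reported p-values; the Python sketch below uses Fisher's exact test as one common choice for such tables, implemented directly with `math.comb`. The counts in the example are hypothetical, and `scipy.stats.fisher_exact` is a library alternative.

```python
from math import comb


def rom(malignant: int, resected: int) -> float:
    """Risk of malignancy (%) = malignant nodules / nodules with surgery."""
    if resected <= 0:
        raise ValueError("ROM needs at least one resected nodule")
    return 100.0 * malignant / resected


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups being compared (e.g. NS3 vs NS2); columns are
    malignant vs other histology. The p-value sums the hypergeometric
    probabilities of every table with the same margins that is no more
    likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def p_table(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = p_table(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = p_table(x)
        if p <= p_obs * (1 + 1e-9):  # tolerance for floating-point ties
            total += p
    return min(total, 1.0)


if __name__ == "__main__":
    # Hypothetical counts: group 1 has 3/4 malignant, group 2 has 1/4.
    print(rom(3, 4), rom(1, 4))                          # 75.0 25.0
    print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

With very small groups, as in this toy example, even a large difference in ROM does not reach significance, which is one reason subgroup comparisons benefit from larger study groups.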

In contrast, nodules in the AUS-O and AUS-NOS subcategories showed a significantly higher rate of NND in surgical specimens; the lowest ROM was found in the AUS-O subcategory (5.6%), followed by AUS-NOS (11.5%). The ROM determined for the AUS-O subcategory in this study is lower than the values reported in previous studies [16, 37,38,39] and the mean value reported in the TBSRTC 3rd edition [6]. The same applied to the AUS-NOS subcategory, which undoubtedly has the most varied definitions [6, 16,17,18,19, 21, 31, 32, 39,40,41,42]. Park et al. [10, 31] used criteria for AUS-NOS similar to ours and reported a ROM of 14.5%, consistent with our findings. Because clinical, radiological, and laboratory findings are important in the cytological evaluation of the AUS-O and AUS-NOS subcategories, these low ROM values may reflect the review of the study group by a multidisciplinary endocrine disease board, which may have aided the differential diagnosis of these subcategories and the selection of appropriate patients for surgery. Therefore, integrating clinical and laboratory findings with cytological findings may keep the AUS-NOS subcategory from becoming a wastebasket category.

In this study, all subcategories except AUS-L were compared pairwise in terms of ROM. No statistically significant difference in ROM was observed between the AUS-N and AUS-N&A subcategories; in other words, these two subcategories behaved similarly. However, the ROM values of these two subcategories were significantly higher than those of the other subcategories. Therefore, these two subcategories were combined into one group and all other subcategories into a separate group, and the ROM of the AUS-N and AUS-N&A group was significantly higher than that of the other group. Based on these results, the group consisting of the AUS-N and AUS-N&A subcategories was named the “high-risk group,” and the group consisting of the other subcategories the “low-risk group.” These groups correspond to the AUS-nuclear and AUS-other subcategories of the TBSRTC 3rd edition [6]. Studies published after the first introduction of TBSRTC [7], and a recent study based on the TBSRTC 3rd edition [6], that compared changes reported as AUS-nuclear or AUS-cytological atypia with other AUS-defining changes, individually or in different combinations, are summarized in Table 13 [8, 15,16,17,18, 20,21,22,23, 25, 26, 37,38,39, 43,44,45,46,47,48]. The table shows that the ROM values of the subcategories defined as AUS-nuclear atypia or AUS-cytological atypia are higher than those of other groups, both in individual studies and overall. Therefore, the subcategorization proposed in the TBSRTC 3rd edition can be expected to have a positive impact on clinical practice. However, most of these studies predate the NIFTP terminology, and their malignancy rates included NIFTP. 
In the post-NIFTP period, some of these studies included NIFTP in the malignant category [37], others did not fully explain the status of NIFTP [16, 26, 38, 47], and several did not mention NIFTP at all [8, 25, 48]. The table also shows that in studies that did not count NIFTP as malignant, ROM values for AUS-nuclear atypia or AUS-cytological atypia remain higher than those of the other subcategories. In a meta-analysis of 20 studies, Valderrabano et al. [49] reported a ROM of 46% in cases with nuclear atypia and 18% in cases without nuclear atypia (with NIFTP considered malignant). In the current study, diagnostic categories were based on the 5th edition (2022 Beta version) of the WHO Classification of Endocrine and Neuroendocrine Tumours [28], and NIFTP and other LRNs were examined as separate diagnostic categories. Therefore, the ROM values reported in the current study can be considered more objective. The current study also had the advantage of spanning the TBSRTC 2nd edition [9] and 3rd edition [6] periods: its subcategories and subgroups allow ROM to be compared separately according to each edition. Based on the TBSRTC 2nd edition criteria for nuclear atypia, ROM was 48.2% [9], whereas according to the TBSRTC 3rd edition it was 42.0% [6]. In other words, when combined nuclear and architectural atypia is placed under nuclear atypia, the ROM actually decreases because the composition of the group changes. Therefore, while thyroid cytology continues to be reported according to the criteria and subcategories of the TBSRTC 3rd edition, also reporting the criteria that define the subcategory as a subgroup may aid the management of nodules. 
In addition, using nuclear scoring for the AUS-nuclear subcategory and identifying detailed architectural patterns, rather than general architectural atypia, may affect ROM values; in the future, the nature of these two subcategories may be determined more realistically, and the subcategories may be incorporated into other categories. Because the AUS-nuclear subcategory by definition also includes atypical cyst-lining cells and histiocytoid cells, it actually covers not only nuclear atypia but also atypia partly due to cytoplasmic features. It also includes architectural atypia under its umbrella. For these reasons, renaming the AUS-nuclear subcategory the “AUS-high risk group” and the AUS-other subcategory the “AUS-low risk group” may be worth discussing as a more appropriate terminology.
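
The high-risk/low-risk grouping amounts to mapping each subcategory to a group and recomputing ROM from pooled counts. The Python sketch below assumes the mapping stated above (AUS-N and AUS-N&A high-risk; AUS-A, AUS-O, and AUS-NOS low-risk) and leaves AUS-L unassigned, since it was excluded from the pairwise comparisons; the counts in the example are hypothetical. Pooling sums malignant and resected cases rather than averaging percentages, which is why adding a subcategory with a lower ROM, such as AUS-N&A, to AUS-N can lower the pooled ROM.

```python
from typing import Dict, List, Tuple

# Assumed mapping from the grouping described above; AUS-L is deliberately
# left unassigned because it was excluded from the pairwise comparisons.
RISK_GROUP = {
    "AUS-N": "high-risk",
    "AUS-N&A": "high-risk",
    "AUS-A": "low-risk",
    "AUS-O": "low-risk",
    "AUS-NOS": "low-risk",
}


def pooled_rom(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Pool (malignant, resected) counts per risk group and return ROM (%).

    Subcategories without an assigned group are skipped.
    """
    totals: Dict[str, List[int]] = {}
    for subcategory, (malignant, resected) in counts.items():
        group = RISK_GROUP.get(subcategory)
        if group is None:
            continue
        acc = totals.setdefault(group, [0, 0])
        acc[0] += malignant
        acc[1] += resected
    return {g: 100.0 * m / r for g, (m, r) in totals.items() if r > 0}


if __name__ == "__main__":
    # Hypothetical (malignant, resected) counts, not study data.
    example = {"AUS-N": (3, 5), "AUS-N&A": (1, 5), "AUS-O": (0, 4),
               "AUS-NOS": (1, 4), "AUS-L": (1, 1)}
    print(pooled_rom(example))  # {'high-risk': 40.0, 'low-risk': 12.5}
```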

The present study has some limitations. First, it is retrospective. Second, as discussed in the TBSRTC 3rd edition, ROM was calculated only for patients who underwent surgery; the actual ROM in patients who did not undergo surgery is unknown. Finally, because the AUS rate and the ROM for SFM in the current study were obtained from the records of previous FNAs, these rates may not represent those that would be obtained with the criteria used in this study.
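
The limitation that ROM is computed only for resected nodules can be made explicit with simple bounds. As a sketch, using symbols introduced here for illustration only, let N be the number of nodules diagnosed as AUS, S ≤ N the number that underwent surgery, and M ≤ S the number of malignancies among the resected nodules:

$$
\mathrm{ROM}_{\text{surg}} = \frac{M}{S}, \qquad
\frac{M}{N} \;\le\; \mathrm{ROM}_{\text{all}} \;\le\; \frac{M + (N - S)}{N}
$$

The lower bound assumes that every unresected nodule is benign and the upper bound that every unresected nodule is malignant. Because M ≤ S, the surgical ROM always lies between the two bounds; where the true value falls depends on the unknown outcomes of the unresected nodules.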

On the other hand, this study also has strengths. First, the number of cases with surgical follow-up is higher than in most previous studies, with the exception of the study reported by Mathur et al. [39]. Second, the aspirates and surgical specimens were re-evaluated and revised according to current diagnostic guidelines, so the results and definitions reflect current practice. Third, the aspirated nodule and the target lesion in the surgical specimen were correlated using very strict criteria, and discordant cases were excluded. Fourth, because repeat aspirations may show regenerative changes caused by previous aspirations, only the initial aspiration of each nodule was included, to capture the findings at baseline, and repeat aspirations were excluded. Finally, this study presents a review of previous studies investigating the ROM for AUS subcategories since the first edition of TBSRTC (Table 13).

In conclusion, data accumulated in the literature since the TBSRTC 1st edition show that the risk of malignancy in the subcategories defined as AUS-nuclear or AUS-cytological atypia is higher than in other subcategories. Therefore, the “AUS-nuclear” and “AUS-other” subcategorization proposed for the AUS category in the TBSRTC 3rd edition can be expected to benefit the clinical management of thyroid nodules. However, given the definitional scope of the AUS-nuclear subcategory, renaming the AUS subcategories “AUS-high risk” and “AUS-low risk” is open to debate. Because significant differences in ROM were detected among the subgroups of the AUS-nuclear atypia subcategory, applying a nuclear scoring system to aspirates in this subcategory can convert nuclear atypia into numerical values and lead to a more objective evaluation. Our results suggest that subcategorization may not be the end point: nuclear scoring, combined with evaluation of architectural patterns according to strict criteria, may provide data for remodeling the TBSRTC categories. Future studies that evaluate nuclear atypia with nuclear scoring in aspirates diagnosed as AUS may therefore be beneficial and may guide the recommendations of the next TBSRTC edition.