Thyroid gland carcinoma is a very prevalent neoplasia worldwide. A survey sponsored by the World Health Organization (WHO) in 2010 revealed that there are around 44,670 new cases and 1,690 deaths caused by this disease every year[1].

The majority of malignant lesions of the thyroid, such as papillary carcinoma, medullary carcinoma and undifferentiated histological types, can be diagnosed by cytological criteria using samples obtained by ultrasonography-guided fine-needle aspiration biopsy (FNAB)[2]. Likewise, benign lesions, such as hyperplastic nodules, colloid nodules and autoimmune diseases like thyroiditis, can be diagnosed cytologically[3]. However, for some lesions a precise distinction between malignant and benign often requires histological demonstration. These lesions are therefore cytologically grouped as undetermined tumors or suspected follicular neoplasia[4-7], and patients often undergo a diagnostic surgical procedure (thyroidectomy) even though the overall carcinoma rate in this condition is very low[8]. Immunohistochemistry thus plays a complementary role in the attempt to resolve this dilemma[9].

Many studies employ immunohistochemistry techniques to search for markers involved in the genesis or specific characteristics of follicular-patterned tumors. The immunocytochemistry (ICC) or immunohistochemistry (IHC) markers most often employed to distinguish between benign and malignant lesions of the thyroid are cytokeratin-19 (CK-19: a member of the keratin family responsible for the structural integrity of epithelial cells), galectin-3 (Gal-3: involved in cell migration, adherence and apoptosis) and Hector Battifora mesothelial-1 (HBME-1: an unelucidated membrane antigen present in the microvilli of mesothelioma cells and also in follicular thyroid tumor cells), alone or in association[10]. However, their results and applications remain controversial: these molecules have not shown sufficient specificity or, more critically for avoiding a diagnostic thyroidectomy, sufficient sensitivity in the differentiation of follicular lesions, because of persistently variable rates of false-positive and false-negative results, respectively[11].

In view of this, the objective of the present study was to establish the diagnostic accuracy of CK-19, Gal-3 and HBME-1 markers, and their associations, for the differentiation between benign and malignant thyroid lesions.

Material and methods

Systematic review

A search for articles published exclusively in the English language between January 2001 and December 2011 was carried out in the electronic databases MEDLINE and The Cochrane Library.

A broad search strategy was employed in order to limit publication bias, and the following search terms were used: ((ck-19 and thyroid) OR (galectin-3 and thyroid) OR (hbme-1 and thyroid)). The reference lists of the articles obtained were also examined so that other relevant studies could be identified for inclusion in the present study.

The following exclusion criteria were applied both to studies as a whole and to individually selected cases: inability to obtain individual data; review articles; case reports; reuse of the same sample; absence of a case or control group (any diagnosis of benign thyroid lesion, such as goiter, follicular adenoma, thyroiditis, hyperplastic nodules or normal thyroid samples, was considered a control); fewer than 12 patients in each group (both case and control); individuals under 18 years; use of any organ other than the thyroid; use of any marker other than CK-19, Gal-3 or HBME-1; inclusion of any malignant histological type other than well-differentiated thyroid carcinoma; use of techniques other than immunocytochemistry or immunohistochemistry; use of non-human specimens; and use of specimens not obtained exclusively from the thyroid gland (for example, blood and derivatives).

The data from the studies were collected independently by two researchers using a standardized form. The following information was extracted: reference, number of patients in the case and control groups, technique employed (immunocytochemistry or immunohistochemistry), histological types of neoplasias studied and results (stratified into four groups: true-positive, false-positive, false-negative and true-negative). Differences in the data extracted were resolved by group consensus.

Initially, 265 abstracts were selected and, after applying the criteria established above, 66 articles with 5,168 patients were included in the meta-analysis itself, as shown in Figure 1.

Figure 1

Flowchart of article selection.

Statistical analysis

The Meta-DiSc® Program (Clinical BioStatistics Unit – Hospital Ramón y Cajal, Madrid, Spain) was employed in all the analyses[12]. The method applied was the meta-analysis of diagnostic tests from independent studies, stratified according to the sample size of each study. The fixed effect was estimated by the Mantel-Haenszel method, with each study weighted according to its size, calculated as the inverse of its variance. The random effect was estimated by the DerSimonian-Laird method. Heterogeneity among the studies was assessed with Cochran's Q test and was considered significant when P<0.1.
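The pooling steps named above can be illustrated with a minimal sketch. It is not the Meta-DiSc implementation: it assumes that the quantity pooled is the log diagnostic odds ratio of each study, uses plain inverse-variance weights for the fixed effect, and applies a 0.5 continuity correction to tables with an empty cell (a common convention, assumed here). The function name `pool_log_dor` and the input layout are hypothetical.

```python
import math

def pool_log_dor(tables):
    """Pool log diagnostic odds ratios from a list of (tp, fp, fn, tn) tables.

    Returns the fixed-effect and DerSimonian-Laird random-effects pooled dOR,
    Cochran's Q, its degrees of freedom and the between-study variance tau^2.
    """
    y, v = [], []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):
            # Assumed convention: add 0.5 to every cell when any cell is empty.
            tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        y.append(math.log((tp * tn) / (fp * fn)))   # log dOR of the study
        v.append(1 / tp + 1 / fp + 1 / fn + 1 / tn)  # its approximate variance

    # Fixed effect: inverse-variance weighted mean.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # DerSimonian-Laird moment estimator of the between-study variance.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects: the same weighted mean with tau^2 added to each variance.
    w_star = [1 / (vi + tau2) for vi in v]
    random_effects = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)

    return {
        "fixed_dor": math.exp(fixed),
        "random_dor": math.exp(random_effects),
        "q": q,
        "df": df,
        "tau2": tau2,
    }
```

Under the null hypothesis of homogeneity, Q is compared with a chi-square distribution with `df` degrees of freedom; a P value below 0.1 corresponds to the threshold used in this study.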

Sensitivity, specificity, and positive and negative likelihood ratios, with their 95% confidence intervals (95% CI), were calculated separately for each study and also for the studies grouped according to the type of marker or association. Forest plots of the most relevant results were generated.
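As an illustration of the per-study measures listed above, the following sketch derives them from a single 2x2 table. The choice of interval methods (Wilson score for proportions, log-scale intervals for likelihood ratios) is an assumption made for the example and may differ from what the software used in this study computes; the function name is hypothetical.

```python
import math

def diagnostic_measures(tp, fp, fn, tn, z=1.96):
    """Accuracy measures and approximate 95% CIs for one study's 2x2 table.

    Assumes all four cells are non-zero; otherwise a continuity correction
    (for example adding 0.5 to every cell) would be needed first.
    """
    n_dis, n_non = tp + fn, fp + tn          # diseased / non-diseased totals
    sens, spec = tp / n_dis, tn / n_non
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec

    def wilson(k, n):
        # Wilson score interval for the proportion k/n.
        p = k / n
        denom = 1 + z * z / n
        centre = (p + z * z / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
        return (centre - half, centre + half)

    def ratio_ci(ratio, se_log):
        # Interval built on the log scale, then back-transformed.
        return (ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log))

    # Standard errors of log(LR+) and log(LR-).
    se_pos = math.sqrt(1 / tp - 1 / n_dis + 1 / fp - 1 / n_non)
    se_neg = math.sqrt(1 / fn - 1 / n_dis + 1 / tn - 1 / n_non)

    return {
        "sensitivity": (sens, wilson(tp, n_dis)),
        "specificity": (spec, wilson(tn, n_non)),
        "lr_positive": (lr_pos, ratio_ci(lr_pos, se_pos)),
        "lr_negative": (lr_neg, ratio_ci(lr_neg, se_neg)),
    }
```

Each entry is a pair of the point estimate and its (lower, upper) interval, which is the information a forest plot displays for every study.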

The diagnostic odds ratio (dOR) was also calculated. It is a single summary measure of test accuracy: the odds of a positive test result among patients with the disease divided by the odds of a positive result among those without it.
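For reference, the standard definition of the dOR can be written in terms of the four cells of the 2x2 table (TP, FP, FN, TN), or equivalently in terms of sensitivity, specificity and the likelihood ratios. The symbols below are conventional notation rather than notation taken from this study:

```latex
\[
\mathrm{dOR}
  = \frac{TP/FN}{FP/TN}
  = \frac{TP \cdot TN}{FP \cdot FN}
  = \frac{\mathrm{sensitivity}/(1-\mathrm{sensitivity})}
         {(1-\mathrm{specificity})/\mathrm{specificity}}
  = \frac{LR^{+}}{LR^{-}}
\]
```

A dOR of 1 corresponds to a test that does not discriminate between diseased and non-diseased patients, and higher values indicate better discrimination.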

In addition, summary receiver operating characteristic (SROC) analysis was performed and the areas under the SROC curves were calculated. This method differs from conventional ROC analysis, which compares test accuracy over different thresholds for positivity: in an SROC graph each data point comes from a different study, and the diagnostic thresholds should be similar across studies so as not to influence the shape of the curve[13].
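One common way to construct an SROC curve is the Moses-Littenberg linear model, sketched below. Whether this matches the exact procedure used by the software in this study is an assumption; the function names and the midpoint-rule integration are choices made for the example.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def fit_sroc(points):
    """Fit D = a + b*S by ordinary least squares.

    points: list of (sensitivity, specificity), one pair per study, with all
    values strictly between 0 and 1.
    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio;
    S = logit(TPR) + logit(FPR) is a proxy for the positivity threshold.
    """
    d = [logit(se) - logit(1 - sp) for se, sp in points]
    s = [logit(se) + logit(1 - sp) for se, sp in points]
    n = len(points)
    s_mean, d_mean = sum(s) / n, sum(d) / n
    sxx = sum((si - s_mean) ** 2 for si in s)
    sxy = sum((si - s_mean) * (di - d_mean) for si, di in zip(s, d))
    b = sxy / sxx if sxx > 0 else 0.0   # no threshold variation: slope 0
    a = d_mean - b * s_mean
    return a, b

def sroc_tpr(fpr, a, b):
    # Solving D = a + b*S for logit(TPR) gives
    # logit(TPR) = a/(1-b) + ((1+b)/(1-b)) * logit(FPR), valid for b != 1.
    x = a / (1 - b) + (1 + b) / (1 - b) * logit(fpr)
    return 1 / (1 + math.exp(-x))

def sroc_auc(a, b, steps=10000):
    # Midpoint-rule integration of the curve over FPR in (0, 1).
    h = 1 / steps
    return sum(sroc_tpr((i + 0.5) * h, a, b) for i in range(steps)) * h
```

With `a = 0` and `b = 0` the curve is the diagonal and the area is 0.5; larger values of `a` push the curve towards the upper-left corner and the area towards 1.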

Results

The diagnostic accuracy of the markers CK-19, Gal-3 and HBME-1, and of their associations, in differentiating well-differentiated carcinoma from benign thyroid lesions was evaluated separately for the immunohistochemistry and immunocytochemistry techniques, as described below.

Immunohistochemistry technique

The present meta-analysis included 49 articles and 5168 patients in the broader analysis, with variable rates of true-positive and true-negative tests, and with a considerable rate of false results (Table 1).

Table 1 Number of studies, patients and their distributions included in each analysis by the immunohistochemistry technique

The values for sensitivity, specificity and likelihood ratios, with their respective heterogeneity coefficients, are detailed in Table 2 and Table 3. These values vary widely and are not particularly high when the immunomarkers are analyzed alone. The association of markers can significantly increase the diagnostic rates, but at the cost of a substantial reduction in the number of references available for analysis.

Table 2 Sensitivity and specificity of each immunohistochemistry marker or association
Table 3 Positive likelihood ratio (Positive LR) and negative likelihood ratio (Negative LR) of each immunohistochemistry marker and association

The diagnostic odds ratio (dOR) was calculated directly from the sensitivity and specificity values (Table 4). This measurement represents the overall diagnostic power of each test (a high dOR implies that the test shows good diagnostic accuracy in all patients). As seen, the test with the greatest diagnostic accuracy and the least inconsistency in distinguishing benign from malignant thyroid lesions is positivity for the three combined markers (CK-19, Gal-3 and HBME-1). The forest plots in Figure 2 therefore summarize the individual results of the articles selected for this analysis, together with a pooled estimate (shown as a "diamond"), for sensitivity, specificity, and positive and negative likelihood ratios.

Table 4 Diagnostic odds ratio (dOR) calculated for each immunohistochemistry marker or combination
Figure 2

Forest plots of sensitivity, specificity, and positive and negative likelihood ratios for immunohistochemistry expression of the positive combination of CK-19, Gal-3 and HBME-1 in the diagnosis of well-differentiated malignant thyroid lesions[14, 17, 29].

Immunocytochemistry technique

The same analysis was performed for the three markers with the exclusive aim of making a preoperative diagnosis of thyroid lesions. However, the combination of markers suitable for the application of meta-analysis was not identified in the literature and the results were only based on the individual expression of each molecule.

This analysis included 17 articles and 3900 samples, with a special focus on HBME-1. False-negative and false-positive rates were substantial. Galectin-3 showed a low negative LR, high sensitivity, specificity and positive LR, the highest diagnostic odds ratio and the least heterogeneity, indicating that this marker is the best at making a preoperative distinction between benign and malignant thyroid lesions. The results are described in Tables 5, 6, 7 and 8.

Table 5 Number of studies, patients and their distributions included in each analysis, for the immunocytochemistry technique
Table 6 Sensitivity and specificity of each immunocytochemistry marker
Table 7 Ratios of positive likelihood (Positive LR) and of negative likelihood (Negative LR) of each immunocytochemistry marker
Table 8 Diagnostic odds ratio (dOR) calculated for each immunocytochemistry marker

Exploring heterogeneity

The first factor that reduced heterogeneity in the analyses employing the immunohistochemistry technique was the combination of markers, as previously shown. It therefore became clear that none of these molecules, when studied independently, can reliably differentiate between benign and well-differentiated malignant tumors of the thyroid.

In a search for other causes of heterogeneity, the following possible confounding variables were evaluated: inclusion of oncocytic or Hürthle cells in the sample and/or the criterion adopted to consider a marker as “positive”.

When both techniques (immunocytochemistry and immunohistochemistry) were considered, the review of the selected studies indicated that some authors included oncocytic-patterned tumors (or Hürthle cell neoplasias) in their samples. Hürthle cells are characterized by their wide, granular cytoplasm. Although most oncocytic lesions at cytology prove to be benign upon histopathological examination (Hürthle cell adenomas, hyperplastic nodules, thyroiditis and Graves’ disease), the mere presence of these cells in a cytological exam indicates a greater likelihood of malignancy (Bethesda IV), regardless of other criteria[78]. These factors make an exact etiological preoperative diagnosis of these neoplasms even more difficult[79].

It was also observed that some studies considered the immunostaining positive when at least 5% of the cells expressed the marker, whereas others set this minimum percentage at 10%, 25% or even 50%. In addition, immunostaining that was weak, heterogeneous or even focal was sometimes also considered positive.

Thus, when the combination of markers was analyzed (only for immunohistochemistry), after removing Hürthle cells from the data and reclassifying as “negative” the cases with fewer than 25% immunostained cells or with weak or focal marking, it was possible to exclude the previously noted heterogeneity from the groups (data not shown). SROC curves were plotted at this point, once the positivity thresholds of the different studies had been made more similar, to better illustrate these results (Figure 3 for immunohistochemistry and Figure 4 for immunocytochemistry). However, when these same variables were excluded and CK-19, Gal-3 and HBME-1 were reanalyzed separately, unequivocal heterogeneity always remained for the immunohistochemical technique. As the meta-analysis was performed exclusively on published studies and did not use the authors’ original data, in some instances the criteria described above could not be applied with certainty; these articles were therefore excluded from the analysis.
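The reclassification rule described above (fewer than 25% immunostained cells, or weak or focal marking, counted as negative) can be sketched as follows. The record layout, field names and category labels are hypothetical and serve only to make the rule concrete; they are not taken from the data extraction form used in this study.

```python
from dataclasses import dataclass

@dataclass
class Staining:
    """Hypothetical record of one marker's staining in one case."""
    percent_positive_cells: float  # 0 to 100
    intensity: str                 # e.g. "weak", "moderate", "strong"
    pattern: str                   # e.g. "focal", "diffuse"

def is_positive(staining, min_percent=25.0):
    """Apply the harmonized criterion: at least 25% stained cells and
    neither weak nor focal marking."""
    if staining.percent_positive_cells < min_percent:
        return False
    if staining.intensity == "weak" or staining.pattern == "focal":
        return False
    return True

def panel_positive(stainings):
    """Combined test: positive only if every marker in the panel is positive.

    stainings: dict mapping a marker name (e.g. "CK-19") to a Staining.
    """
    return all(is_positive(s) for s in stainings.values())
```

For example, a case with strong, diffuse CK-19 and Gal-3 staining but only 10% of cells stained for HBME-1 would be counted as negative for the three-marker combination under this rule.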

Figure 3

SROC curve for positive immunohistochemistry expression of the association of the three markers (CK-19, Gal-3, HBME-1) in the diagnosis of well-differentiated malignant thyroid lesions. Area under the SROC curve: 93.25%.

Figure 4

SROC curve for positive immunocytochemistry expression of CK-19, Gal-3 and HBME-1 in the diagnosis of well-differentiated malignant lesions of the thyroid. Areas under the SROC curve: CK-19=86.32%; Gal-3=97.07%; HBME-1=94.12%.

Discussion

The preoperative diagnosis of thyroid lesions is not the only challenge faced by pathologists. Very often, establishing the differential diagnosis between benignancy and malignancy of a thyroid nodule, based only on the histopathological exam, can be quite difficult.

One of the greatest research challenges involving well-differentiated thyroid carcinoma is to develop a method that enables the correct differential diagnosis between benign and malignant lesions and so avoids a diagnostic surgery. To reach this objective a test would need an especially high sensitivity[11], but this has not yet been achieved in the literature, even when genomic classifiers are employed[80]. Thus, the search for a “marker” that enhances this diagnostic capability is ongoing[81].

Cytokeratin-19 (CK-19) expression in thyroid nodules is in general intense and diffuse in papillary carcinoma, with heterogeneous labeling in carcinoma and in follicular adenoma and nil or low expression in other benign lesions[30, 82]. Galectins, especially galectin-3, are suggested to play a role in the pathogenesis of well-differentiated thyroid carcinoma, particularly papillary carcinoma[83]; galectin-3 is therefore one of the markers most commonly used to assist in distinguishing thyroid lesions. Hector Battifora mesothelial-1 (HBME-1) has been shown to be important as a marker of thyroid lesions of follicular origin, with greater affinity for malignant than for benign lesions[84]. For these reasons, they are the three most used immunomarkers in pathology practice. Each has different rates of false-negative and false-positive results, and some authors advocate that a panel of the three markers might be more helpful than a single immunomarker, improving the specificity, the positive and negative predictive values and thus the diagnostic accuracy[85].

The main contribution of this meta-analysis was to quantify the accuracy of these three important markers employed in clinical practice. Several literature reviews have already been published, but the present study is the first to analyze cumulative data, which is its main value.

As demonstrated, combined positivity for CK-19, Gal-3 and HBME-1 in IHC assays and the preoperative expression of galectin-3 in ICC samples proved to be highly accurate tests for distinguishing benign lesions from well-differentiated thyroid carcinoma. This was even more evident when the heterogeneity factors were removed: the SROC analysis showed a global accuracy of more than 90% in this situation.

However, these results must be analyzed with great care. Although the accuracy rates are, in general, high, there is a considerable percentage of false results. Because a negative test may be a false negative, it does not by itself justify a watchful waiting approach, especially when a malignant neoplasm is at stake. As a consequence, many patients are still subjected to a theoretically unnecessary diagnostic surgery, with its associated morbidity and mortality rates.
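The practical weight of a false-negative rate can be illustrated with the standard relation between pre-test probability, likelihood ratio and post-test probability. The sketch below is generic; the function name is hypothetical, and the numbers in the example that follows are not results from this meta-analysis.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability into a post-test probability.

    Uses post-test odds = pre-test odds * likelihood ratio, where the
    likelihood ratio is LR+ after a positive test or LR- after a negative one.
    Assumes 0 <= pre_test_probability < 1.
    """
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)
```

With a hypothetical pre-test probability of malignancy of 20% and a hypothetical negative likelihood ratio of 0.1, a negative test leaves a residual probability of about 2.4%. Whether such a residual risk is acceptable for watchful waiting is a clinical judgment that the calculation alone does not settle.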

Another important point of this study was the identification of the variables responsible for heterogeneity in the analysis of tumor markers employed in thyroid nodule diagnosis. The combination of markers, the exclusion of Hürthle cells and the review of what must be considered positive immunostaining were the main heterogeneity factors identified. Another possible factor, which could not be evaluated in this research, is the technical methodology applied in the immunohistochemical reactions, such as specimen fixation, the use of monoclonal or polyclonal antibodies and biotin-free detection methods. These parameters should be standardized in future work in order to achieve uniformity among studies and to improve the diagnostic accuracy of the immunocytochemistry and immunohistochemistry methods.

Nevertheless, this study has some limitations. The present review might have been influenced by publication bias since it was limited to articles in English and included only published articles. However, the wide search criteria applied and the rigorous exclusion criteria have helped to ensure the inclusion of the most relevant studies.

In summary, this meta-analysis demonstrated that the three immunomarkers studied are accurate in making a pre- and postoperative distinction between benign and malignant thyroid lesions, with an accuracy of around 90% for both immunocytochemistry and immunohistochemistry assays once the variables responsible for heterogeneity are excluded from the analysis. Nevertheless, the search for other molecular markers must continue in order to enhance this diagnostic accuracy, since the results still show persistent false-negative and false-positive tests.