Thyrotoxicosis refers to a condition associated with elevated levels of free triiodothyronine (FT3) and free thyroxine (FT4) in the circulation [1]. It can be classified into two categories: destruction-induced thyrotoxicosis and stimulation-induced thyrotoxicosis. Destruction-induced thyrotoxicosis is often seen in patients with destructive thyroiditis (DT), including postpartum thyroiditis, painless thyroiditis, amiodarone-induced thyrotoxicosis and subacute thyroiditis, whereas stimulation-induced thyrotoxicosis is often observed in Graves’ disease (GD) [2]. In terms of treatment, GD can be treated with antithyroid medication, radioactive iodine therapy or thyroidectomy, whereas DT is generally treated conservatively [3]. Given the different prognoses and treatments of these two conditions, a correct and rapid differential diagnosis is extremely important.

Radioactive iodine uptake (RAIU) has been recognized as the most accurate test for discriminating between causes of thyrotoxicosis [3], with a sensitivity and specificity of 100% and 90%, respectively [4]. However, the results can be affected by iodine-containing foods or drugs. Moreover, RAIU is contraindicated during gestation and lactation [5]. Serum thyroid stimulating hormone (TSH), the ratio of serum triiodothyronine to thyroxine, serum thyroid stimulating hormone receptor antibodies (TRAb) and markers of inflammation have also been used to discriminate GD from DT [6].

High intrathyroidal blood flow and an increase in the mean peak systolic velocity (PSV) of the superior thyroid artery are signs of Graves’ disease [7]. Thyroid ultrasonography has improved the diagnostic accuracy for thyroid diseases, including thyrotoxicosis [8]. Measurement of the mean peak systolic velocity of the superior thyroid artery (STA-PSV) by ultrasonography, which is easy and convenient, can provide clinicians with qualitative and quantitative information for discriminating between causes of thyrotoxicosis [9]. However, no appropriate cut-off value for STA-PSV has been established, and the reported sensitivity and specificity of STA-PSV for the diagnosis of GD vary across studies [10, 11]. In the present study, we performed a systematic literature review and meta-analysis to evaluate the diagnostic accuracy of STA-PSV for differentiating GD from DT.


Identification of studies

We searched the literature published in Embase, Web of Science, PubMed, CNKI, Wanfang Data, and the VIP database before September 1, 2018. The search terms were as follows: (“peak systolic velocity of superior thyroid artery” or “STA-PSV” or “color flow doppler sonography” or “doppler sonography” or “ultrasonography” or “echography” or “ultrasound”) and (“thyrotoxicosis” or “Graves’ disease” or “GD” or “destructive thyroiditis” or “DT” or “painless thyroiditis” or “postpartum thyroiditis” or “subacute thyroiditis”). In addition, other relevant published reports and the references of the selected studies were searched manually.

Inclusion and exclusion criteria

Studies were included if they met the following criteria: (1) The study was a diagnostic study of the peak systolic velocity of the superior thyroid artery measured by ultrasonography in patients with Graves’ disease or destructive thyroiditis. (2) Patients could be subdivided into two groups: those with destructive thyrotoxicosis and those with Graves’ disease. (3) Destructive thyrotoxicosis was diagnosed on the basis of symptoms, a T3 to T4 ratio less than 20, increased T3 and T4 concentrations with a decreased TSH concentration lasting for fewer than 3 months and/or later development of hypothyroidism, and/or low uptake on pertechnetate thyroid scan; Graves’ disease was diagnosed on the basis of clinical parameters, eye signs, a T3 to T4 ratio greater than 20, and increased uptake on pertechnetate thyroid scan. (4) The sensitivity and specificity (the numbers of true-positive, true-negative, false-positive and false-negative results) and their corresponding 95% confidence intervals (CIs) were provided or could be calculated. Abstracts, reviews, case reports and repeated publications were excluded.

Study selection and data extraction

The first selection, based on titles and abstracts, was carried out by Xiaojuan Peng. The full texts of eligible studies were then obtained. Xiaojuan Peng and Shenglan Wu independently assessed the eligible studies for inclusion. Disagreements were resolved by discussion and by consulting Shaohui Tang. The following data were extracted from each selected study: first author name, country of study, year of publication, number of patients with thyrotoxicosis, destructive thyroiditis and Graves’ disease, cut-off value, and the raw data for analyzing sensitivity and specificity (the numbers of true-positive, true-negative, false-positive and false-negative results).

Assessment of methodological quality

QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies-2) was used to assess the quality of the included studies. The QUADAS-2 form consists of four domains: (1) patient selection, (2) index test, (3) reference standard, and (4) flow and timing. Each domain is assessed in terms of risk of bias, and the first 3 domains are also assessed in terms of concerns about applicability. Signaling questions are included to help judge risk of bias. Risk of bias is judged as “low”, “high” or “unclear”. If the answers to all signaling questions for a domain are “yes”, then risk of bias can be judged low. If any signaling question is answered “no”, potential for bias exists. The “unclear” category should be used only when data are insufficiently reported to permit a judgment. Applicability sections are structured in a way similar to that of bias sections but do not include signaling questions. Concerns about applicability are rated as “high”, “low” or “unclear”. The results of quality assessment were used to provide an evaluation of the overall quality of included studies and to investigate potential sources of heterogeneity [12].
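The judgment rule described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and mapping any “no” answer directly to “high” is a simplifying assumption, since QUADAS-2 states only that potential for bias exists in that case and leaves the final rating to the reviewers.

```python
def judge_risk_of_bias(answers):
    """Rate one QUADAS-2 domain from its signaling-question answers.

    `answers` is a list of strings, each "yes", "no" or "unclear".
    Assumption for this sketch: any "no" is rated "high", although the
    tool itself only says that potential for bias exists.
    """
    normalized = [a.strip().lower() for a in answers]
    if not normalized:
        raise ValueError("at least one signaling question is required")
    if any(a not in {"yes", "no", "unclear"} for a in normalized):
        raise ValueError("answers must be 'yes', 'no' or 'unclear'")
    if all(a == "yes" for a in normalized):
        return "low"
    if any(a == "no" for a in normalized):
        return "high"
    # No "no" answers, but at least one question could not be answered.
    return "unclear"
```

A domain with the answers "yes", "yes", "unclear" would therefore be rated "unclear" rather than "low".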

Statistical analysis

STATA version 12.0 (Stata Corp, College Station, Texas) was used to perform the statistical analysis. Statistical heterogeneity between studies was examined using the I² value. If the heterogeneity was acceptable (I² < 50%), a fixed-effects model was used; otherwise, a random-effects model was used. In this study, the following were calculated: threshold effect, Spearman correlation coefficient, diagnostic odds ratio (DOR, used to eliminate a possible threshold effect), sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR) and area under the receiver operating characteristic curve (AUC).
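For readers who want to see how these quantities relate to the extracted 2×2 counts, the following Python sketch may help. It is not the STATA procedure used in this study: the function names are hypothetical, the counts in the usage note are invented for illustration, and I² is computed with the standard Higgins formula from Cochran's Q, which is assumed here rather than stated in the text.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures for one study, with GD as the positive class.

    Note: a zero cell (e.g. fp == 0) raises ZeroDivisionError; real
    analyses usually apply a continuity correction, omitted here.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = (tp * tn) / (fp * fn)             # diagnostic odds ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
        "dor": dor,
    }


def i_squared(q, df):
    """Higgins I² (in percent) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def choose_model(i2_percent):
    """Model choice as described in the text: fixed effects if I² < 50%."""
    return "fixed" if i2_percent < 50 else "random"
```

With invented counts of 90 true positives, 5 false positives, 10 false negatives and 95 true negatives, `diagnostic_metrics(90, 5, 10, 95)` gives a sensitivity of 0.90, a specificity of 0.95 and a DOR of 171.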


Search results and study characteristics

A total of 384 relevant articles were identified in the initial search, of which 240 were duplicates. After the titles and abstracts of the remaining articles were reviewed, 104 articles were excluded. The full texts of the remaining 40 articles were reviewed. Of these, 30 articles were not diagnostic studies, so they were excluded. Finally, 10 published articles [1, 5, 6, 10, 11, 12, 13, 14, 15, 16, 17] were included. The process of study selection is summarized in Fig. 1. The 11 studies included 1052 patients, namely, 706 patients with Graves’ disease and 346 patients with destructive thyroiditis. In all patients, thyroid function was assessed by measurement of TSH, total T3, and free T4, and color-flow doppler ultrasonography of the thyroid gland was performed. The population and characteristics of the included studies are listed in Table 1, and the ultrasonography results of the study participants are listed in Table 2.

Fig. 1 Flowchart of the study selection strategy

Table 1 Population and characteristics of included studies
Table 2 The result of ultrasonography of the study participants

Study quality

According to QUADAS-2, the results of the evaluation of risk of bias and applicability concerns for the included studies are shown in Fig. 2a, b. As shown in Fig. 2, the included studies were generally of high quality.

Fig. 2 Methodological evaluation according to QUADAS-2 of the included studies: (a) overall and (b) by study

Combined results

The test for heterogeneity among the studies showed significant heterogeneity (I² = 65.75% and 65.08% for sensitivity and specificity, respectively), so the random-effects model was used. The meta-analysis showed that the pooled sensitivity and pooled specificity of STA-PSV by ultrasonography in distinguishing GD from DT were 0.86 (95% CI, 0.80–0.90) and 0.93 (95% CI, 0.86–0.97), respectively (Fig. 3), with an area under the receiver operating characteristic curve (AUC) of 0.94 (95% CI, 0.92–0.96) (Fig. 4), which was similar to the diagnostic accuracy of radioactive iodine uptake for GD (sensitivity 100%, specificity 90%) [4]. The average likelihood ratios of positive and negative test results were calculated on the basis of the pooled estimates of sensitivity and specificity; the PLR and NLR of STA-PSV in differentiating GD from DT were 13.0 (95% CI, 6.1–27.8) and 0.15 (95% CI, 0.10–0.22), respectively. In addition, the mean DOR of STA-PSV was 85 (95% CI, 33–220). In a sensitivity analysis, we recalculated the pooled sensitivity and specificity with each of the 11 studies removed in turn, and found that the results were nearly identical to the initial result. These findings reflect the stability and credibility of the results of this meta-analysis. In the publication bias analysis, Deeks’ funnel plot asymmetry test was not significant (p = 0.97).
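As a rough consistency check, the likelihood ratios and DOR can be recomputed from the rounded pooled sensitivity (Se = 0.86) and specificity (Sp = 0.93) using the standard definitions. The notation Se and Sp is introduced here for illustration only. The small differences from the reported values (13.0, 0.15 and 85) are presumably due to rounding and to the model-based estimation, which is an assumption rather than something stated in the text.

```latex
\[
\mathrm{PLR} = \frac{\mathrm{Se}}{1-\mathrm{Sp}} = \frac{0.86}{1-0.93} \approx 12.3,
\qquad
\mathrm{NLR} = \frac{1-\mathrm{Se}}{\mathrm{Sp}} = \frac{1-0.86}{0.93} \approx 0.15,
\qquad
\mathrm{DOR} = \frac{\mathrm{PLR}}{\mathrm{NLR}} \approx \frac{12.3}{0.15} \approx 82
\]
```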

Fig. 3 Forest plots of sensitivity and specificity of STA-PSV in distinguishing GD from DT. Plots display diagnostic probabilities of included studies and the corresponding 95% confidence intervals

Fig. 4 Receiver operating characteristic graph of STA-PSV in distinguishing GD from DT, with 95% confidence region and 95% prediction region. Individual study estimates are represented as circles


It is difficult to differentiate subclinical or mild GD from DT because of the absence of specific signs, such as ophthalmopathy and skin and nail changes. RAIU is the gold standard, but its high cost, limited availability and the contraindication to radioisotope scanning during pregnancy and lactation may restrict its application [15]. The TSH receptor antibody level can also help in the aetiological differentiation of thyrotoxicosis in difficult situations [18], but TSH-receptor-stimulating immunoglobulin bioassays are costly and time-consuming. Color flow doppler ultrasonography (CFDS), a cost-effective, portable, safe, and noninvasive method [8], is now widely used to measure tissue vascularization and blood flow. CFDS of the thyroid gland, both qualitative and quantitative [19], helps to assess the functional status of the thyroid gland indirectly by studying its vascularity [7].

Compared with qualitative judgment of thyroid CFDS, STA-PSV measurement is more objective and accurate [20]. One study [19] reported that STA-PSV > 40 cm/s had a sensitivity of 94.0% and a specificity of 100.0% for the differential diagnosis of GD and DT in 65 patients with thyrotoxicosis. However, another study [10] showed that the sensitivity and specificity of STA-PSV > 50.5 cm/s in distinguishing GD from DT in 304 patients with thyrotoxicosis were 70.87% and 96.88%, respectively.

It was reported that Graves’ disease accounted for 95% of cases of hyperthyroidism during pregnancy [21]. Poorly controlled Graves’ disease during pregnancy can cause serious complications in both the mother and the fetus [22], such as low birth weight [23], preterm birth [24], and congenital malformations [25]. Therefore, early diagnosis is essential to successful management. Because many of the signs and symptoms are similar to the normal physiologic changes that occur in pregnancy, diagnosing hyperthyroidism during pregnancy is challenging [26]. Ultrasonography may be a good choice in pregnancy, not only because of its relatively low cost, real-time capability, safety, and operator comfort and experience, but also because it involves no radiation [27]. It was reported that the sensitivity of STA-PSV in differentiating GD from hyperthyroidism in pregnancy was 80–83% [14, 16].

In the present meta-analysis of 11 studies, we evaluated the accuracy of STA-PSV for the differential diagnosis of GD and DT; to our knowledge, this is the first meta-analysis reported in this field. Our results showed that the pooled sensitivity and specificity of STA-PSV in differentiating GD from DT were 0.86 (95% CI, 0.80–0.90) and 0.93 (95% CI, 0.86–0.97), respectively, and the AUC was 0.94 (95% CI, 0.92–0.96). The AUC ranges between 0 and 1, with higher values indicating better test performance. Furthermore, the pooled PLR and NLR of STA-PSV in differentiating GD from DT were 13.0 (95% CI, 6.1–27.8) and 0.15 (95% CI, 0.10–0.22), respectively, and the mean DOR of STA-PSV in differentiating GD from DT was 85 (95% CI, 33–220). Likelihood ratios greater than 10 or less than 0.1 generate large and often conclusive changes from pre-test to post-test probability, whereas likelihood ratios of 5 to 10 and 0.1 to 0.2 generate moderate shifts in pre-test to post-test probability [28]; a higher DOR indicates higher accuracy. Taken together, the results indicate that STA-PSV by ultrasonography has good diagnostic accuracy in the differentiation of GD from DT.
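The shift from pre-test to post-test probability mentioned above follows from the odds form of Bayes' theorem. The short Python sketch below illustrates it; the function name and the 50% pre-test probability in the usage note are assumptions chosen for illustration and are not taken from the included studies.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability into a post-test probability.

    Uses: post-test odds = pre-test odds x likelihood ratio.
    """
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)
```

Under a hypothetical pre-test probability of GD of 50%, a positive result with the pooled PLR of 13.0 gives a post-test probability of about 0.93, and a negative result with the pooled NLR of 0.15 gives about 0.13.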

There were some limitations to this meta-analysis. Firstly, it included only 11 studies and 1052 subjects, so further subgroup analysis could not be performed because of the small number of included diagnostic studies and patients. Secondly, its geographic coverage was limited because all included studies were from Asia, with none from other regions. Therefore, the applicability of our results to regions other than the countries involved in the included studies is limited. Lastly, a threshold effect and other potential factors might have influenced the results, although the sensitivity analysis supported the stability and credibility of the results of this meta-analysis.


STA-PSV by ultrasonography is a useful diagnostic method in differentiating GD from DT. More studies from other countries are needed to further evaluate the accuracy of STA-PSV for the differential diagnosis of thyrotoxicosis.