Change in newly diagnosed Graves’ disease phenotype between the twentieth and the twenty-first centuries: meta-analysis and meta-regression

Purpose According to a few recent studies, the clinical phenotype of Graves’ disease (GD) at onset has become milder in recent years in terms of the prevalence and severity of hyperthyroidism, goiter and overt eye disease. The aim of this study was to assess the change in GD phenotype across the late twentieth and early twenty-first centuries. Materials and methods We carried out a systematic search of studies published between 1/1/1980 and 12/31/2017 describing naïve GD patients at diagnosis. We collected epidemiological, clinical, biochemical and serological data reported in the selected studies, and (1) conducted a single-arm meta-analysis to compare clinical and biochemical characteristics of naïve GD patients before and after the year 2000 and (2) performed a meta-regression to identify the trend of the observed clinical presentations. Results Eighty selected articles were related to the period before the year 2000 and 30 to the years 2000–2017. In terms of demographics, the two populations were homogeneous at meta-analysis: the overall estimated female prevalence was 81% [95% CI 79–82] and the mean estimated age of the entire population was 39.8 years [95% CI 38.4–41.1], with no significant differences between the pre- and post-2000 groups (p > 0.05). The overall estimated prevalence of smokers was 40% [95% CI 33–46], with no significant difference between the two groups (p > 0.05). Mean estimated free thyroxine (FT4) and free triiodothyronine (FT3) levels at diagnosis were higher in the pre-2000 group (FT4 4.7 ng/dl [95% CI 4.5–4.9]; FT3 14.2 pg/ml [95% CI 13.3–15.1]) than in the post-2000 group (FT4 3.9 ng/dl [95% CI 3.6–4.2]; FT3 12.1 pg/ml [95% CI 11.0–13.3]) (all p < 0.01). The estimated prevalence of goiter was higher in the pre-2000 group, 87% [95% CI 84–90], than in the post-2000 group, 56% [95% CI 45–67].
The estimated prevalence of Graves’ orbitopathy (GO) was 34% [95% CI 27–41] in the pre-2000 group and 25% [95% CI 19–30] in the post-2000 group (p = 0.03). Accordingly, meta-regression adjusted for covariates showed an average annual reduction of FT4 (−0.040 ± 0.008 ng/dl, p < 0.0001), FT3 (−0.316 ± 0.019 pg/ml, p < 0.0001), goiter prevalence (−0.023 ± 0.008%, p = 0.006) and goiter size (−0.560 ± 0.031 ml, p < 0.0001). Conclusions Our meta-analysis and meta-regression confirmed that the GD phenotype at diagnosis is milder nowadays than in the past; we hypothesize that factors conceivably involved in this change are iodoprophylaxis, the worldwide decrease in smoking habits, wider use of oral contraceptive pills and micronutrient supplementation, as well as earlier diagnosis and management. Supplementary Information The online version contains supplementary material available at 10.1007/s40618-020-01479-z.


Introduction
Graves' disease (GD) is the most common cause of hyperthyroidism in iodine-sufficient areas [1] and is ultimately caused by antibodies directed against the TSH receptor (TRAb). The cumulative lifetime risk of developing GD is 3% for women and 0.5% for men [2], with an incidence of 20–50 cases/100,000 people per year [3]. GD occurs at any age, with the highest risk of onset in the 3rd–5th decades of life, and a female-to-male ratio between 4:1 and 10:1 [4,5]. TRAb are responsible for the main clinical manifestations of the disease: hyperthyroidism, goiter and orbitopathy, the so-called "Merseburg triad", named after the German city in which the physician Karl von Basedow practiced and described the clinical pattern of this disease [6]. The incidence of Graves' orbitopathy (GO) is reported to be 16 cases per 100,000 per year in females and 3 cases per 100,000 per year in males [7]. Besides GO, rare extrathyroidal manifestations include thyroid dermopathy (or pretibial myxedema) and acropachy [8].
GD is easily diagnosed when hyperthyroidism is associated with extrathyroidal manifestations; diagnosis is more challenging when hyperthyroidism is mild, goiter is absent or multinodular, or extrathyroidal manifestations are lacking. In those cases, diagnosis is mainly based on TRAb detection, possibly supported by color-flow Doppler ultrasonography, thyroid scintigraphy and/or thyroid radioactive iodine uptake [9].
Clinical GD features at diagnosis can be heterogeneous, and the clinical phenotype at onset may have changed over the last decades compared with the classical pattern described in the Merseburg paradigm. In our recent clinical experience, the GD phenotype at diagnosis seems milder than in previous decades. This impression is supported by a few articles published in recent decades suggesting a lower frequency and severity of GD features than previously reported in the literature. In particular, a 2016 article from two Northern Italian centers (ours, in Varese, and one in Pavia) reported that a significant proportion of GD patients presented with mild or moderate GD at diagnosis, about half of them with normal thyroid volume and only 20% with GO [10].
To further investigate these preliminary data and to determine whether the change in clinical presentation between centuries was significant, we first conducted a single-arm meta-analysis to compare clinical and biochemical characteristics of naïve GD patients before and after the year 2000; we then used meta-regression to identify the determinants of the observed clinical presentations among the sample characteristics reported in the primary studies, taking into account study-specific designs.

Search strategy
In this systematic review and meta-analysis, we adopted procedures consistent with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) [11] and MOOSE (Meta-analysis Of Observational Studies in Epidemiology) [12] guidelines. Two authors (C.C. and S.I.) independently searched the online databases MEDLINE (PubMed), Embase, Google Scholar and the Cochrane Central Register of Controlled Trials, using "Graves' disease" and synonyms and restricting the search to keyword fields whenever available in the bibliographic database. We applied the most inclusive search strategy, considering any type of study (observational studies and randomized controlled trials, including those reporting retrospective analyses) in any language, published between 1/1/1980 and 12/31/2017. Studies published after 1980 that reported data from an earlier recruitment period were also included; consequently, the data reported in this article start from 1972. The search strategy was refined with manual searches of reference lists.

Study selection
Studies were eligible for inclusion if they described the clinical, biochemical or serological features of patients with newly diagnosed GD. We excluded: (1) studies reporting the features of patients who had already been treated for GD, (2) studies whose inclusion and/or exclusion criteria selected a subset of patients with newly diagnosed GD (e.g., only patients with large goiter), (3) case reports, and (4) studies with overlapping data.
Two authors (C.C. and S.I.) independently selected potentially eligible studies for inclusion. Non-relevant articles were excluded based on title and abstract, and duplicates were removed. The full text of the remaining eligible papers was examined in detail to determine their inclusion. Disagreements were resolved by discussion between the two authors.
The list of articles included for analysis is detailed in Supplemental Table 1.

Data collection
Two authors (C.C. and S.I.) extracted the following data from the included studies using a piloted data extraction form: (1) author, publication year, start year, end year, study design; (2) number of patients, gender, ethnicity, mean age, smoking habits; (3) biochemical severity of hyperthyroidism: free thyroxine (FT4) and free triiodothyronine (FT3) levels; (4) autoantibody profile: positivity for TRAb and their levels, positivity for thyroid peroxidase antibody (TPOAb); (5) presence of GO and GO severity; and (6) presence of goiter (clinical and/or ultrasonographic) and goiter volume.
When a study reported data separately for different subgroups of patients (e.g., stratified by age), we considered each subgroup as a separate population. In data analyses, studies were allocated to the pre- or post-2000 group according to the central year of the recruitment period, or according to the publication year when the recruitment period was not reported.
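The allocation rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and treating a central year of exactly 2000 as post-2000 is our assumption, since the text does not say how ties were handled.

```python
from typing import Optional


def central_year(start: Optional[int], end: Optional[int], publication_year: int) -> float:
    """Central year of the recruitment period, or the publication year if it is unreported."""
    if start is None or end is None:
        return float(publication_year)
    return (start + end) / 2


def allocate_period(start: Optional[int], end: Optional[int],
                    publication_year: int, cutoff: int = 2000) -> str:
    """Assign a study (or a separately reported subgroup) to the pre- or post-2000 stratum."""
    # Assumption: a central year exactly equal to the cutoff counts as post-2000
    year = central_year(start, end, publication_year)
    return "post-2000" if year >= cutoff else "pre-2000"


# Hypothetical studies: (recruitment start, recruitment end, publication year)
for study in [(1994, 2003, 2006), (None, None, 2010), (1976, 1983, 1985)]:
    print(study, allocate_period(*study))
```

Note that the first hypothetical study, although published in 2006, falls in the pre-2000 stratum because its recruitment is centered on 1998.5, which mirrors how papers published after 2000 but conducted mainly before it were handled.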
Prevalence (for dichotomous data) or mean and standard error (for continuous data), along with the sample size, were used for analyses. When data were available only as median and interquartile range, mean and standard deviation were estimated [13] to allow inclusion in the meta-analysis. Thyroid function tests (FT4, FT3) were evaluated as raw data, converted into ng/dl (FT4) or pg/ml (FT3), and then adjusted for the upper limit of normal (ULN) of the given reference range (laboratory value/ULN) to reduce potential bias related to the measurement method. Similarly, TRAb were assessed in terms of prevalence (% positive) and of levels adjusted for the ULN of the given kit. TPOAb were assessed in terms of prevalence (% positive). Goiter at diagnosis was investigated in terms of prevalence and volume, assessed either by palpation (according to, or analogous to, the WHO classification [14]) or by ultrasound measurement of thyroid lobe diameters, with volume calculated using the prolate ellipsoid formula (volume = 0.5 × length × depth × width); cutoffs of 18 ml for males and 14 ml for females were used to define the presence of goiter. Orbitopathy was diagnosed according to different standards in the published articles, either by clinical evaluation (considering exophthalmos, diplopia or impaired vision) or using standardized criteria, i.e., the EUGOGO [15] or NOSPECS [16] classification; owing to this high heterogeneity, we decided not to compare GO severity between studies but merely to analyze the overall GO prevalence in each study.
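The harmonization steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' procedure: the FT4 and FT3 factors are the standard molecular-weight-based conversions (1 ng/dl FT4 = 12.87 pmol/l; 1 pg/ml FT3 = 1.536 pmol/l), which the text does not list; the median/IQR-to-mean/SD estimator is the one proposed by Wan et al. (2014), which may or may not be the method of reference [13]; and we assume that the 18 ml and 14 ml goiter cutoffs apply to the summed volume of both lobes. All function names are hypothetical.

```python
from statistics import NormalDist

# Standard molecular-weight-based factors (assumed; the paper does not list them)
FT4_PMOL_L_PER_NG_DL = 12.87  # 1 ng/dl FT4 = 12.87 pmol/l
FT3_PMOL_L_PER_PG_ML = 1.536  # 1 pg/ml FT3 = 1.536 pmol/l


def ft4_pmol_l_to_ng_dl(value):
    return value / FT4_PMOL_L_PER_NG_DL


def ft3_pmol_l_to_pg_ml(value):
    return value / FT3_PMOL_L_PER_PG_ML


def uln_ratio(value, uln):
    """Laboratory value divided by the assay's upper limit of normal (same units)."""
    if uln <= 0:
        raise ValueError("upper limit of normal must be positive")
    return value / uln


def mean_sd_from_median_iqr(q1, median, q3, n):
    """Wan et al. (2014) estimator from the quartiles and sample size (assumed method)."""
    mean = (q1 + median + q3) / 3
    eta = 2 * NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    return mean, (q3 - q1) / eta


def lobe_volume_ml(length_cm, depth_cm, width_cm):
    """Ellipsoid approximation from the text: 0.5 x length x depth x width (cm^3 = ml)."""
    return 0.5 * length_cm * depth_cm * width_cm


def has_goiter(right_lobe_cm, left_lobe_cm, sex):
    """Assumes the sex-specific cutoff applies to the sum of both lobes (isthmus ignored)."""
    total = lobe_volume_ml(*right_lobe_cm) + lobe_volume_ml(*left_lobe_cm)
    cutoff_ml = 18.0 if sex == "M" else 14.0
    return total > cutoff_ml


# Hypothetical usage: FT4 of 60 pmol/l measured with an assay whose ULN is 22 pmol/l
ft4 = ft4_pmol_l_to_ng_dl(60.0)  # about 4.66 ng/dl
ratio = uln_ratio(60.0, 22.0)    # about 2.7 times the ULN
```

The ULN ratio is unit-free, so it can be computed before or after unit conversion as long as the value and its ULN share the same units; this is what makes it useful for pooling results from different assays.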

Outcomes
In the meta-analysis, the primary endpoint was the change in the phenotype of newly diagnosed GD between the last decades of the twentieth century and the first years of the twenty-first century (2000–2017). The year 2000 was chosen arbitrarily as the cutoff, to split in half the time span from the 1980s (the decade from which most studies were available) to the present. The phenotype was evaluated in terms of severity of hyperthyroidism, prevalence and severity of GO, prevalence and volume of goiter, and autoantibody prevalence and titer. The secondary endpoint, assessed by meta-regression, was whether these features changed progressively from 1972 to 2017, using patient- and study-specific characteristics as potential confounding variables.

Meta-analysis
Computations were performed using the Cochrane Collaboration Review Manager software (RevMan version 5.3). For the meta-analyses, the random-effects method of DerSimonian and Laird [17] was used to take into account potential heterogeneity between studies. For each meta-analysis, between-study heterogeneity was assessed using I², with values greater than 0 indicating the presence of heterogeneity. The analyses were stratified by recruitment period (pre- and post-2000), and differences between subgroups were tested using the chi-square test.
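The random-effects pooling, I² statistic and subgroup test named above can be sketched as follows. This is a minimal pure-Python sketch of the DerSimonian–Laird estimator and of a two-subgroup chi-square test for differences (1 degree of freedom), not RevMan's implementation; the function names and the per-study values in the usage lines are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled estimate, its standard error, tau^2, I^2 in %).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    theta_fe = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - theta_fe) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    theta = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return theta, se, tau2, i2


def se_from_ci(lower, upper, z=1.96):
    """Back-calculate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def subgroup_difference(theta1, se1, theta2, se2):
    """Chi-square test for a difference between two pooled subgroup estimates (df = 1)."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    pooled = (w1 * theta1 + w2 * theta2) / (w1 + w2)
    q_between = w1 * (theta1 - pooled) ** 2 + w2 * (theta2 - pooled) ** 2
    p = math.erfc(math.sqrt(q_between / 2))  # upper tail of chi-square with 1 df
    return q_between, p


if __name__ == "__main__":
    # Hypothetical per-study FT4/ULN ratios and their variances
    theta, se, tau2, i2 = dersimonian_laird([2.9, 2.4, 3.3, 2.1], [0.04, 0.09, 0.05, 0.12])
    print(f"pooled {theta:.2f} (95% CI {theta - 1.96 * se:.2f} to {theta + 1.96 * se:.2f}), I2 = {i2:.0f}%")
    # Rough check against the reported TRAb prevalences (post- vs pre-2000)
    q, p = subgroup_difference(88, se_from_ci(83, 93), 72, se_from_ci(67, 78))
    print(f"Q = {q:.1f}, p = {p:.1e}")
```

As a rough consistency check, back-calculating standard errors from the rounded 95% CIs reported in the Autoimmunity results for TRAb prevalence (88% [83–93] post-2000 vs 72% [67–78] pre-2000), under a symmetric normal approximation, gives Q ≈ 17.8 and p ≈ 2.5 × 10⁻⁵, in line with the reported p < 0.0001; the approximation from rounded intervals means this is not a reproduction of the original computation.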

Meta-regression
Meta-regression was performed with linear mixed regression models in SAS software (SAS Institute, Cary, NC, version 9.4), considering fixed- and random-effects models. Univariate and multivariate analyses were performed, adjusting for age, sex, ethnicity, year of study (median year if recruitment spanned more than one year, or publication year if the recruitment period was not available) and therapeutic indication (none vs. a selected therapeutic indication in the original paper).
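A year-only version of such a meta-regression can be sketched in pure Python. Note that this swaps in a different technique from the authors' SAS linear mixed models: it is a random-effects meta-regression with a single covariate (study year) and a method-of-moments (DerSimonian–Laird-type) estimate of τ², so it will not reproduce the adjusted estimates in Table 2. The function names are hypothetical.

```python
import math


def _wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b, se_a, se_b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    # Standard errors treat the weights as known inverse variances
    se_a = math.sqrt(1.0 / sw + xbar ** 2 / sxx)
    se_b = math.sqrt(1.0 / sxx)
    return a, b, se_a, se_b


def meta_regression_year(years, estimates, variances, center=2000):
    """Random-effects meta-regression of study estimates on study year (sketch)."""
    k = len(years)
    x = [t - center for t in years]  # centring changes only the intercept
    w = [1.0 / v for v in variances]

    # Step 1: fixed-effect fit and residual heterogeneity statistic Q_E
    a, b, _, _ = _wls_line(x, estimates, w)
    q_e = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, estimates))

    # Step 2: method-of-moments tau^2 with p = 2 coefficients:
    # tau^2 = (Q_E - (k - 2)) / (tr W - tr[(X'WX)^-1 X'W^2 X])
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_w2 = sum(wi * wi for wi in w)
    s_w2x = sum(wi * wi * xi for wi, xi in zip(w, x))
    s_w2xx = sum(wi * wi * xi * xi for wi, xi in zip(w, x))
    det = s_w * s_wxx - s_wx ** 2
    trace_term = (s_wxx * s_w2 - 2 * s_wx * s_w2x + s_w * s_w2xx) / det
    tau2 = max(0.0, (q_e - (k - 2)) / (s_w - trace_term))

    # Step 3: refit with random-effects weights 1 / (v_i + tau^2)
    w_re = [1.0 / (v + tau2) for v in variances]
    a_re, b_re, _, se_b = _wls_line(x, estimates, w_re)
    z = b_re / se_b
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald p-value
    return {"intercept_at_center": a_re, "slope_per_year": b_re,
            "se": se_b, "p": p, "tau2": tau2}
```

Adding further covariates (age, sex, ethnicity, therapeutic indication) would require the matrix form of the same estimator; restricting the sketch to one covariate keeps it free of external dependencies.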

Included studies
The literature search using the strategy described above yielded 12,765 papers. All records were screened in two steps and assessed for eligibility. One hundred and ten papers provided sufficient quantitative data and were included in the meta-analysis. Eighty papers were included in the pre-2000 group (73 papers published between 1/1/1980 and 12/31/1999, and 7 papers published after 1/1/2000 but referring to studies conducted completely or mainly before 2000), and 30 papers conducted and published between 1/1/2000 and 12/31/2017 were included in the post-2000 group.
The flow chart of the study selection strategy is depicted in Fig. 1.
A summary of all the meta-analyses performed is provided in Table 1, whereas meta-analysis details and forest plots are available in a dedicated data repository [18]; meta-regression results are reported in Table 2.
Fig. 1 (flow chart): records not dealing with clinical features of Graves' disease and therefore excluded, n = 12,239.
According to demographics, the two defined populations were homogeneous at meta-analysis. The estimated proportions of females and Caucasians did not differ between groups: 81% [95% CI 79–83] females and 80% [95% CI 79–81] Caucasians in both the pre- and post-2000 groups (p = 0.93 and p = 0.51 at test for subgroup differences for gender and ethnicity, respectively; Supplemental Figs. 1 and 2 [18]), yet meta-regression showed a minor, but significant, increase in age at diagnosis over time (0.300 ± 0.012 per year, p < 0.0001; Table 2 and Fig. 2a).
[…] This result was confirmed using FT4 levels adjusted for the upper limit of normal (ULN) of the given reference range (FT4/ULN), to reduce potential bias related to the measurement method: the estimated mean ratio was lower in the post-2000 group (… Fig. 6 [18]). Indeed, meta-regression demonstrated a progressive FT4 reduction over the years (−0.040 ± 0.008 ng/dl per year, p < 0.0001; Table 2 and Fig. 2b).
[…] (… Fig. 9 [18]). Indeed, meta-regression demonstrated a progressive goiter reduction both in prevalence (−0.023 ± 0.008% per year, p = 0.006) and in volume (−0.560 ± 0.031 ml per year, p < 0.0001; Table 2 and Fig. 2c).

Autoimmunity
TRAb laboratory measurement was clearly heterogeneous among studies and over the years. We therefore partially overcame this bias by evaluating TRAb positivity prevalence and adjusting antibody levels for the upper limit of normal. The estimated prevalence was significantly higher in the post-2000 group (n = 2175, mean prevalence 88% [95% CI 83–93]) than in the pre-2000 group (n = 2095, mean prevalence 72% [95% CI 67–78]; p < 0.0001 at test for subgroup differences; Table 1 and Supplemental Fig. 11 [18]); indeed, meta-regression showed a mild increase over the years in both antibody prevalence and adjusted serum levels, though at multivariate analysis only adjusted serum levels reached statistical significance (0.009 ± 0.005 per year, p = 0.109 for TRAb […] Fig. 12 [18]).

Discussion
GD is an autoimmune disease ultimately caused by TSH-receptor antibodies targeting thyrocytes, thereby causing, in most cases, thyroid hyperplasia, hyperthyroidism and extrathyroidal manifestations. GD pathogenesis is complex and not yet fully understood; it involves both genetic and environmental factors [2], which account for the differences in clinical presentation among patients and, possibly, for a change in phenotype over the decades. Although the Merseburg triad still characterizes many cases of GD at presentation, the severity and prevalence of these manifestations seem less prominent in recent years. To explore this issue further, we analyzed articles published from 1980 to 2017 describing the features of naïve, untreated GD patients at diagnosis.
Demographic characteristics were similar between the two centuries and comparable to the literature [19]: female sex was predominant, as expected, with a pooled prevalence of 80%. Interestingly, the mean age of the overall cohort was about 40 years, with no significant difference between groups at meta-analysis, whereas meta-regression showed a minor, though significant, increase in age at GD diagnosis per year. This outcome supports the hypothesis that the change in GD phenotype is not exclusively due to earlier diagnosis: if that were the case, patients should have been younger at diagnosis, whereas they were of similar, if not slightly older, age.
The prevalence of smoking, the main environmental risk factor for GD and especially for GO [20], did not change significantly between the pre-2000 and post-2000 cohorts; the pooled estimated prevalence of smokers in the overall GD cohort was 40%. At first glance, this proportion seems comparable to the global prevalence of smoking, since nearly 47% of men and 12% of women smoke [21]; however, smoking prevalence is much higher in males, whereas GD is greatly predominant in females, which confirms previous findings that smokers are overrepresented among patients with GD [20]. It is therefore challenging to determine whether and how smoking affects the change in GD phenotype. An intriguing hypothesis is that a decrease, if not in prevalence, at least in the number of cigarettes smoked per day, together with the smoking restrictions in public places introduced by most developed countries and the consequent reduction in second-hand smoke, may have lessened the chronic cell damage and inflammation caused by smoking on GD clinical manifestations; even though the overall prevalence of smokers did not change between centuries, these efforts to limit tobacco use may have contributed to a milder phenotype.
Hyperthyroidism is the main manifestation of GD; in our analysis, hyperthyroidism at diagnosis was less severe in the post-2000 group, whether FT4 and FT3 were considered as raw values or adjusted for the upper limit of normal. This result might have different explanations: on the one hand, in recent decades there might have been increased awareness of general and thyroid health, both in the general population and among general practitioners, leading to earlier diagnosis and thereby preventing the biochemical worsening of uncontrolled hyperthyroidism; on the other hand, mild or subclinical hyperthyroidism at diagnosis might merely express a milder phenotype that remains stable and does not necessarily progress to overt and more severe forms [22].
Goiter prevalence diminished significantly in the post-2000 cohorts, and meta-regression depicted a progressive trend of volume reduction from the late 1970s to the present, with a plateau in recent years, yielding an estimated mean thyroid volume at diagnosis of about 35 ml. Goiter evaluation could be biased by the different iodine status of different geographical areas and by the uneven improvement of iodoprophylaxis strategies in recent decades; iodine supplementation is nowadays reported to be optimal in most countries, though 28 countries still have insufficient iodine in their diets [23]. We believe that the worldwide increase in awareness of iodine supplementation has been one of the main reasons for the downward trend in goiter prevalence and volume. In Italy, a nation with a long history of iodine deficiency, urinary iodine excretion has recently reached the cutoff value for iodine sufficiency after effective iodine prophylaxis with iodized salt [24]. Moving from the general population to GD cohorts, two Italian studies performed in the 1980s and 1990s [25,26] reported large goiter (volume > 40 ml) in 52% and 67% of patients, respectively; conversely, in a more recent report, 75% of patients presented with no or small goiter at diagnosis [10]. It is conceivable that iodoprophylaxis, especially in iodine-deficient areas, played at least a partial role in the diminished goiter prevalence and volume at GD diagnosis observed in this meta-analysis.
GO is the most common extrathyroidal manifestation of GD; the estimated GO prevalence in the post-2000 cohort was 25%, significantly lower than the 34% of the pre-2000 cohort, meaning that nowadays only one in four newly diagnosed GD patients has signs and symptoms of GO. This finding is consistent with recent studies, which have also found lower severity and activity in most newly diagnosed GD patients [10,27]. Unfortunately, data on GO severity were sparse and heterogeneous; therefore, we could not compare these features in our analysis.
TRAb are the principal etiopathogenic factor of GD, and their evaluation is of fundamental importance both at diagnosis and during follow-up [28]; indeed, TRAb persistence or de novo elevation after discontinuation of antithyroid drug therapy is a risk factor for disease relapse [29]. Many types of assays, both competitive-binding immunoassays and functional cell-based bioassays, have been used to measure TRAb over the years; to minimize the resulting bias, we assessed TRAb positivity prevalence according to the given method, as well as serum levels adjusted for the upper limit of normal of the specific measurement method employed in each article. Virtually all patients with GD should test positive for TRAb, but this has not always been the case, mostly because of the low sensitivity of older assays. TRAb prevalence in the post-2000 group was significantly higher than in the pre-2000 group. Furthermore, meta-regression showed a trend toward increasing TRAb prevalence and levels from the 1980s to the mid-1990s, followed by a plateau with a prevalence of approximately 90% in recent decades and TRAb levels on average six times higher than the upper limit of laboratory-specific cutoff values. We believe that this increase is mainly due to the greater sensitivity of the immunoassays and bioassays developed and implemented in recent years [30].
Other thyroid autoantibodies are frequently found in GD, but their role is not completely understood. A recent study demonstrated that the presence of TPOAb or thyroglobulin antibodies did not change the presenting clinical and biochemical features of GD, but might play a role in the transition from hypothyroidism to hyperthyroidism at disease onset, possibly due to the concomitant presence of blocking (or neutral) and stimulating TRAb pools [31]. In our analysis, TPOAb had no relevant role in the phenotype change over the decades, since the pooled estimated prevalence of TPOAb was around 70% and did not differ between the pre- and post-2000 groups.
Apart from iodoprophylaxis and the worldwide decrease in smoking habits, exogenous estrogens could have played an intriguing role in tempering the GD phenotype, especially considering the strong female predisposition to GD; indeed, oral contraceptive pill use has risen in recent decades [32], and estrogens are considered to be protective against GD [33]. Another interesting, though still unproven, possibility is the increased awareness of the importance of micronutrient supplementation, such as vitamin D, which has been reported to be deficient in female patients with GD [34], and selenium, whose supplementation has proven beneficial in mild GO [35].
Finally, we believe that this milder phenotype has contributed to the change in the therapeutic approach to GD. In the USA, for instance, the preferred first-line treatment for GD was historically ¹³¹I radiotherapy; however, the proportion of patients undergoing first-line ¹³¹I radiotherapy is decreasing there, with an increase in thionamide prescriptions [36]. Medical therapy is now the first-line treatment for GD worldwide [1]. This shift could be due to many reasons, but a milder GD phenotype at diagnosis could favor a conservative approach with antithyroid drugs as first-line treatment.
Despite our efforts, this meta-analysis and meta-regression have some limitations. (1) Although we considered only data from newly diagnosed, untreated patients, there could be a selection bias in studies focused on a specific therapy (medical treatment, ¹³¹I radiotherapy or surgery); to limit this issue, we adjusted the multivariate meta-regression for the presence of such therapeutic indications. (2) Laboratory assays were heterogeneous; to minimize this bias and standardize our findings, we employed different approaches: (a) when data were available only as median and interquartile range, mean and standard deviation were estimated [13]; (b) besides the raw biochemical level, we also considered the ratio of the mean value to the upper limit of normal of the specific measurement method employed in each article. These expedients allowed us to harmonize results, but a certain degree of approximation must be taken into account. (3) GO data were not reported in all studies, and classification was very heterogeneous; therefore, we decided not to compare and analyze GO severity, but only its crude prevalence assessed by physical examination.
Funding Open access funding provided by Università degli Studi dell'Insubria within the CRUI-CARE Agreement. This study was supported in part by grants from the Ministry of Education, University and Research (MIUR, Rome) and the University of Insubria to Luigi Bartalena.

Compliance with ethical standards
Conflict of interest All the authors declare that they have no conflict of interest.
Research involving human participants and/or animals This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent For this type of study, formal consent is not required.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.