Introduction

Congenital hypothyroidism (CH) is the most common congenital endocrine disease and an avoidable cause of severe mental retardation [1].

The CH pathogenesis may include the contribution of genetic and environmental factors [1,2,3]. Nevertheless, the pathogenic mechanisms of CH are still largely undefined, suggesting the involvement of unidentified genes or alternative mechanisms, also supported by the elevated frequency of discordance for CH phenotype between monozygotic twins [4,5,6].

Non-Mendelian mechanisms include epigenetic modifications that can produce phenotype changes without gene sequence variations but involve alterations in gene transcription [7]. In particular, DNA methylation is an epigenetic modification in which methyl groups are added to the cytosine residues within CpG dinucleotides, thereby preventing the binding of transcription factors to DNA. Interestingly, premature birth, which represents a relevant risk factor for CH, was previously associated with alterations in DNA methylation [8,9,10].

To date, systematic methylome analysis in CH has been performed only in the context of thyroid ectopy [11, 12]. However, no differences in methylation profile have been found between ectopic and orthotopic thyroid tissues [11] and between peripheral leucocytes of CH cases with ectopy compared to normal controls [12].

The aim of this study was to investigate the role of epigenetics in CH pathogenesis. To reach this goal we performed a genome-wide DNA methylation analysis in the peripheral whole blood from a large cohort of CH twins, 23 twin pairs (10 monozygotic and 13 dizygotic), of whom 4 concordant and 19 discordant for CH at birth.

Materials and methods

Study design/population

CH cases were enrolled in several Italian referral centers within a specific research protocol that was approved by the Ethics Committees of the involved institutions. The sample size was calculated considering a methylation difference between groups of at least 7% and a power of 80%, as previously reported [13].

Genetic analysis

Genetic analyses were performed on both affected and unaffected twins by NGS of a panel including 11 CH candidate genes, as previously reported [14].

Illumina humanmethylation450K BeadChip array

Array-based procedure was carried out following the manufacturer’s instructions and using Illumina-supplied reagents and conditions as described [15] after bisulfite conversion of genomic DNA.

Differential methylation analysis

Paired differential methylation analysis between case and control groups was performed using the RnBeads (2.4.0) package [16] in R environment (version 3.6.1).

Stochastic epigenetic mutations (SEMs)

Stochastic Epigenetic Mutations (SEMs) and regions enriched in SEMs were identified as previously described by our group [10, 15, 17,18,19].

Statistics

Statistical analyses were performed in R package, as reported in Supplemental Methods.

Results

Clinical data and genetic analysis

Ten MZ and 13 DZ pairs of twins were enrolled (Tables 1 and 2). Eight pairs, 2 MZ (#1A and B, #2A and B) and 2 DZ twins (#11A and B, #12A and B), were concordant for the CH diagnosis. Overall, 27 CH cases and 19 unaffected controls were enrolled. Thyroid dysgenesis was described in 14 cases (5 athyreosis, 1 hemiagenesis, 5 ectopy, 3 hypoplasia) while a gland-in-situ (GIS) of normal or enlarged size was described in 13 cases. The diagnosis at reevaluation was of permanent CH in 19 cases and transient in 8 cases, all cases with thyroid dysgenesis and 5/13 (38%) with GIS resulted permanent. One discordant MZ twin (#9B) at neonatal screening was confirmed as euthyroid at 10 years and diagnosed with hypothyroidism at 12 years.

Table 1 Clinical and molecular characteristics of monozygotic (MZ) twins
Table 2 Clinical and molecular characteristics of dizygotic (DZ) twins

Genetic analysis was performed in all CH patients and healthy co-twins. In particular, 41% (11/27) of CH children resulted to carry at least one variant in one of the 11 candidate genes.  While no difference in genetic data were seen among all the MZ twin couples, rare variants in the candidate genes were detected only in CH cases from 6/11 discordant DZ twin couples.

Immunological characteristic of subjects

Blood cell counts have been estimated from methylation data. The mixed effect regression model considering family as random effect and sex and batch effect as potential confounders failed to identify significant differences between the two groups (Fig. S2).

DNA methylation profiling using multi-dimensional scaling (MDS)

Dimensional reduction was used to visually inspect the dataset for strong signals in the methylation values. The MDS was adopted considering methylation signals from sites and considering genomic regions: Genes, CpG islands, Promoters and Tilings. No macroscopic differences were observed in the methylation profile between cases and controls considering sites or genomic regions (Fig. 1).

Fig. 1
figure 1

Dimensional reduction of methylation data. Scatter plots show samples after performing Kruskal's non-metric multidimensional scaling. Only the first two dimensions are shown. Subjects are represented according to the Sample Group variable

Similarly, we did not observe noticeable differences also taking into account the zygosity status, the presence of a genetic variant, the family origin or the thyroid morphology (Dysgenesis or GIS) (Fig. S3A–D).

Differential methylation analysis

Paired differential methylation analysis was computed both at site and at region level and results were represented as volcano Plot (Fig. 2). At site level, after False Discovery Rate adjustment, no statistical differences in methylation between cases and controls emerged. A list of the top 50 ranked differently methylated regions is reported in Table S1. Differential methylation analysis performed at regional level was conducted considering genes, CpG islands, promoters and tilings. After multiple testing correction, no statistical differences in region methylation levels emerged between cases and controls. A list of the top 50 ranked differently methylated regions is reported in Table S2.

Fig. 2
figure 2

Volcano plots representing paired differential methylation analysis. Analysis was carried out at site (left panel) and at regional level (right panels). Non-adjusted p values, expressed as – Log10 transformed values, are represented in the Y-axis while differences in methylation levels in the X-axis

Gene ontology and functional analysis

To focus the analysis on a more consistent list of markers, a Gene Ontology (GO) enrichment analysis was performed considering the list of genes and promoters found to be significant, at least, at nominal level (unadjusted p value < 0.05). GO enrichment on genes (Fig. S4) highlighted some biological processes, mainly concerning Olfactory Perception and G protein-coupled receptor signaling pathways. About promoter regions, analysis confirmed the same biological functions observed for gene classification (data not shown).

Stochastic epigenetic mutations (SEMs) analysis

The median number of hypo-methylated SEMSs (ln(SEMs)) was 8.059 (Q1 = 7.555; Q3 = 8.827) in the affected group and 7.698 (Q1 = 7.593; Q3 = 7.987) in the control group. The mixed effect regression model considering “sex” variable as potential confounder and “family” as random effect, indicated that this difference was significant (p = 0.02). Conversely, the mixed effect regression model showed that hyper-methylated ln(SEMs) resulted not significantly different between the two groups (p = 0.50). The median number of hyper-methylated log(SEMs) was 8.632 (Q1 = 8.381; Q3 = 9.136) in hypothyroid sib-pair and 8.524 (Q1 = 8.373; Q3 = 8.914) (Fig. 3). The median number of hypo or hyper-methylated SEMs in cases did not correlate with the zygosity (MZ and DZ), the thyroid morphology (dysgenesis or GIS), the CH outcome (permanent vs transient) or the genetic background (data not shown).

Fig. 3
figure 3

Boxplot showing the number and of hypo- and hyper-methylated log(SEMs). The thick horizontal line represents the median of the distribution while the box represents the interquartile range. Whiskers are set as the default option for the “boxplot” function and show the most extreme data point, which is no more than 1.5 times the interquartile range from the box. Dots represent outlier values (single values exceeding 1.5 interquartile ranges)

SEMs annotation analysis and candidate markers

Both for hyper- and hypo-methylated SEMs we annotated genomic positions and obtained the name of genes involved. Based on gene annotations, we selected loci that resulted epimutated only in cases or in controls. The Venn diagram in Fig. 4 describes the strategy adopted to select these genes.

Fig. 4
figure 4

Venn diagrams of a genes resulted annotated from hypo- and hyper-methylations in cases, controls, and reference population, and b univocal gene lists from cases and controls

Through this procedure, concerning hypo-methylations, two lists of 3922 and 146 genes, univocally belonging to the case and control groups were identified, respectively. Similarly, with regards to hyper-methylations, 1332 and 295 univocal genes emerged from Venn analysis (Fig. 4a). The list of these genes is reported in Table S3. As a further refinement, univocal gene lists were also investigated: the Venn diagram in Fig. 4b shows that analysis identified two common sets of genes with discordant SEMs methylation profile (n = 45 for univocal controls hyper- vs. univocal cases hypo-methylated; n = 14 for univocal cases hyper- vs. univocal controls hypo-methylated). Moreover, a list of 207 genes has been found to be associated to both hyper- and hypo-methylations in cases.

Gene ontology—prioritization analysis

A gene ontology analysis was performed on genes found univocally epimutated in the case group: analysis showed “regulation of protein acetylation” and “organic hydroxy compound metabolic process” as the most enriched pathways in hyper- or hypo-methylated genes, respectively (Fig S5). The prioritization analysis considering hypo-thyroidism as a unique disease term on the genes that resulted epimutated only in cases identified SLC26A4, NKX2-5, TSHB, and FOXI1 as the gene with the highest score in hypo-methylated group and SLC26A4, and FOXI1 in hyper-methylated group (Table S4).

SEM-enriched region analysis

For each subject, genomic coordinates of SEM-enriched regions were reported and annotated to obtain the list of genes involved. Finally, the three populations were compared through a Venn diagram to identify cases’ or controls’ specific epigenetic alterations (Fig. 5).

Fig. 5
figure 5

Venn diagrams of SEMs-enriched genes in CH cases, controls and reference population: hypo-methylations (left panel) and hyper-methylations (right panel)

The analysis identified 4 SEM-enriched genes present only in cases (3 hypo-methylated and 1 hyper-methylated) and 3 SEM-enriched genes present only in controls (all hyper-methylated). Hypo-methylated SEM involved three genes: FAM50B (#13A), MEG8 and SNORD114-12 (#19A) while hyper-methylated SEMs involved four genes: ZNF311 (#18A), TATDN1 (#13B), DMRT3 and LINC01230 (#20B). For these genes, the methylation levels of SEMs have been represented (Fig. 6).

Fig. 6
figure 6

Scatter plots of cases/controls/references methylation profiles for MEG8/SNORD114-12, TATDN1 (upper-panel), DMRT3/LINC01230, ZNF331 (middle-panel) and FAM50B (lower-panel) genes. The arrows identify SEMs in the involved subject

Discussion

The role of epigenetics in the aetiology of CH has been scantly investigated so far. Here, we compared, for the first time, circulating DNA methylome profiles of both MZ and DZ twin pairs concordant or discordant for CH diagnosis.

The absence of a significant DNA methylation signatures at genome-wide level in CH cases may likely indicate a broad epigenetic heterogeneity in this rare condition.

Stochastic Epigenetic Mutations (SEMs) represent a potent biomarker of epigenetic drift and an effective indicator of exposure-related accumulation of DNA damages [17]. Recently, rare epigenetic mutations were found significantly enriched in cases with congenital anomalies and were associated with altered gene expression [20]. Interestingly, we identified a significant increase of hypo-methylated SEMs in hypothyroid twin pairs compared to the healthy-twins cohort. This finding suggests that thyroid defects could be associated with an increased expression of predisposing genes in CH-affected twins [20, 21]. Epigenetic drift can be defined as the accumulation of mistakes in maintaining normal epigenetic patterns. This process contributes to impaired cellular and molecular functions and to a decline in phenotypic plasticity at the cellular and molecular levels [22]. Garg et al. (2020) [21] found that approximately one-third of epivariations are discordant between MZ twins, indicating that a significant fraction of epivariations occurs post-zygotically. These epigenetic marks are thought to be particularly vulnerable to environmental stressors in the perinatal period and are maintained across different cell lineages [23]. The involvement of these epivariations may account for the different phenotypes observed in twins, independently from the outcome of CH (permanent vs transient).

The prioritization analysis for CH performed on the genes epimutated only in cases identified SLC26A4, FOXI1 (both hypo and hyper-methylated), NKX2-5 and TSHB (hypo-methylated) as the genes with the highest score. Biallelic mutations in SLC26A4 gene cause Pendred syndrome, characterized by sensorineural hearing loss, enlarged vestibular aqueduct, goiter, and variable CH. FOXI1 is an upstream regulator of SLC26A4 transcription; monoallelic mutations of FOXI1 were documented in patients with sensorineural hearing loss and inner ear malformations [24]. The NKX2-5 gene is a member of the homeobox Nkx2 family that has been implicated in the pathogenesis of CH [25]. TSHB encodes the beta subunit of thyroid-stimulating hormone and a reduced methylation status at this level might favor the circulating TSH rise in CH patients and potentially explain the relative pituitary refractoriness in the normalization of circulating TSH occurring in some patients with CH during levothyroxine replacement [26,27,28,29].

Moreover, an in-deep analysis of single-pairs twins revealed interesting findings.

Pair #8 consists of MZ twins discordant for CH. The #8A presents CH and athyreosis. The genetic analysis revealed that both twins carried a nonsense mutation in DUOX2 gene (p.Q556X). Noteworthy, the phenotypical discordance among these two monozygotic twins argues against the possibility that monoallelic DUOX2 variants can be sufficient to explain the appearance of CH in one family [30]. Previous works showed mutations in genes typically associated with functional defects, including DUOX2, that had been also detected in thyroid dysgenesis, a finding frequently justified by the association with other genetic events in the oligogenic model of CH [14]. However, an additional occult genetic event explaining the discordance for thyroid dysgenesis and CH among two monozygotic twins is at least unlikely. Intriguingly, the #8A affected twin presented several genes with a significant differential hypo-methylation compared to the unaffected twin (Tables 1 and S5), which may instead explain the discordant phenotypical presentation of these MZ twins despite the shared heterozygous DUOX2 variation. Among the differentially methylated genes detected in this couple, the BICC1 gene might be relevant in this context (Table S5). BICC1 encodes an RNA-binding protein that is active in regulating gene expression during embryonic development and involved in stress responses to maintain tissue and organ integrity [31].

Pair #11 consists of DZ concordant twins presenting thyroid hypoplasia and permanent CH, without pathogenic variants in the analyzed genes. Interestingly, the burden of epimutations in this couple was the highest detected (Tables 2 and S5). These twins showed epimutations in most of the genes prioritized for CH phenotype (Tables S4 and S5).

Pair #12 consists of DZ concordant twins with permanent CH without pathogenic variants in the analyzed genes. Both presented thyroid dysgenesis, but the #12A with thyroid hemiagenesis and the #12B with hypoplasia. Of note, #12A showed an epimutation in SLC26A4 gene (Table S5). Interestingly, despite being classically implicated in thyroid hormonogenesis, SLC26A4 variations were reported also in patients with apparent thyroid dysgenesis [32]. Of note, this patient does not present any hearing impairment.

Case #13A was found to be a carrier of a benign NKX2-1 heterozygous variant (p.A116D) lacking the typical extrathyroidal manifestations of Brain-Lung-Thyroid syndrome [33]. Interestingly, this case showed a hypo-methylated SEM-enriched region in FAM50B gene (Family with Sequence Similarity 50 Member B, 6p25.2), which encodes a protein that plays a role in the circadian clock. FAM50B is an imprinted gene paternally expressed in many tissues, including the thyroid gland (https://gtexportal.org). Hypo-methylation of the same region of FAM50B has been previously reported in a patient with development delay [34], and has been associated with multi-locus imprinting defect (MLID) [35]. FAM50B hypo-methylation could have contributed to the appearance of the CH in association with a heterozygous NKX2-1 variant not sufficient per se to explain the phenotype in this case.

Case #19A is a DZ twin discordant for CH, with GIS and a heterozygous benign variation in GLIS3 in the absence of the typical manifestation of neonatal diabetes [27]. This case showed a hypo-methylated SEM-enriched region on chromosome 14q involves MEG8 (Maternally Expressed 8, Small Nucleolar RNA Host Gene) and SNORD114-12 (Small Nucleolar RNA, C/D Box 114-12), two RNA genes, located in an imprinted locus containing differentially methylated regions (IG-DMR, MEG3-DMR, MEG8-DMR), slightly expressed in the thyroid gland (https://gtexportal.org). A hypo-methylated status in these regions is responsible for Temple and Kagami-Ogata syndromes [35, 36] and these were identified in two unrelated cases with neurodevelopmental disorders and congenital anomalies [20]. The severe hypo-methylation status of a different region of MEG8 (Int29-32), potentially resulting in the MEG8 overexpression, could explain the appearance of the isolated CH phenotype in this case.

This study presents some limitations. In the first place, the lack of significant differences in global methylation between affected and unaffected twins may be a consequence of the limited sample cohort. In addition, our analysis was performed on blood samples collected later in life during levothyroxine replacement in CH cases. Nevertheless, a recent study confirmed that epivariations are conserved across multiple tissues, validating the use of peripheral blood for epigenomic analyses [20]. The effect of replacement therapy may instead have mitigated some differences between cases and controls. Finally, these determinations require replications in independent cohorts.

In conclusion, epigenetic modifications may be included among the possible mechanisms that, possibly in association with other events (e.g., hypomorphic thyroid alleles), can account for CH pathogenesis and discordance among monozygotic twins. Their relevance may be particularly high in conditions characterized by an increased risk for CH such as premature birth or low birth weight.