Background

Pathogenic variants in the GJB2 gene (MIM 121011, chromosome 13q11q12) are the most common causes of autosomal recessive non-syndromic hearing loss in various populations. Over 300 pathogenic variations in GJB2 have been reported in the Human Gene Mutation Database [1]. Many of them have high ethno-geographic specificity in prevalence [2], which for certain ethnic groups is being attributed to a founder effect [3,4,5,6,7,8,9,10,11].

The recessive pathogenic GJB2 variant c.35delG (p.Gly12Valfs*2) (NM_004004.5) is known to be prevalent in deaf patients of Caucasian origin [2, 12, 13]. Previous studies have revealed the c.35delG carrier frequency to be around 1 in 50 overall in Europe [12], reaching 1 in 31 in Southern Europe [14]. In meta-analysis of the data published up to 2008, mean carrier frequencies of c.35delG were found to be 1.89, 1.52, 0.64, 1, and 0.64% for European, American, Asian, Oceanic, and African populations, respectively [13]. The c.35delG is a deletion of one guanine (G) from a string of six (GGGGGG) in the GJB2 coding sequence resulting in a frameshift and termination of the Cx26 protein sequence at amino acid 13 (p.Gly12Valfs*2). The occurrence of c.35delG as a possible “hot spot” caused by DNA polymerase “slippage” was previously assumed to be an explanation of the high prevalence of this pathogenic variant in the GJB2 gene [5, 15, 16]. Nevertheless, convincing evidence emerged that the founder effect plays an important role in the prevalence and accumulation of c.35delG in populations of Caucasian origin since lower rates or absence of c.35delG are observed in other populations. According to a generally accepted hypothesis, c.35delG originated from a common ancestor in the Middle East or the Mediterranean approximately 10,000–14,000 years ago and spread throughout Europe with Neolithic migrations. Specific c.35delG prevalence and discovery of common STR- and SNP-haplotypes bearing the c.35delG mutation in Mediterranean, Middle Eastern, North-European populations, and in individuals of European origin in the USA support this hypothesis [14, 17,18,19,20,21,22,23,24,25,26,27]. Relevant data were also obtained for populations of the Volga-Ural region of Russia [28] and Belarus [29].

The c.35delG predominance in deaf patients was reported in several studies conducted in the European part of Russia [30,31,32,33,34,35,36,37,38]. In the ethnically heterogeneous population of Siberia, epidemiological and molecular genetic studies of hereditary deafness are currently limited to regions of the Altai Republic, the Tuva Republic (Southern Siberia), and the Sakha Republic (Yakutia, Eastern Siberia) [39,40,41]. The presence of c.35delG in a homozygous or compound-heterozygous state was the main cause of hereditary hearing loss in deaf Russian patients living in these regions in contrast to deaf patients belonging to Siberian indigenous peoples (Altaians, Tuvinians, Yakuts) who were negative for the c.35delG mutation [39,40,41].

This study presents an updated summary of published data on the c.35delG (p.Gly12Valfs*2) prevalence in Russia complemented by our original data on the c.35delG carrier frequency in Western Siberia. To test the c.35delG common origin hypothesis, we genotyped polymorphic markers flanking the GJB2 gene and reconstructed haplotypes bearing c.35delG in deaf patients from Siberia homozygous for c.35delG.

Methods

Subjects

Twenty-four unrelated patients (mostly Russians) with congenital or early onset profound hearing loss living in several Siberian regions (Altai, Tuva, Yakutia) were previously found to be homozygous for c.35delG [39,40,41]. The carrier frequency of c.35delG was estimated in 122 unrelated normal hearing individuals (mostly Russians) from the Novosibirsk region (Western Siberia). Genotyping of three polymorphic short tandem repeat (STR) markers D13S141, D13S175, D13S1853 and an intragenic SNP (rs3751385) was performed in 24 unrelated deaf patients homozygous for c.35delG and in 67 unrelated healthy individuals from the Novosibirsk region (Western Siberia) who were negative for c.35delG.

C.35delG screening and analysis of genetic markers

All primers and genotyping methods are summarized in Table 1. The c.35delG screening was performed according to [42]. Polymorphic STR markers flanking the GJB2 gene: D13S141 (~ 39.2 kb centromeric to c.35delG), D13S175 and D13S1853 (~ 84.8 kb and ~ 277.1 kb telomeric to c.35delG, respectively), and intragenic SNP (rs3751385) locating ~ 0.7 kb centromeric from c.35delG were used to reconstruct haplotypes bearing c.35delG. These markers were used previously in relevant studies and were therefore chosen to keep compatibility and enable comparative analysis with already available data. Genotyping of D13S141, D13S175 and D13S1853 was performed in the SB RAS Genomics Core Facility (Institute of Chemical Biology and Fundamental Medicine SB RAS, Novosibirsk, Russia).

Table 1 Primer sequences for PCR, fragment analysis and Sanger sequencing

Statistical analysis

Haplotype frequencies were estimated from observed genotype data using Expectation–Maximization (EM) algorithm of the Arlequin 3.5.2.2 software [43]. Fisher’s exact test (significance level 0.05) was used to compare the allelic and haplotype distributions. Linkage disequilibrium between the marker alleles and c.35delG as well as the age of c.35delG were estimated as described previously [44, 45]. The linkage disequilibrium was calculated as

$$ \updelta =\left(\mathrm{Pd}-\mathrm{Pn}\right)/\left(1-\mathrm{Pn}\right), $$

where δ is the measure of linkage disequilibrium, Pd is the frequency of the marker allele among all chromosomes carrying c.35delG, and Pn is the frequency of the same allele among chromosomes without c.35delG. The age of c.35delG was estimated as

$\text{g} = \text{log} [1-\text{Q}/(1-\text{Pn})]/\text{log}(1-\ominus)$

where g is the number of generations from the moment of the c.35delG appearance to the present, Q is the share of chromosomes carrying c.35delG unlinked with the founder haplotype, Pn is the frequency of the allele included in founder haplotype in the population, and Ѳ is the recombinant fraction calculated from the physical distance of the markers from the c.35delG location on the assumption of 1 cM = 1000 kb.

Results

Carrier frequency of c.35delG in Russia

Screening of c.35delG in unrelated hearing individuals (mostly Russians) living in the Novosibirsk region (Western Siberia) has revealed 5 out of 122 examined subjects to be c.35delG carriers (4.1%). These data supplement current information about the c.35delG prevalence in Russia. We have analyzed all available literature data (published up to 2018) on c.35delG carrier frequencies in Russia and in some countries of the former Soviet Union (USSR) which populations undoubtedly contributed to the contemporary population of Russia. The distribution of c.35delG carrier frequencies is presented in Fig. 1. High c.35delG frequencies are observed in the populations of north-western and central parts of Russia with downward trend from west to east.

Fig. 1
figure 1

Distribution of the c.35delG carrier frequency on the territory of Russia and in some countries of the former Soviet Union. The c.35delG carrier frequencies (%) were obtained from all available data published up to 2018 (Additional file 1: Table S1). Codes from 1 to 36 indicate analyzed samples (region under study and/or ethnicity which were indicated in the original publications). In some cases, c.35delG carrier frequency was calculated by us from the data given in original publications. The maximum value of the c.35delG carrier frequency is used for the figure if there are several data sets for a region or an ethnically stratified sample. Codes 1–4 – Northern and Eastern Europe: 1 – Estonia (4.4–4.5%), 2 – Lithuania (1.0%), 3 – Belarus (3.4–6.2%), 4 – Ukraine (3.3–4.1%); 5–8 - North-Western part of Russia: 5 – residents of Kaliningrad and Kaliningradskaya Oblast’ (7.5%), 6 - residents of Pskov and Pskovskaya Oblast’ (2.0–4.7%), 7 - residents of St. Petersburg and Leningradskaya Oblast’ (3.3–5.9%), 8 - residents of Arkhangelsk and Arkhangelskaya Oblast’ (5.0%); 9–10 - Central part of Russia: 9 – residents of different regions (3.8–5.1%); 10 – Russians (Kirovskaya Oblast’) (3.8%); 11–19 - Volga-Ural region of Russia: 11 – Komi (0%), 12 - Mari (2.0–2.6%), 13 - Udmurts (0.5–3.7%), 14 – Mordvins (5.7–6.2%), 15 – Chuvashes (0–2.6%), 16 - Russians (5.0%), 17 - Tatars (1.0–2.6%), 18 - Bashkirs (0–3.6%), 19 – Russians (Ekaterinburg) (2.2%); 20–25 – Siberia: 20 - Russians (Novosibirsk, Western Siberia) (4.1%), 21 – Altaians (0%), 22 – Tuvinians (0%); 23 - Buryats (0%); 24 - Yakuts (0.4–1.0%), 25 - Russians (Yakutia, Eastern Siberia) (2.5%); 26–31 - South-Western part of Russia (including North Caucasus): 26 - residents of Rostovskya Oblast’ (Russians) (2.9%), 27 – Cherkessians (1.3–2.0%), 28 – Karachays (0.3%), 29 - Ingush (0–2.0%), 30 - Chechens (0–0.7%), 31- Avars (0%); 32–33 - South Caucasus: 32 – Abkhazians (3.8%), 33 – Armenians (3.7%); 34–36 - Central Asia: 34 - Uzbeks (0%), 35 - Kazakhs (0.8%), 36 - Uighurs (0.9%)

Common haplotypes associated with c.35delG in Siberia

Certain polymorphic STR markers flanking GJB2 are traditionally used for c.35delG haplotype analysis: centromeric D13S141 (~ 39.1 kb from c.35delG), telomeric D13S175 (~ 84.8 kb from c.35delG) and distal telomeric D13S143 (~ 1.5 Mb from c.35delG) [17, 19,20,21,22,23,24,25,26,27,28,29]. Since c.35delG is presumably a very old mutation, a common ancestral haplotype could be observed in even a closest chromosomal locus, therefore instead of the more distal D13S143, we chose the marker D13S1853 that was closer to c.35delG (~ 277.1 kb telomeric), making the haplotype region (D13S141-c.35delG-D13S175-D13S1853) span over ~ 316 kb. Results of D13S141, D13S175, D13S1853 and rs3751385 genotyping in the c.35delG homozygotes and in control individuals from Siberia are summarized in Table 2. Significant differences in allele frequencies of markers D13S141, D13S175, D13S1853 are observed between the c.35delG homozygotes and control subjects (Table 2). Allele T of rs3751385 was only identified in the c.35delG homozygotes showing significant differences (p <  10− 21) between patients and control samples. A strong association of allele T (rs3751385) with c.35delG was also shown in previous studies [18, 22,23,24,25, 27, 29]. Twelve and thirty-nine D13S141-D13S175-D13S1853 haplotypes were reconstructed for c.35delG homozygotes and for control samples from Siberia, respectively (Fig. 2A). Two haplotypes, 126–105-202 (37.5%) and 124–105-202 (25.0%), were the most common (52.5% in total) among chromosomes bearing c.35delG in contrast with 7.8% for 126–105-202 and 8.5% for 124–105-202 among wild-type chromosomes (p <  10− 2–10− 5). Based on observed linkage disequilibrium for D13S175 and D13S1853 alleles (Table 2), we have roughly estimated the age of c.35delG in Siberia as ~ 4800 years (D13S1853) or ~ 8100 years (D13S175). It should be noted that an accurate calculation of the c.35delG age is difficult because of many uncertainties, first of all, unknown true recombination frequency of this chromosomal region [18, 22].

Table 2 Allele frequencies of D13S141, D13S175, D13S1853, and rs3751385 in the c.35delG homozygotes and in the control samples (individuals without c.35delG)
Fig. 2
figure 2

The D13S141-D13S175-D13S1853 haplotypes in studied samples. a Distribution of the D13S141-D13S175-D13S1853 haplotypes reconstructed by the Arlequin software (EM algorithm) in the c.35delG homozygotes (pink bars) and in the control samples (green bars) (see explanation in text). n – number of haplotypes. b The D13S141-rs3751385-c.35delG-D13S175-D13S1853 haplotypes found in the c.35delG homozygotes. N - number of individuals. The most frequent haplotypes are highlighted in color: 126-T-105-202 – by red, 124-T-105-202 – by blue; nt - not tested

Discussion

Distribution of the c.35delG carrier frequency in Russia is characterized by pronounced ethno-geographic specificity (Fig. 1). High c.35delG rates are typical for populations of Caucasian origin in north-western part of Russia (up to 7.5% in the Kaliningrad Oblast’ and 5.9% in the Leningrad Oblast’) [46]. The contemporary population of the Kaliningrad Oblast’ represented mainly by Russians (about 80%) was formed as a result of large-scale post-war (after 1945) migration from European regions of the former USSR. A high carrier frequency of c.35delG in rural areas of the Leningrad Oblast’ (5.9%) could probably be attributed to indigenous Finno-Ugric Vepsians, correlating with high c.35delG rates reported for other Finno-Ugrics: 4.4–4.5% in Estonians [12, 47], 5.7–6.2% in Mordvins [30, 48]. The highest carrier frequencies of c.35delG in the Caucasus are observed in Abkhazians (3.8%) [48] and Armenians (3.7%) [49]. Low carrier frequencies of c.35delG or its absence were observed among Turkic-speaking peoples of Volga-Ural region (Tatars, Bashkirs, Chuvashes), Siberia (Altaians, Tuvinians, Yakuts), Central Asia (Kazakhs, Uighurs, Uzbeks), and also in Mongolic-speaking Buryats (Siberia) [39,40,41, 48, 50]. In Siberia, the c.35delG carrier frequency in Russians varies from 4.1% in Western Siberia (this study) to 2.5% in Yakutia (Eastern Siberia) [41].

Haplotype (D13S141-rs3751385-D13S175-D13S1853) analysis of chromosome 13 in c.35delG homozygous deaf Russian patients from Siberia revealed the two most common haplotypes (126-T-c.35delG-105-202 and 124-T-c.35delG-105-202). It is interesting to compare the common c.35delG haplotypes identified in Siberia with available relevant data for other populations [19, 21, 23,24,25,26, 28, 29] (Table 3). Such comparisons are to some extent possible for the c.35delG haplotypes flanked by D13S141 and D13S175 (~ 124 kb). For these markers, the total share of two common c.35delG haplotypes in Siberia, 126-T-c.35delG-105 (56.3%, 27 of 48 chromosomes) and 124-T-c.35delG-105 (29.2%, 14 of 48 chromosomes), reaches 85.5% (Fig. 2B). Unfortunately, unified classification of the D13S141 and D13S175 alleles based on their size in nucleotides or number of dinucleotide (CA) repeats is absent due to different genotyping methods or simple numerical designations for alleles used in previous studies, making accurate comparative analysis hardly possible. Nonetheless, the 105 bp long allele of D13S175 was detected in the most common c.35delG haplotypes in the majority of studies while a broad variety of alleles was observed for D13S141. We were able to compare both allelic size in nucleotides (determined by fragment analysis) and numbers of CA repeats (detected by Sanger sequencing) in two most frequent D13S141 alleles in the c.35delG homozygotes from Siberia (analyzed in this study), Volga-Ural region of Russia (kindly provided by L. Dzhemileva [28]), and Belarus [29]. Fourteen CA repeats (CA14) were found in D13S141 alleles 126, 127, and 125 while thirteen CA repeats (CA13) were found in D13S141 alleles 124, 125, and 123 in DNA samples from Siberia, Belarus, and Volga-Ural region, respectively (Table 3). Thus, the identity of allelic composition of two most frequent haplotypes D13S141-c.35delG-D13S175 (~ 124 kb) was revealed in these geographically remote regions (Table 3). In addition, we assume that the conservative region of c.35delG haplotypes from Siberia may span longer being flanked by D13S141 and D13S1853 (~ 316 kb).

Table 3 The most common D13S141-c.35delG-D13S175 haplotypes in different populations

The contemporary Siberian population of Caucasian origin (mostly Russians) was formed as a result of multiple migration flows from the European part of Russia that started from the first settlement of Siberia by Russians at the end of sixteenth century [51]. Our rough dating of c.35delG expansion into Siberia (about 4800–8100 years ago) could be a reflection of complex processes of early formation stages of the modern European population (including the European part of Russia). These data do not contradict the earlier hypothesis about the c.35delG occurrence in the Middle East or the Mediterranean approximately 10,000–14,000 years ago followed by its spreading with migration flows across Europe [14, 17,18,19,20,21,22,23,24,25,26,27]. However, taking into account the current data on three ancient components in the origin of modern Europeans [52], it cannot be ruled out that c.35delG could also have independently originated among any ancient populations of North-West Europe or anywhere else.

Conclusions

Distribution of the c.35delG carrier frequency in Russia is characterized by pronounced ethno-geographic specificity. High frequencies of c.35delG are observed in the populations living in north-western and central parts of Russia with a downward trend from west to east. The territory of Siberia can be assumed as the north-eastern geographic “end point” of the c.35delG prevalence in Eurasia. Comparative analysis of the c.35delG haplotypes supports the common origin of c.35delG in some regions of Russia (Siberia and Volga-Ural region) and in Eastern Europe (Belarus). A thorough study of the haplotypes associated with c.35delG in populations from different world regions could further elucidate its origin and age.