Introduction

Y chromosome short tandem repeat (STR) haplotypes show substantial inter-population differentiation on both a worldwide and a continental scale [1, 2]. Because Y-STR haplotype frequencies are required to provide statistical estimates of the significance of a match between forensic samples, local databases must be developed taking the demographic history of the investigated population into account. Large metadatabases such as the Y Chromosome Haplotype Reference Database (YHRD) pool different local databases based on intrinsic (genetic) and contextual (geographic, linguistic) information [3]. However, pooling different regional samples is only valid if there is no population substructure, i.e. no statistically significant difference between the Y-STR haplotype distributions in different regions [4]. Here, we present haplotype analysis and genetic differentiation tests of a large Russian population sample consisting of 12 subgroups, which allow a well-founded decision on the assignment of the analysed population to a framework of worldwide metapopulations.

Population

A total of 545 unrelated males from 12 Western Russian populations previously typed for Y-chromosome single nucleotide polymorphism markers [5] were analysed for 17 Y-STR loci evaluated in forensic routine diagnostics [6]. The sampling was carried out in the administrative centers of the following districts (oblasts): Smolenskaja (Smo, n = 43), Brianskaja (Bri, 43), Ivanovskaja (Iva, 40), Lipezkaja (Lip, 47), Penzenskaja (Pen, 81), Ryazanskaja (Rya, 36), Orlovskaja (Orl, 42), Tverskaja (Tve, 43), Vologodskaja (Vol, 40), Tambovskaja (Tam, 48), Archangelskaja (Arch, 42) and Nowgorodskaja (Now, 40; see Fig. 1). Informed consent and information about the birthplace of the donor and his parents and grandparents were obtained.

Fig. 1 Map of Europe depicting the analysed populations from the European part of Russia and neighbouring territories

Materials and methods

Deoxyribonucleic acid (DNA) was extracted from whole blood samples using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) following the manufacturer’s recommendations. All samples were genotyped for 17 Y chromosomal STR loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a, DYS385b, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, GATA H4) using the AmpFlSTR® Yfiler® PCR Amplification kit (Applied Biosystems, Darmstadt, Germany) according to the manufacturer’s instructions. STR products were analysed on an ABI Prism 3100 AVANT automated sequencer with GeneScan and GenoTyper v. 3.7 (Applied Biosystems). The updated recommendations of the DNA Commission of the International Society for Forensic Genetics for analysis of Y-STR systems were followed [4].

Quality control

The laboratory took part in the proficiency testing trials of the German DNA Profiling group (www.gednap.de) and of the YHRD (www.yhrd.org).

Analysis of data

Characteristic parameters for each population, consisting of the number of different haplotypes, the discrimination capacity (D) and the haplotype diversity (h), were calculated (Table 1). Pairwise values of Φ st , an analogue of Wright's F st that takes the evolutionary distance between individual haplotypes into account [7, 8], were calculated to measure genetic distances between the 17-locus haplotypes of the 12 Western Russian populations. Subsequently, these populations were treated as one metapopulation and the minimal haplotypes (which include the nine loci DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393 and DYS385ab) were compared to published data from nine neighbouring regions, namely 271 minimal haplotypes from Belarus [9], 502 from the Caucasus [10], 133 from Estonia, 145 from Latvia and 157 from Lithuania [11], 399 from Finland [12], 3,021 from Poland [13–15], 370 from Siberia [16, 17] and 368 from Ukraine [18], with the statistical significance determined by a permutation test (10,000 replicates; Table 2). We used our own implementation of analysis of molecular variance (AMOVA; available at http://rprojekt.org/amova/). The DYS389I allele length was obtained by subtracting the shorter allele from the longer allele at DYS389I/II. The STATISTICA software package (StatSoft Inc.) was used for multi-dimensional scaling (MDS) analysis [19] based on pairwise Φ st values (Fig. 2).
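The pairwise Φ st calculation and its permutation test can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is the AMOVA program cited above). It assumes a one-level AMOVA design and takes the sum of squared repeat-number differences as the distance between two haplotypes; the function names and the two-locus toy haplotypes are invented for illustration only.

```python
import random


def sq_dist(a, b):
    """Assumed distance: sum of squared repeat-number differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ssd(haps):
    """Sum of squared deviations: all pairwise distances divided by group size."""
    n = len(haps)
    total = sum(sq_dist(haps[i], haps[j])
                for i in range(n) for j in range(i + 1, n))
    return total / n


def phi_st(groups):
    """Phi_st for populations without any higher-level grouping."""
    pooled = [h for g in groups for h in g]
    n_total, n_groups = len(pooled), len(groups)
    ssd_within = sum(ssd(g) for g in groups)
    ssd_among = ssd(pooled) - ssd_within
    msd_within = ssd_within / (n_total - n_groups)
    msd_among = ssd_among / (n_groups - 1)
    # Average group size corrected for unequal sample sizes
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    var_among = (msd_among - msd_within) / n0
    total_var = var_among + msd_within
    return var_among / total_var if total_var else 0.0


def permutation_p(groups, n_perm=10_000, seed=0):
    """Shuffle haplotypes across populations and count Phi_st >= observed."""
    rng = random.Random(seed)
    observed = phi_st(groups)
    pooled = [h for g in groups for h in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if phi_st(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so that the estimated p value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Invented toy data: two populations with completely distinct haplotypes
pop_a = [(14, 13)] * 5
pop_b = [(16, 13)] * 5
phi, p_value = permutation_p([pop_a, pop_b], n_perm=999)
print(phi, p_value)
```

For real data each tuple would hold the repeat numbers of one minimal or 17-locus haplotype, and the number of replicates would be set to the 10,000 used in this study.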

Fig. 2 Multi-dimensional scaling (MDS) plot of the Western Russian metapopulation and nine reference populations, from pairwise Φ st values. Stress value = 0.002. Acronyms are as follows: Rus–Russians, Bel–Belarus, Ukr–Ukrainians, Lat–Latvians, Lit–Lithuanians, Est–Estonians, Cau–Caucasus, Pol–Polish, Fin–Finns, Sib–Siberia

Table 1 Forensic parameters
Table 2 AMOVA pairwise distance based on Φ st values between western Russia (n = 545) and nine neighbouring populations (n = 5,366)

Results and discussion

A total of 545 samples from 12 Western Russian populations were investigated in this study and 494 different 17-locus haplotypes (D = 0.906, h = 0.9994) were detected. Two haplotypes occurred six times, one occurred five times, two occurred four times, three occurred three times, 25 occurred twice and 461 were unique (Supplementary Table S1). The eight most frequent haplotypes, which together account for 34 samples, are closely related and belong (with the exception of one haplotype belonging to hg N3-Tat) to haplogroup R1a1-M17 (printed in bold in Table S1). This typical Eastern European haplogroup is the most frequent in all analysed populations, with frequencies ranging between 0.31 and 0.56, followed by hg I-M170 (0.09–0.31) and N3-Tat (0.06–0.29) [5]. We observed high haplotype diversities (h) in all 12 populations, ranging from 0.993 to 1.000 (Table 1). Three duplications (12 and 13 at locus GATA H4, 15 and 16 at DYS19 and 20 and 21 at DYS448), one “null” allele at DYS19 and several intermediate-sized alleles were observed.
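The summary values quoted above can be recomputed from the reported haplotype frequency spectrum. The following Python sketch assumes that D is the number of different haplotypes divided by the number of samples and that h is Nei's unbiased estimator, h = n/(n − 1) · (1 − Σ p_i²); the text does not state the formulas explicitly, so both definitions are assumptions, and the function name is illustrative.

```python
# Frequency spectrum as reported in the text:
# {times a haplotype was observed: number of haplotypes observed that often}
SPECTRUM = {6: 2, 5: 1, 4: 2, 3: 3, 2: 25, 1: 461}


def forensic_parameters(spectrum):
    """Return sample size, number of haplotypes, D and h for a spectrum."""
    n = sum(times * n_haps for times, n_haps in spectrum.items())
    k = sum(spectrum.values())
    # Sum of squared haplotype frequencies
    sum_p2 = sum(n_haps * (times / n) ** 2 for times, n_haps in spectrum.items())
    d = k / n                          # discrimination capacity (assumed definition)
    h = n / (n - 1) * (1.0 - sum_p2)   # Nei's unbiased haplotype diversity (assumed)
    return n, k, d, h


n, k, d, h = forensic_parameters(SPECTRUM)
print(n, k, round(d, 3), round(h, 4))  # 545 494 0.906 0.9994
```

Under these assumptions the spectrum reproduces the 545 samples, the 494 different haplotypes and the reported values of D and h.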

No genetic substructure was found among the Russian populations. All pairwise comparisons were non-significant (p > 0.05). In contrast, significant variation between populations was observed in the comparison of Western Russia, treated as one homogeneous metapopulation, with neighbouring groups (Table 2). Because only reduced haplotype formats were available for such reference populations, we performed AMOVA based on 545 minimal 9-locus haplotypes from Western Russia together with the previously published 5,366 haplotypes from nine neighbouring regions (in clockwise direction: Ukraine, Belarus, Poland, Lithuania, Latvia, Estonia, Finland, Siberia and the Caucasus region) [9–18]. All pairwise Φ st comparisons between Russia and these neighbours (with the exception of Russia vs. Belarus) were significant, with values ranging between 0.0089 (Russia vs. Poland) and 0.3688 (Russia vs. eight Altaic- and Uralic-speaking groups from Siberia). The MDS plot based on pairwise Φ st values shows a closely related core group consisting of Slavic-speaking populations (Russia, Ukraine, Belarus and Poland) with an elevated distance to Baltic populations and a large distance to linguistically different groups from Estonia, Finland, Siberia and the Caucasus (Fig. 2). From these analyses, we conclude that autochthonous Russian-speaking populations residing for centuries in the European part of Russia can be pooled to form a representative regional reference database for assessment of Y chromosomal matches in forensic analyses. However, at a level of quite low but significant Φ st values, Western Russian populations can be grouped together with a much larger metapopulation defined within the forensic YHRD. This genetically distinct “Eastern European” metapopulation (n = 5,993, YHRD release 22 from 2007-08-10) comprises 56 Balto-Slavic-speaking populations from Eastern Europe [2].
Because the population frequency of a given haplotype is positively correlated with the combined frequency of closely related (surrounding) haplotypes in the population of origin [20], forensic databases collecting haploid markers have to be tested for substructure. This and other studies demonstrate that AMOVA is a sensitive method to detect the extent of inter-population differences accrued from the genetic and demographic history of the populations. The assignment of each population sample to a set of populations sharing a common linguistic, demographic, geographic and genetic background (metapopulations) facilitates the statistical evaluation of haplotype matches due to a significant enlargement of sample sizes.
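The practical effect of a larger reference sample can be illustrated with a short Python sketch. It assumes a simple exact binomial argument that is not taken from this study: if a haplotype is not observed in n database entries, the one-sided 95% upper confidence bound on its frequency is 1 − 0.05^(1/n). The function name is illustrative, and the two sample sizes are those given in the text for Western Russia and for the Eastern European metapopulation.

```python
def upper_bound_unobserved(n, alpha=0.05):
    """One-sided (1 - alpha) upper bound on the frequency of a haplotype
    seen zero times in n samples, obtained by solving (1 - p)**n = alpha."""
    return 1.0 - alpha ** (1.0 / n)


for label, n in [("Western Russia", 545),
                 ("Eastern European metapopulation", 5993)]:
    print(f"{label}: n = {n}, upper bound = {upper_bound_unobserved(n):.5f}")
```

Under this assumption the bound falls from roughly 0.0055 for the regional sample to roughly 0.0005 for the metapopulation, which is why pooling is attractive once substructure has been excluded.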

The data obtained in this study from Russian populations were submitted to the Y Chromosome Haplotype Reference Database (www.yhrd.org) and they were assigned to the Eurasian–European–Eastern European metapopulation [3].