Population genetics of 30 INDELs in populations of Poland and Taiwan

The Investigator DIPplex® kit (Qiagen) contains components for the simultaneous amplification and analysis of 30 biallelic autosomal INDELs and amelogenin. The objective of this study was to estimate the diversity of the 30 markers in Polish (N_P = 122) and Taiwanese (N_T = 126) population samples and to evaluate their usefulness in forensic genetics. All amplicon lengths were shorter than 160 base pairs. The DIPplex genotype distributions showed no significant deviation from Hardy–Weinberg expectations (Bonferroni corrected) except for HLD39 in the Taiwanese population. Among the Poles and the Taiwanese, the mean observed heterozygosity values were 0.4385 and 0.4079, and the combined matching probability values were 7.98 × 10^-14 and 1.22 × 10^-11, respectively. The investigated marker set has been confirmed as a potential extension to standard short tandem repeat-based kits or as a separate informative system for individual identification and kinship analysis. Eight INDELs have been selected as possible ancestry informative single-nucleotide polymorphisms for further analyses.


Introduction
INDELs (insertion-deletion polymorphisms) or DIPs (deletion-insertion polymorphisms) are short biallelic length polymorphisms consisting of the presence or absence of short sequences (typically 1-50 bp). They are relatively common throughout the human genome, representing 15-20% of all polymorphisms [1], with the total number estimated at about 2 million [2]. Their main advantages are short amplicon size (50-150 bp), low mutation rate (<2 × 10^-8), and the capacity to multiplex 30-40 markers and type them in a single multiplexed PCR with fluorescently labeled primers followed by capillary electrophoresis, a current technology for human identification [3][4][5]. These properties make INDELs useful in forensic genetics applications including individual identification, kinship testing, population studies and ancient DNA analysis [6][7][8]. The Investigator DIPplex® kit (Qiagen) contains components for the simultaneous amplification and analysis of 30 biallelic autosomal INDELs and amelogenin. The INDELs are distributed over 19 autosomes at a minimum distance of 10 Mbp from routinely used STR and SNP markers. The allele length variations of the INDELs are between 4 and 22 bp, and all amplicons are shorter than 160 bp.

DNA extraction
Buccal swabs were collected from unrelated volunteers and anonymized, along with information on the birthplace and ethnicity of each donor. Signed informed consent was obtained from all participants, and this study complied with the protocol approved by the Ethical Committee of Poznan University of Medical Sciences (Ref: 139/13). The population sample sizes were: Poles (N_P = 122) and Taiwanese (N_T = 126). Genomic DNA was extracted using the QIAamp® DNA Mini Kit (Qiagen). Quantitation was performed using the Quantifiler™ Human DNA Quantification Kit on a 7500 Real-Time PCR System (Applied Biosystems) according to the manufacturer's specifications. The samples were then normalized to 100 pg/µl and stored at -20 °C until amplification.

Amplification and genotyping
PCR conditions followed the protocol recommended by the manufacturer of the Investigator DIPplex Kit (Qiagen) and were run on a PCR System 9700 (Applied Biosystems, USA) with the total reaction volume adjusted to 5 µl, containing 1.8 µl nuclease-free water, 1.0 µl reaction mix A, 1.0 µl primer mix, 0.2 µl MultiTaq2 polymerase, and 100 pg DNA template. Control DNA XY5 was used to test the performance of the DIPplex Kit. The amplification was performed with 30 PCR cycles. Electrophoresis and typing were performed on a 3130 Genetic Analyzer (Applied Biosystems, USA) using a 36 cm capillary array and the denaturing polymer POP-4. BTO 550 (Qiagen) was used as the internal lane standard, spanning fragments from 60 to 550 bp. Prior to the analysis, a five-dye matrix standard (BT5) was established with the fluorescent label dyes 6-FAM, BTG, BTY, BTR, and BTO under the Any5Dye virtual filter. Samples were injected for 10 s at 3 kV and electrophoresed for 1000 s at 15 kV at a run temperature of 60 °C. The data were collected using Data Collection v3.0 software. GeneMapper® ID-X v1.1.1 software was used to classify the INDEL alleles.

Statistical analysis
Estimates of genetic diversity (allele frequencies, heterozygosity), conformance to Hardy-Weinberg equilibrium (HWE) expectations, and independence between loci (linkage disequilibrium, LD) were obtained using GDA v1.0 software [9]. For multiple comparisons, the significance level was adjusted by the Bonferroni correction procedure [10]; with 30 markers per database, the corrected significance level is 0.05/30 = 0.0016667. Forensic informativeness was estimated by calculating discrimination power (DP), match probability (MP), polymorphic information content (PIC), typical paternity index (TPI), and power of paternity exclusion (PE) using the Powerstats v1.2 spreadsheet (Promega) [11]. Allele frequency distributions were compared by means of a pairwise population comparison test (R × C contingency test; G. Carmody, Ottawa, Canada). AMOVA and the exact test of population differentiation were calculated with Arlequin v3.5 software [12].
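For readers who wish to reproduce the forensic parameters outside a spreadsheet, the following minimal Python sketch computes them for a single biallelic INDEL from genotype counts. It assumes the commonly cited formulas (MP as the sum of squared genotype frequencies, DP = 1 - MP, PIC from allele frequencies, PE = h²(1 - 2hH²) and TPI = 1/(2H), where h and H are the observed heterozygote and homozygote frequencies); whether these match the Powerstats v1.2 implementation in every detail is an assumption. The function names and the example genotype counts are hypothetical and are not taken from the study data.

```python
from math import prod


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def forensic_parameters(n_ins_ins, n_ins_del, n_del_del):
    """Forensic parameters of one biallelic locus from genotype counts.

    Arguments are counts of insertion homozygotes, heterozygotes and
    deletion homozygotes (illustrative inputs, not study data).
    """
    n = n_ins_ins + n_ins_del + n_del_del
    g_ii, g_id, g_dd = n_ins_ins / n, n_ins_del / n, n_del_del / n

    p = g_ii + g_id / 2          # insertion allele frequency
    q = 1 - p                    # deletion allele frequency
    h_obs = g_id                 # observed heterozygosity (h)
    hom = 1 - h_obs              # observed homozygosity (H)

    mp = g_ii**2 + g_id**2 + g_dd**2
    return {
        "p_ins": p,
        "Hobs": h_obs,
        "MP": mp,
        "DP": 1 - mp,
        "PIC": 1 - p**2 - q**2 - 2 * p**2 * q**2,
        "PE": h_obs**2 * (1 - 2 * h_obs * hom**2),
        # TPI is undefined when no homozygotes are observed
        "TPI": 1 / (2 * hom) if hom > 0 else float("inf"),
    }


def combined_match_probability(match_probabilities):
    """Product of per-locus MPs, assuming independent loci."""
    return prod(match_probabilities)


if __name__ == "__main__":
    print(bonferroni_alpha(0.05, 30))
    print(forensic_parameters(25, 50, 25))
```

The combined matching probability is obtained by passing the 30 per-locus MP values to `combined_match_probability`; the product is meaningful only if the loci are independent, which is why the LD analysis precedes it.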

Results and discussion
A representative DIPplex profile obtained from amplification of 100 pg DNA template is presented in Fig. 1. In the Polish population sample, the INDEL frequency distributions showed no deviations from HWE (Bonferroni corrected, 0.0025 < P < 1.0000) when evaluated by a randomization procedure (10,000 cycles). Pairwise comparison using the exact test of disequilibrium with 16,000 permutation steps yielded departures from independence for 93 out of 435 pairs of INDELs under analysis (0.0019 < P < 0.0480) (data not shown). These departures were not statistically significant once the Bonferroni correction was applied for the number of analysed loci. Observed heterozygosity for all the systems ranged from 0.3525 to 0.5164, with an average of 0.4385, which is slightly lower than the values reported for Czech [...] Table 2). The individual mutation rate of a locus is one of the factors that may explain the observed discrepancy [21]. However, compared with mutation rates of 10^-3 to 10^-5 for STRs [22,23], SNPs have substantially lower mutation rates, estimated at as low as 10^-8 [24]. From the point of view of forensic genetics, markers with high heterozygosity and very low F_ST are potentially advantageous because their discrimination efficiency is relatively high irrespective of the population of origin [24,25]. High heterozygosity enhances the polymorphism information at each SNP, and low F_ST diminishes the chance of interpopulation effects. Some SNPs are reported to have remarkably little variation in allele frequency around the world [26]. On the other hand, ancestry informative single-nucleotide polymorphisms (AISNPs) are required to show low heterozygosity and high allele frequency divergence (high F_ST values) between different ancestral or geographically distant populations. These genetic markers are especially useful in establishing an individual's biogeographical ancestry with high probability [27,28].
We have selected eight INDELs (HLD131, HLD111, HLD118, HLD99, HLD122, HLD64, HLD81, HLD39) with F_ST higher than 0.1 between Poles and Taiwanese as potential AISNPs for further analyses. Other sets of population data are needed to verify the robustness of these loci.
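The selection criterion above can be illustrated with a short Python sketch that computes a simple two-population F_ST for a biallelic locus as (H_T - H_S)/H_T from expected heterozygosities and flags loci exceeding the 0.1 threshold. This is a simplification under stated assumptions: the two populations are given equal weight and no sample-size correction is applied, so the values will not be identical to the AMOVA-based estimates produced by Arlequin. The function names, locus labels and allele frequencies are hypothetical and are not the study data.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity of a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def wright_fst(p1, p2):
    """Simple two-population F_ST = (H_T - H_S) / H_T.

    p1 and p2 are the frequencies of the same allele in the two
    populations, which are weighted equally (an assumption).
    """
    p_bar = (p1 + p2) / 2
    h_t = expected_heterozygosity(p_bar)      # total-population heterozygosity
    if h_t == 0:
        return 0.0                            # locus fixed in both populations
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2
    return (h_t - h_s) / h_t


def select_candidates(allele_freqs, threshold=0.1):
    """Return names of loci whose F_ST exceeds the threshold."""
    return [
        name
        for name, (p1, p2) in allele_freqs.items()
        if wright_fst(p1, p2) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical insertion-allele frequencies in two populations
    example = {"locus_A": (0.2, 0.8), "locus_B": (0.5, 0.6)}
    for name, (p1, p2) in example.items():
        print(name, round(wright_fst(p1, p2), 4))
    print(select_candidates(example))
```

In this illustration the strongly diverged locus passes the threshold while the locus with similar frequencies in both populations does not, which mirrors the distinction drawn in the text between identification markers (low F_ST) and ancestry informative markers (high F_ST).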