Introduction

INDELs (insertion–deletion polymorphisms), or DIPs (deletion–insertion polymorphisms), are short diallelic length polymorphisms consisting of the presence or absence of a short sequence (typically 1–50 bp). They are relatively common throughout the human genome, representing 15–20 % of all polymorphisms [1], with the total number estimated at about 2 million [2]. Short amplicon size (50–150 bp), a low mutation rate (<2 × 10⁻⁸), and the capacity to multiplex 30–40 markers and type them in a single multiplex PCR with fluorescently labeled primers followed by capillary electrophoresis (a current technology for human identification) [3–5] are the main advantages that make INDELs useful in forensic genetics applications, including individual identification, kinship testing, population studies and ancient DNA analysis [6–8]. The Investigator DIPplex® kit (Qiagen) contains components for the simultaneous amplification and analysis of 30 biallelic autosomal INDELs and amelogenin. The INDELs are distributed over 19 autosomes at a minimum distance of 10 Mbp from routinely used STR and SNP markers. The allele length differences of the INDELs are between 4 and 22 bp, and all amplicons are shorter than 160 bp.

DNA extraction

Buccal swabs were collected from unrelated volunteers, together with information on the birthplace and ethnicity of each donor, and were anonymized. Signed informed consent was obtained from all participants, and the study complied with the protocol approved by the Ethical Committee of Poznan University of Medical Sciences (Ref: 139/13). The population sample sizes were: Poles (N_P = 122) and Taiwanese (N_T = 126). Genomic DNA was extracted using the QIAamp® DNA Mini Kit (Qiagen). Quantitation was performed using the Quantifiler™ Human DNA Quantification Kit on a 7500 Real-Time PCR System (Applied Biosystems) according to the manufacturer’s specifications. The samples were then normalized to 100 pg/μl and stored at −20 °C until amplification.
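The normalization step follows the usual C1·V1 = C2·V2 dilution relationship. The minimal Python sketch below, using a hypothetical helper `dilution_volumes` and an invented example concentration (not a value from this study), shows how the extract and diluent volumes needed to reach 100 pg/μl could be computed from a Quantifiler result.

```python
def dilution_volumes(stock_pg_per_ul, final_volume_ul, target_pg_per_ul=100.0):
    """Return (extract volume, diluent volume) in μl from C1*V1 = C2*V2.

    Hypothetical helper for illustration; the default target is the
    100 pg/μl normalization concentration stated in the text.
    """
    if stock_pg_per_ul < target_pg_per_ul:
        raise ValueError("extract is already below the target concentration")
    extract_ul = target_pg_per_ul * final_volume_ul / stock_pg_per_ul
    return extract_ul, final_volume_ul - extract_ul


# Invented example: extract quantified at 2.5 ng/μl (2500 pg/μl), 50 μl final volume
dna, water = dilution_volumes(2500.0, 50.0)
print(f"{dna:.2f} μl extract + {water:.2f} μl diluent")
```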

Amplification and genotyping

PCR was performed according to the protocol recommended by the manufacturer of the Investigator DIPplex Kit (Qiagen) on a PCR System 9700 (Applied Biosystems, USA), with the total reaction volume adjusted to 5 μl containing 1.8 μl nuclease-free water, 1.0 μl reaction mix A, 1.0 μl primer mix, 0.2 μl MultiTaq2 polymerase and 100 pg of DNA template. Control DNA XY5 was used to test the performance of the DIPplex Kit. Amplification was performed with 30 PCR cycles. Electrophoresis and typing were performed on a 3130 Genetic Analyzer (Applied Biosystems, USA) using a 36 cm capillary array and POP-4 denaturing polymer. BTO 550 (Qiagen) was used as the internal lane standard, spanning fragments from 60 to 550 bp. Prior to analysis, a five-dye matrix was established with the BT5 matrix standard for the fluorescent dyes 6-FAM, BTG, BTY, BTR and BTO under the Any5Dye virtual filter. Samples were injected for 10 s at 3 kV and electrophoresed for 1000 s at 15 kV at a run temperature of 60 °C. Data were collected using Data Collection v3.0 software, and GeneMapper® ID-X v1.1.1 software was used for INDEL allele calling.
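As a consistency check on the 5 μl reaction, the listed components add up to 4.0 μl, leaving 1.0 μl for template; at the 100 pg/μl normalization concentration this delivers the stated 100 pg. The sketch below makes that arithmetic explicit, assuming the template is pipetted from the normalized stock; the variable names are illustrative only.

```python
from math import isclose

# Reaction components (μl) as listed in the text
components_ul = {
    "nuclease-free water": 1.8,
    "reaction mix A": 1.0,
    "primer mix": 1.0,
    "MultiTaq2 polymerase": 0.2,
}
total_volume_ul = 5.0
template_pg = 100.0
stock_pg_per_ul = 100.0  # assumption: template taken from the normalized 100 pg/μl stock

master_mix_ul = sum(components_ul.values())
template_ul = template_pg / stock_pg_per_ul

# Master mix plus template should fill the 5 μl reaction exactly
assert isclose(master_mix_ul + template_ul, total_volume_ul)
print(f"master mix {master_mix_ul:.1f} μl + template {template_ul:.1f} μl")
```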

Statistical analysis

Genetic diversity (allele frequencies, heterozygosity), conformance to Hardy–Weinberg equilibrium (HWE) expectations and independence between loci (linkage disequilibrium, LD) were assessed using GDA v1.0 software [9]. To account for multiple comparisons, the nominal significance levels (P values) were adjusted by the Bonferroni correction procedure [10]; i.e. with 30 markers per database, the corrected significance level is 0.05/30 ≈ 0.0016667. Forensic informativeness was estimated by calculating discrimination power (DP), match probability (MP), polymorphic information content (PIC), typical paternity index (TPI) and power of paternity exclusion (PE) using the PowerStats v1.2 spreadsheet (Promega) [11]. Allele frequency distributions were compared by means of a pairwise population comparison test (R × C contingency test; G. Carmody, Ottawa, Canada). AMOVA and the population differentiation exact test were performed with Arlequin v3.5 software [12].
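The Bonferroni threshold quoted above is the nominal level divided by the number of tests. The Python sketch below assumes a nominal α = 0.05 (the level that reproduces 0.0016667) and also counts the pairwise LD tests among 30 loci; the per-pair threshold is shown only for comparison and is not the correction described in the text.

```python
from math import comb

ALPHA = 0.05   # assumed nominal significance level
N_LOCI = 30    # DIPplex autosomal INDELs

per_locus_alpha = ALPHA / N_LOCI   # corrected level for per-locus tests
n_pairs = comb(N_LOCI, 2)          # number of pairwise LD tests among 30 loci
per_pair_alpha = ALPHA / n_pairs   # stricter alternative: correct by number of pairs

print(f"per-locus alpha = {per_locus_alpha:.7f}")
print(f"LD pairs = {n_pairs}, per-pair alpha = {per_pair_alpha:.6f}")
```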

Results and discussion

A representative DIPplex profile obtained from amplification of 100 pg of DNA template is presented in Fig. 1. In the Polish population sample, the INDEL frequency distributions showed no deviations from HWE (Bonferroni corrected, 0.0025 < P < 1.0000), as evaluated by a randomization procedure (10,000 cycles). Pairwise exact tests for linkage disequilibrium with 16,000 permutation steps yielded departures from independence for 93 of the 435 INDEL pairs analysed (0.0019 < P < 0.0480) (data not shown); none of these departures remained significant after Bonferroni correction for the number of loci analysed. Observed heterozygosity ranged from 0.3525 to 0.5164 across the systems, with an average of 0.4385, which is slightly lower than the values reported for the Czech [6], German [13], Danish [14], Finnish [15], Central Spanish and Basque Country populations [16]. In the Taiwanese population sample, the INDEL frequency distributions showed no deviations from HWE (0.0032 < P < 1.0000) except for HLD39 (P = 0.0005). There were no statistically significant departures from independence between any pair of INDELs (0.0018 < P < 0.0597) (data not shown). Observed heterozygosity ranged from 0.1270 to 0.6191, with an average of 0.4079, which corresponds to the values reported for Asian-Americans and African-Americans [17]. The loci with the highest DP were HLD114 (DP = 0.660) in Poles and HLD118 (DP = 0.656) in Taiwanese. Based on the 30 INDELs, the combined MP among Poles amounts to 7.98 × 10⁻¹⁴, more than two orders of magnitude lower than the value calculated for the Taiwanese population (1.22 × 10⁻¹¹). Both values, however, indicate a favourably low random match probability, comparable with that of the AmpFlSTR SGM kit [18, 19]. The combined PE values are 0.9900 and 0.9884, respectively (Table 1).
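For a diallelic INDEL, the PowerStats quantities can be written in a few lines. The sketch below is illustrative only: it uses the formulas as they are commonly given for PowerStats (MP = sum of squared observed genotype frequencies, DP = 1 − MP, TPI = 1/(2H), PE = h²(1 − 2hH²), with h and H the observed heterozygosity and homozygosity), which should be checked against the spreadsheet itself. The genotype coding ("II", "ID", "DD") and the helper names are hypothetical, and the example locus is invented. Combined MP is taken as the product of per-locus MPs and combined PE as 1 − Π(1 − PE_i), assuming independent loci.

```python
from collections import Counter
from math import prod


def locus_stats(genotypes):
    """Forensic efficiency parameters for one diallelic locus.

    genotypes: strings such as "II", "ID" or "DD"
    (hypothetical coding: I = insertion allele, D = deletion allele).
    """
    n = len(genotypes)
    # Sort alleles so "ID" and "DI" are counted as the same heterozygote ("DI")
    counts = Counter("".join(sorted(g)) for g in genotypes)
    mp = sum((c / n) ** 2 for c in counts.values())   # match probability
    het = counts.get("DI", 0) / n                     # observed heterozygosity h
    hom = 1.0 - het                                   # observed homozygosity H
    p = (2 * counts.get("II", 0) + counts.get("DI", 0)) / (2 * n)  # I-allele frequency
    q = 1.0 - p
    return {
        "MP": mp,
        "DP": 1.0 - mp,
        "PIC": 1.0 - (p**2 + q**2) - 2 * p**2 * q**2,  # PIC for two alleles
        "TPI": 1.0 / (2 * hom) if hom > 0 else float("inf"),
        "PE": het**2 * (1 - 2 * het * hom**2),
        "Ho": het,
    }


def combined(per_locus):
    """Combined MP (product) and combined PE (1 - product of 1 - PE), assuming independence."""
    cmp_ = prod(s["MP"] for s in per_locus)
    cpe = 1.0 - prod(1.0 - s["PE"] for s in per_locus)
    return cmp_, cpe


# Invented locus in Hardy-Weinberg proportions at p = 0.5 (not data from this study)
example = ["II"] * 25 + ["ID"] * 50 + ["DD"] * 25
stats = locus_stats(example)
print(stats)
```

The same arithmetic applies to the reported totals: the Taiwanese combined MP exceeds the Polish value by a factor of 1.22 × 10⁻¹¹ / 7.98 × 10⁻¹⁴ ≈ 153, i.e. slightly more than two orders of magnitude.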

Fig. 1

Representative DIPplex profile obtained from amplification of 100 pg DNA template

Table 1 Population data and forensic efficiency parameters for 30 DIPplex INDELs in Polish (N_P = 122) and Taiwanese (N_T = 126) population samples

Pairwise heterogeneity testing using the χ² test was applied to compare allelic distributions. No or only minor significant differences were found between the Polish sample and the Czech [6], Danish [14], Finnish [15] and American-Caucasian [17] data sets. Likewise, the comparison between the Taiwanese sample and Asian-Americans [17] yielded no significant differences (0.032 < P < 1.000). By contrast, the Polish and Taiwanese samples differed at 14 INDELs (P < 0.05), of which ten (HLD131, HLD111, HLD118, HLD99, HLD48, HLD122, HLD64, HLD81, HLD39 and HLD84) remained significant after the critical value was corrected for multiple testing (Table 2). Notably, the same loci accounted significantly for the diversity between Caucasian and Asian samples in North American datasets published elsewhere [17].
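Because each DIPplex locus is diallelic, comparing allele counts between two samples reduces to a 2 × 2 table. The sketch below uses a plain Pearson χ² test with 1 degree of freedom, a simpler stand-in for the R × C contingency test actually used; the function name is hypothetical. The P value is obtained from the χ² survival function for 1 df, P(X > x) = erfc(√(x/2)), so no external statistics library is needed. A locus would then be called significant only if P falls below the Bonferroni-corrected level.

```python
from math import erfc, sqrt


def allele_chi2_2x2(a1, a2, b1, b2):
    """Pearson chi-square test (1 df) on a 2 x 2 table of allele counts.

    a1, a2: counts of alleles 1 and 2 in population A; b1, b2: same for population B.
    Returns (chi2, p_value).
    """
    table = [[a1, a2], [b1, b2]]
    row = [a1 + a2, b1 + b2]
    col = [a1 + b1, a2 + b2]
    n = sum(row)
    if 0 in row or 0 in col:
        raise ValueError("a row or column total is zero; the test is undefined")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, chi2 is the square of a standard normal variable
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Invented allele counts for one locus (not data from this study)
chi2, p = allele_chi2_2x2(10, 20, 20, 10)
print(f"chi2 = {chi2:.3f}, P = {p:.4f}, significant after Bonferroni: {p < 0.05 / 30}")
```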

Table 2 P values of population differentiation tested by an exact test and population-specific F_ST indices per polymorphic locus (absolute values)

Wright’s F_ST was analysed to measure population substructure effects [20]. AMOVA revealed that most of the molecular variation (92.54 %) was due to variation within the analysed populations rather than among them, with average fixation index values of 0.0743 and 0.0749 for Poles and Taiwanese, respectively. Our findings correspond to those of other authors who used AMOVA to compare the allele frequencies for each DIPplex locus in populations of Europe, Africa, Asia and North America [16, 17]. Moreover, individual INDELs differed noticeably in their fixation indices, which ranged from −0.0004 and −0.0003 (HLD93) to 0.4260 and 0.4282 (HLD118) for Poles and Taiwanese, respectively (Table 2). The individual mutation rate of a locus is one of the factors that may explain such differences [21]. However, SNP mutation rates are estimated to be as low as 10⁻⁸ [24], compared with 10⁻³–10⁻⁵ for STRs [22, 23]. From the point of view of forensic genetics, markers with high heterozygosity and very low F_ST are potentially advantageous because their discrimination efficiency is relatively high irrespective of population of origin [24, 25]: high heterozygosity increases the information content of each SNP, and low F_ST reduces the influence of population-specific allele frequencies. Some SNPs have been reported to show remarkably little variation in allele frequency around the world [26]. Ancestry-informative single-nucleotide polymorphisms (AISNPs), on the other hand, should show low heterozygosity and large allele frequency divergence (high F_ST values) between ancestral or geographically distant populations. Such markers are especially useful for inferring an individual’s biogeographical ancestry with high probability [27, 28].
We selected eight INDELs (HLD131, HLD111, HLD118, HLD99, HLD122, HLD64, HLD81 and HLD39) with F_ST higher than 0.1 between Poles and Taiwanese as potential ancestry-informative markers, analogous to AISNPs, for further analyses. Additional population datasets are needed to verify the robustness of these loci.
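The F_ST-based selection of candidate loci can be illustrated with a simple two-population estimator. The sketch below uses the heterozygosity-based definition F_ST = (H_T − H_S)/H_T (Nei's G_ST) for a diallelic locus with equally weighted populations. This is a swapped-in textbook estimator, not Arlequin's AMOVA-based, population-specific F_ST reported in Table 2, so its values would not reproduce the table. Locus names and allele frequencies are hypothetical; only the 0.1 threshold comes from the text.

```python
def fst_two_pops(p1, p2):
    """Heterozygosity-based F_ST (Nei's G_ST) for one diallelic locus, two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical insertion-allele frequencies (population 1, population 2); not study data
freqs = {
    "locus_A": (0.50, 0.50),
    "locus_B": (0.20, 0.80),
    "locus_C": (0.45, 0.55),
}
THRESHOLD = 0.1  # selection criterion stated in the text

candidates = [name for name, (f1, f2) in freqs.items() if fst_two_pops(f1, f2) > THRESHOLD]
print(candidates)
```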