Introduction

In the last 20 years, enormous progress has been made in forensic science, mainly due to the development of DNA typing technologies. Forensic DNA analysis started with variable number tandem repeats, which, however, turned out to be unsuitable for degraded DNA. Nowadays, short tandem repeats (STRs) are used in most forensic laboratories around the world. Many commercial STR kits with validated population data are currently available for routine forensic identification. Nevertheless, the relatively large size of their amplicons (150–500 bp) makes most of them unsuitable for degraded DNA and low copy number samples. Two approaches are used to analyze such samples: mini-STRs (amplicons <150 bp) and single nucleotide polymorphisms (SNPs), whose amplicons can be much shorter (<50 bp).

Recently, forensic scientists have turned their attention to a different class of polymorphisms in the human genome: insertion–deletion polymorphisms (INDELs). These polymorphisms originate from single mutation events and have a low mutation rate in comparison with STRs. They are present throughout the human genome, at a density of approximately one INDEL per 7.2 kbp [1]. The exact number of INDELs in the human genome is unknown because results differ between studies, most likely due to the use of different techniques for INDEL detection [2]. INDELs represent about 20% of all polymorphisms in the human genome [3]; a high proportion of them (36%) map within the promoters, introns, and exons of known genes [1]. Mills et al. [1] identified five major INDEL classes: (1) insertions or deletions of single base pairs, (2) monomeric base pair expansions, (3) multi-base pair expansions of 2–15 bp repeat units, (4) transposon insertions, and (5) INDELs containing random DNA sequences ranging from 2 to 9,989 bp in length. Approximately 40% of INDELs belong to the last class; more than 99% of them are shorter than 100 bp [1].

Like most SNPs, INDELs are diallelic. The size range of amplicons used for forensic examination of SNPs and INDELs is also comparable (50–150 bp) [4–6]. With their reduced mutation rates [7], INDELs are particularly appropriate in paternity cases where mutations at STR loci may occur [6]. INDELs can also be analyzed in the same way as STRs, with no change to the laboratory workflow, and can be highly multiplexed. Despite these advantages, the application of INDELs in routine forensic investigation and paternity testing is still rare.

Here we present a validation study of a new INDEL kit, DIPplex (QIAGEN, Germany). The DIPplex kit contains 30 INDEL markers, located on almost all chromosomes, and the amelogenin locus as a sex-informative marker (see Electronic supplementary material). This paper presents population data obtained from 55 unrelated members of the Czech population. These data were used to calculate forensic and population parameters. Finally, we analyzed 11 trios (mother–child–alleged father) to test the suitability of the DIPplex kit for parentage testing.

Materials and methods

Samples and DNA extraction

The study was carried out according to the SWGDAM guidelines [8]. Only samples with a complete DNA profile were used for the population study. To determine forensic parameters, complete genotypes of 55 unrelated members of the Czech population were used. For paternity casework, 11 mother–child–alleged father trios with paternity proven by microsatellite analysis (PowerPlex 16 HS, Promega, USA) were randomly selected. DNA was isolated from buccal swabs using the QIAamp Blood Mini Kit (QIAGEN, Germany).

PCR amplification and genotyping

PCR reactions were prepared according to the protocol recommended by the manufacturer of the DIPplex Kit and run in a Veriti 96-Well Thermal Cycler (Applied Biosystems, USA). Several amounts of template DNA were tested (0.2, 0.5, and 1 ng); the best results were obtained with 0.5 ng of DNA. Control DNA XY5 was used to test the performance of the DIPplex Kit.

PCR products were subsequently analyzed by capillary electrophoresis. Samples were mixed with Hi-Di Formamide (Applied Biosystems) and BTO 550 size standard (QIAGEN). Fragment separation and detection were performed on an ABI PRISM 310 Genetic Analyzer with the G5 filter set and POP4 polymer. Samples were then genotyped using Genotyper v3.7 software (Applied Biosystems).

Statistical analysis

Allele frequencies were determined by counting. Pearson’s χ² test for R×C tables and Fisher’s exact test, with subsequent Bonferroni correction for multilocus testing, were employed to test for deviation from Hardy–Weinberg equilibrium [9]. Gene diversity, polymorphism information content, homozygosity, heterozygosity, and the within-population inbreeding coefficient were determined [10]. Probability of a match (PM), power of discrimination (PD), and average paternity index (API) were calculated for each locus and subsequently for the whole multiplex. The chi-square test was used to assess differences in allele frequencies between the Czech population and other populations (German, European, Asian, African, and Native American). All statistical parameters mentioned above were calculated using applications developed in-house. Linkage disequilibrium analysis was performed using SNPAnalyzer v1.2 (Istech, South Korea).
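The per-locus quantities listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the in-house application used in the study: the function name `locus_parameters`, the allele labels "+" and "-", and the example genotypes are hypothetical, gene diversity is taken as the simple estimate 1 − Σp² (the study may have used a sample-size-corrected estimator), and PM is taken as the sum of squared observed genotype frequencies.

```python
from collections import Counter
from itertools import combinations


def locus_parameters(genotypes):
    """Per-locus parameters from a list of genotypes (2-tuples of allele labels)."""
    n = len(genotypes)

    # Allele frequencies by direct counting: each individual carries two alleles.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    # Genotype frequencies; ("+", "-") and ("-", "+") count as the same genotype.
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    geno_freqs = {g: c / n for g, c in geno_counts.items()}

    sum_p2 = sum(p * p for p in freqs.values())
    het_obs = sum(f for g, f in geno_freqs.items() if g[0] != g[1])
    # Probability of a match: chance that two random individuals share a genotype.
    pm = sum(f * f for f in geno_freqs.values())
    # Polymorphism information content (Botstein-style formula).
    pic = 1 - sum_p2 - sum(
        2 * p * p * q * q for p, q in combinations(freqs.values(), 2)
    )

    return {
        "allele_freqs": freqs,
        "heterozygosity": het_obs,
        "homozygosity": 1 - het_obs,
        "gene_diversity": 1 - sum_p2,  # assumption: uncorrected estimator
        "pic": pic,
        "pm": pm,
        "pd": 1 - pm,
    }


# Hypothetical genotypes for one diallelic locus ("+" insertion, "-" deletion).
example = [("+", "+")] * 1 + [("+", "-")] * 2 + [("-", "-")] * 1
result = locus_parameters(example)
```

For a diallelic marker with equal allele frequencies, this sketch gives a gene diversity of 0.5, the maximum attainable for two alleles.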

Results and discussion

In this study we tested (1) the suitability of the DIPplex Kit for routine use in forensic laboratories, (2) the suitability of the DIPplex Kit for application in the Czech population, and (3) the applicability of INDELs to parentage testing.

Multiplex performance

We found that the DIPplex PCR multiplex is very sensitive and requires only a very small amount of DNA. However, samples with a high DNA input (more than 0.5 ng per PCR reaction) produced pull-up peaks and abnormally shaped peaks. One of the most frequent reasons for excluding a sample from the population data was allele or locus dropout, which occurred at several loci (rs1610905, rs2307956, and rs1610937). Interestingly, there was no correlation between amplicon size and allele or locus dropout. Further optimization of buffer composition and primer concentration would be very helpful in this context.

Genetic variation

Allele frequencies and the proportions of homozygotes and heterozygotes at each locus were tested to determine whether the loci are in Hardy–Weinberg equilibrium. Pearson’s test showed no deviation from Hardy–Weinberg expectations. We also used Fisher’s exact test; the P value threshold was adjusted by Bonferroni correction for 30 loci (0.05/30), and P < 0.00167 was considered statistically significant (see Electronic supplementary material). None of the 30 loci deviated from Hardy–Weinberg proportions. Average heterozygosity was 0.457, and average gene diversity reached 0.496. All 30 loci have a minor allele frequency higher than 0.25. These data indicate that all selected loci are highly polymorphic.
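As an illustration of the testing logic, the sketch below runs a chi-square goodness-of-fit test against Hardy–Weinberg expectations for one diallelic locus and derives the Bonferroni-adjusted threshold. It is a simplified stand-in, not the R×C Pearson test or the Fisher’s exact test actually used in the study; the function names and the genotype counts are hypothetical. For a diallelic locus the test has one degree of freedom, so the P value can be obtained from the complementary error function without external libraries.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions (1 df).

    Assumes both alleles are observed, so that no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A by counting
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 30)  # 0.05 / 30, about 0.00167
# Hypothetical genotype counts for one locus in 55 individuals.
chi2, p_value = hwe_chi_square(14, 27, 14)
deviates = p_value < threshold
```

With small samples such as the 55 individuals typed here, an exact test is generally preferable to the chi-square approximation, which is consistent with the use of Fisher’s exact test in the study.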

Population comparison

Allele frequencies of 24 loci of the DIPplex kit in five populations (general European, German, American/Indian, Asian/Japanese, and African) were compared with our results using the chi-square test at the 0.05 significance level (see Electronic supplementary material). Loci rs17879936, rs8190570, rs17174476, rs17878444, rs3081400, and rs2307433 were tested only in our study and in the German population. We detected no statistically significant differences between the Czech and European populations or between the Czech and Asian populations. In the comparison of the Czech and American/Indian populations, a statistically significant difference was found at rs16388. In the African population, there were two loci (rs1610937 and rs6481) at which allele frequencies differed significantly from those of the Czech population.
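A comparison of this kind can be sketched as a chi-square test on a 2×2 table of allele counts (two populations by two alleles). The function name `allele_chi_square` and the allele counts are hypothetical and serve only to show the calculation; they are not data from this study or from the reference populations.

```python
import math


def allele_chi_square(counts_a, counts_b):
    """Pearson chi-square test on a 2x2 table of allele counts.

    counts_a, counts_b: (insertion allele count, deletion allele count)
    for population A and population B. Returns (chi2, p_value) with 1 df.
    """
    table = [counts_a, counts_b]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 55 individuals (110 alleles) versus
# a reference sample of 100 individuals (200 alleles).
chi2, p_value = allele_chi_square((60, 50), (104, 96))
significant = p_value < 0.05
```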

We also compared four INDEL loci of the DIPplex kit (rs2307959, rs1305047, rs16438, and rs1611001) that were included in the study by Pimenta and Pena [6]. The differences between our results and the published data were nonsignificant. These findings indicate that the selected markers retain high discrimination power across different populations, which is an essential prerequisite for a robust forensic kit.

Linkage disequilibrium

To test whether loci located on the same chromosome show allelic association, we applied a linkage disequilibrium test using the internet-based software SNPAnalyzer v1.2. The test showed no significant allelic association among the 30 markers, so they can be treated as independent.

Forensic efficiency

Forensic parameters were also evaluated in this study. We assessed the efficiency of the DIPplex Kit by PM and PD. The combined PM was 1 in 6.8 × 10¹², which is less discriminating than the PM of STR kits but comparable with other INDEL-based studies (Table 1) [4–6]. Combined PD reached 99.9999999999% (see Electronic supplementary material). The API was higher than one at every locus, which supports the usefulness of the selected loci in paternity testing. The combined PI was about 27,000, so the probability of paternity, assuming a 50% prior probability, was 99.99%. Although this value is lower than the probability of paternity for Combined DNA Index System (CODIS) loci, it is still sufficient for paternity to be practically proven. For example, the probability of paternity in one case increased from 99.99999% to 99.9999999999% when STR (PowerPlex 16 HS) and INDEL (DIPplex) analyses were combined (Table 2). Although we did not perform a linkage disequilibrium test between the STR markers in PowerPlex 16 HS and the INDEL markers in the DIPplex Kit, the minimum distance of 10 Mb between INDEL and STR markers is, in our opinion, sufficient to treat all markers as independent. Therefore, the DIPplex kit could be used on its own in simple trios, but more probably as an additional panel of markers in complicated kinship analysis [11].

With regard to paternity testing, we also measured the informativity of this marker set for kinship testing. Since marker set informativity equals twice the number of INDELs multiplied by their average gene diversity [12], we obtained a value of 29.76 (2 × 30 × 0.496), which is close to the number of markers in the DIPplex kit. In the Czech population, the DIPplex kit therefore provides an exclusion power similar to that of 29.76 maximally informative SNPs. The published studies employed similar numbers of INDEL markers: Pimenta and Pena [6] used 40 markers, Pereira et al. [5] 38, and Li et al. [4] 29. As the markers in these studies were different, with no or minimal overlap, it seems that the forensic parameters depend mainly on the total number of examined INDELs (Table 1).
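The arithmetic behind the combined values can be sketched as follows. The function names are hypothetical and the code assumes independent loci, so that per-locus values can be multiplied; the probability of paternity is computed from the combined PI with Bayes’ rule, and marker set informativity follows the definition quoted in the text (twice the number of markers times the average gene diversity).

```python
from math import prod


def combined_match_probability(per_locus_pm):
    """Combined PM over independent loci: product of per-locus PM values."""
    return prod(per_locus_pm)


def combined_paternity_index(per_locus_pi):
    """Combined PI over independent loci: product of per-locus PI values."""
    return prod(per_locus_pi)


def probability_of_paternity(cpi, prior=0.5):
    """Posterior probability of paternity from the combined PI and a prior."""
    return cpi * prior / (cpi * prior + (1 - prior))


def marker_set_informativity(n_markers, mean_gene_diversity):
    """Twice the number of markers times their average gene diversity."""
    return 2 * n_markers * mean_gene_diversity


# Values reported in the text: combined PI of about 27,000, 30 markers,
# average gene diversity 0.496.
w = probability_of_paternity(27_000)        # about 0.99996
info = marker_set_informativity(30, 0.496)  # about 29.76

# Combining two independent panels multiplies their combined PIs
# (the two inputs here are hypothetical, not values from Table 2).
w_both = probability_of_paternity(combined_paternity_index([1.0e7, 27_000]))
```

Because the combined PI enters the posterior as CPI / (CPI + 1) under a 50% prior, multiplying the PIs of two independent panels moves the probability of paternity closer to 1 by several orders of magnitude, which is the effect described for the combined STR and INDEL analysis.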

Table 1 Comparison of combined probability of a match for different INDEL sets and STR kits
Table 2 Example of combining STR and INDEL analysis for paternity testing

Conclusions

In this study we presented the validation of a new DIPplex Kit (QIAGEN) for use in the Czech population and compared our data with the few INDEL-based studies published to date [4–6]. The kit contains a panel of 30 INDEL loci. Since we found no LD among the 30 markers, the whole panel can be treated as a set of independent loci. No significant differences in allele frequencies from European and Asian populations were detected. The API and CPI values achieved in our study lead to the conclusion that the DIPplex Kit could be applied as an addition to STR markers for resolving complicated kinship cases, thus increasing the power of DNA evidence. The DIPplex Kit could also be used as an independent assay for paternity testing, with a reduced risk of mutation events during gametogenesis. However, paternity indices obtained using the DIPplex kit alone may occasionally not reach levels comparable to those of microsatellite CODIS loci. After optimization of buffer composition and primer design by the manufacturer, the DIPplex Kit could also be used for the analysis of degraded and/or inhibited samples. Data obtained with the DIPplex kit can be analyzed with the tools used for routine STR analysis and do not require any change in laboratory workflow.