Introduction

Insertion and deletion polymorphism (indel) is a type of genetic variation in which a specific nucleotide sequence is present (insertion) or absent (deletion). Compared with short tandem repeats (STRs), indels have given promising results in population genetic studies and forensic identification1.

Because of their short amplicon lengths and the lack of stutter peaks, these markers can be used successfully to analyze degraded DNA samples in challenging forensic cases2. In this study, we used a commercially available indel kit, the Investigator DIPplex (QIAGEN, Germany), on a sample from the Bahraini population. This kit contains 30 indel markers, which are located on all chromosomes, and the sex-informative marker Amelogenin3.

This paper presents population data obtained from 293 unrelated Bahraini individuals from different governorates. Bahrain lies in the Arabian Gulf and is connected to the eastern coast of the Arabian Peninsula, Iran, Iraq and Oman4. It is one of the most densely populated countries, with an estimated population of 1,314,562 persons. Of these, 568,399 are Bahraini citizens (46%) and 666,172 are expatriates (54%)5.

Because of Bahrain’s geographic location, the diversity of its population has been shaped by prehistoric cultural events and by the migration flows in this area6,7.

Materials and methods

Sample collection

Two hundred and ninety-three (293) buccal swabs were collected using cotton swabs (SceneSafe, UK) from healthy unrelated Bahraini males. The research study was publicized through different media platforms. Participants who wished to contribute samples contacted the corresponding author and, after giving informed consent, delivered their samples at the General Directorate of Criminal Investigation and Forensic Science, Kingdom of Bahrain.

The age of the participants ranged from 20 to 70 years. Ethical approval for conducting the tests was obtained from the Research and Research Ethics Committee (RREC) (E007-PI-10/17) of the Arabian Gulf University, Manama, Kingdom of Bahrain. All participants gave informed consent prior to their contribution. All research was performed in accordance with relevant guidelines/regulations. In each case, the sampled males declared their ancestry (to the level of paternal grandfather) from one of four geographical subdivisions of the country (Capital Governorate, Muharraq Governorate, Northern Governorate and Southern Governorate).

DNA processing

Genomic DNA was extracted using the QIAsymphony SP instrument (Qiagen, Germany), which follows the magnetic bead principle. The extracted DNA was then quantified using the Quantifiler HP DNA Quantification kit (Thermo Fisher Scientific Company, Carlsbad, USA) on the 7500 Real-Time PCR System (Thermo Fisher Scientific Company, Carlsbad, USA) according to the manufacturer’s recommendations.

About 0.5 ng of the extracted DNA was amplified using the Investigator DIPplex kit (Qiagen, Germany) in full-volume reactions (10.5 µl) for 30 cycles, following the thermal cycling conditions in the manufacturer’s DIPplex protocol. Reactions were set up in a MicroAmp Optical 96-Well Reaction Plate (Thermo Fisher Scientific Company, Carlsbad, USA) and run in a Veriti thermal cycler (Thermo Fisher Scientific Company, Carlsbad, USA), with the provided positive control (9948) and nuclease-free water as a negative control.

The PCR products (1 µl) were separated by capillary electrophoresis on an ABI 3500xl Genetic Analyzer (Thermo Fisher Scientific Company, Carlsbad, USA) in a total of 12 µl of master mix consisting of BTO size standard (Qiagen, Germany) and Hi-Di formamide (Thermo Fisher Scientific, Inc., Waltham, MA, USA). GeneMapper ID-X Software v1.4 (Thermo Fisher Scientific, Inc., Waltham, MA, USA) was used for genotype assignment in combination with the Investigator DIPplex Template Files and Qiagen DIPSorter software (Qiagen, Germany). Experiments were performed in the Biology and DNA Forensic Laboratory, Ministry of Interior, Kingdom of Bahrain, which is accredited with Collaborative Testing Services (CTS).

Statistical analysis

Forensic parameters, namely match probability (MP), discrimination power (PD), probability of exclusion (PE), polymorphism information content (PIC), number of alleles (Nall) and observed heterozygosity (Ho), together with the insertion allele frequencies (DIP+) and the deletion allele frequencies (DIP−) of the 30 indels, were calculated using the STRAF online software (http://cmpg.unibe.ch/shiny/STRAF/)8.
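As a rough illustration of how these per-locus parameters relate to genotype counts, the sketch below computes them for a single biallelic indel. The genotype counts are hypothetical, and the formulas are the standard textbook ones (MP as the sum of squared genotype frequencies, PD = 1 − MP, PE = h²(1 − 2hH²) with h the heterozygote and H the homozygote proportion); that STRAF applies exactly these estimators is an assumption here.

```python
def forensic_parameters(n_ins_ins, n_ins_del, n_del_del):
    """Per-locus forensic parameters for a biallelic indel from genotype counts."""
    n = n_ins_ins + n_ins_del + n_del_del
    genotype_freqs = [n_ins_ins / n, n_ins_del / n, n_del_del / n]

    # Allele frequencies by gene counting (DIP+ = insertion, DIP- = deletion).
    dip_plus = (2 * n_ins_ins + n_ins_del) / (2 * n)
    dip_minus = 1.0 - dip_plus

    # Match probability: chance that two random individuals share a genotype.
    mp = sum(f * f for f in genotype_freqs)
    pd = 1.0 - mp

    # Observed heterozygosity and the textbook power-of-exclusion formula.
    h_obs = n_ins_del / n
    hom = 1.0 - h_obs
    pe = h_obs ** 2 * (1.0 - 2.0 * h_obs * hom ** 2)

    return {"DIP+": dip_plus, "DIP-": dip_minus, "Ho": h_obs,
            "MP": mp, "PD": pd, "PE": pe}


# Hypothetical genotype counts for one locus in 293 individuals (not from Table 1).
params = forensic_parameters(80, 140, 73)
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```

Working from genotype counts rather than allele frequencies keeps MP and Ho independent of any Hardy–Weinberg assumption.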

The Arlequin statistical software (v3.5)9 was used to test Hardy–Weinberg equilibrium (HWE) at each locus and linkage disequilibrium (LD) between all pairs of the 30 indels, and p values were corrected by the Bonferroni method10.
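The Bonferroni correction divides the significance level by the number of tests. The sketch below assumes a nominal α of 0.05, which the text does not state; with 30 loci there are 435 marker pairs, giving an LD threshold of about 0.000115, and the same α over 30 loci gives a per-locus HWE threshold of about 0.0017. The HWE function uses a simple one-degree-of-freedom chi-square goodness-of-fit test as a stand-in for illustration; it is not the exact test implemented in Arlequin, and the genotype counts are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def hwe_chi_square(n_ins_ins, n_ins_del, n_del_del):
    """Chi-square HWE test for a biallelic locus (assumes both alleles are observed)."""
    n = n_ins_ins + n_ins_del + n_del_del
    p = (2 * n_ins_ins + n_ins_del) / (2 * n)
    q = 1.0 - p
    observed = [n_ins_ins, n_ins_del, n_del_del]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


ALPHA = 0.05                      # assumed nominal level, not stated in the text
n_loci = 30
n_pairs = math.comb(n_loci, 2)    # number of marker pairs tested for LD

hwe_threshold = bonferroni_threshold(ALPHA, n_loci)
ld_threshold = bonferroni_threshold(ALPHA, n_pairs)
print(f"pairs: {n_pairs}, HWE threshold: {hwe_threshold:.5f}, LD threshold: {ld_threshold:.6f}")

# Hypothetical genotype counts for one locus (not from the study data).
chi2, p_value = hwe_chi_square(80, 140, 73)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, deviates: {p_value < hwe_threshold}")
```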

Interpopulation pairwise genetic distances based on Fst were calculated with POPTREE2 software16 from the allele frequencies of the Bahraini population and of populations taken from the literature: Kuwait11, UAE12, Iraq13, Iran14, Turkey13, Slovenia13, Lithuania13, Bangladesh15, Indonesia15, and Japan15. The Fst distances were then represented by nonmetric multidimensional scaling (NM-MDS) in IBM SPSS Statistics v21.0 Software to investigate the population structure between the Bahraini population and the abovementioned populations.
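To make the idea of an Fst distance concrete, the sketch below computes a simplified pairwise Fst for biallelic loci as (Ht − Hs)/Ht, summed over loci, where Hs is the mean within-population heterozygosity and Ht the heterozygosity of the pooled frequencies. This is a plain Gst-style estimate without sample-size correction and is not claimed to be the estimator POPTREE2 uses; the population names and frequencies are hypothetical placeholders.

```python
def pairwise_fst(freqs_a, freqs_b):
    """Simplified multilocus Fst between two populations.

    Each argument is a list of insertion-allele frequencies, one per locus,
    in the same locus order.
    """
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_s = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2   # mean within-population
        h_t = 2 * p_bar * (1 - p_bar)                        # pooled
        numerator += h_t - h_s
        denominator += h_t
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical insertion frequencies at three loci (illustrative only).
populations = {
    "pop_A": [0.45, 0.60, 0.52],
    "pop_B": [0.47, 0.58, 0.50],
    "pop_C": [0.70, 0.35, 0.30],
}

names = list(populations)
fst_matrix = {
    (a, b): pairwise_fst(populations[a], populations[b])
    for a in names for b in names
}
for a in names:
    print(a, " ".join(f"{fst_matrix[(a, b)]:.4f}" for b in names))
```

A symmetric matrix of this kind is the input that an MDS or neighbor-joining step would then work from.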

To compare the genetic structures of the populations, a phylogenetic tree was constructed from allele frequency data using the neighbor-joining method17 in MEGA X: Molecular Evolutionary Genetics Analysis18. The tree was built from the allele frequencies of the thirty indel markers (HLD77, HLD45, HLD131, HLD70, HLD6, HLD111, HLD58, HLD56, HLD118, HLD92, HLD93, HLD99, HLD88, HLD101, HLD67, HLD83, HLD114, HLD48, HLD124, HLD122, HLD125, HLD64, HLD81, HLD136, HLD133, HLD97, HLD40, HLD128, HLD39 and HLD84) for all populations, using corrected fixation index (Fst) distances and 1000 permutations.

Ethics approval

Ethical review for conducting tests was obtained and approved by the Research and Research Ethics Committee (RREC) (E007-PI-10/17) in the Arabian Gulf University, Manama, Kingdom of Bahrain.

Consent to participate

All participants provided informed consent prior to contributing their buccal swab samples.

Consent for publication

All authors/participants provided consent for publication. All figures were generated with the software indicated in the Materials and methods.

Results

Allele frequencies, forensic parameters and efficiency

Allele frequencies and forensic efficiency parameters for the 30 indel loci in the Bahraini population are shown in Table 1. The genotypes are available in supplementary material Table 1S.

Table 1 Allele frequencies and forensic parameters of 30 loci in 293 Bahrainis.

No deviation from Hardy–Weinberg equilibrium (HWE) was observed after applying the Bonferroni correction (P < 0.00017), except for HLD88, which still deviated after the correction. The expected heterozygosities (He) ranged from 0.413 to 0.501 with a mean of 0.481. The observed heterozygosities (Hobs) ranged from 0.332 (HLD97) to 0.534 (HLD6 and HLD125) with a mean of 0.450. The polymorphism information content (PIC) ranged between 0.328 and 0.375.
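For a biallelic marker, He and PIC follow directly from the allele frequencies, which gives a quick plausibility check on these ranges. The sketch below assumes Nei's unbiased estimator for He (the factor 2n/(2n − 1) with n = 293) and the standard PIC formula; that the software used exactly these forms is an assumption. With the most skewed frequency reported for the panel (0.291 for HLD64) and a perfectly balanced locus (0.5), the results round to the quoted bounds of 0.413 to 0.501 for He and 0.328 to 0.375 for PIC.

```python
def expected_heterozygosity(p, n_individuals):
    """Unbiased expected heterozygosity for a biallelic locus with allele frequency p."""
    q = 1.0 - p
    n_chromosomes = 2 * n_individuals
    return (n_chromosomes / (n_chromosomes - 1)) * (1.0 - p * p - q * q)


def pic_biallelic(p):
    """Polymorphism information content for a biallelic locus."""
    q = 1.0 - p
    return 1.0 - p * p - q * q - 2.0 * p * p * q * q


N_INDIVIDUALS = 293

# 0.291 is the lowest allele frequency reported for the panel (HLD64);
# 0.5 is the theoretical maximum-diversity case for two alleles.
for label, p in [("HLD64, p = 0.291", 0.291), ("balanced locus, p = 0.5", 0.5)]:
    he = expected_heterozygosity(p, N_INDIVIDUALS)
    pic = pic_biallelic(p)
    print(f"{label}: He = {he:.3f}, PIC = {pic:.3f}")
```

Because a two-allele marker cannot exceed PIC = 0.375, the upper end of the reported range is the theoretical ceiling for this type of marker.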

All markers were highly polymorphic and informative for forensic application in the Bahraini population sample. To determine forensic efficiency, we evaluated the power of discrimination (PD), power of exclusion (PE) and matching probability (MP). The combined power of discrimination (CPD) and the combined power of exclusion (CPE) for the 30 indel markers were 0.9999999999998110 and 0.99276, respectively. The combined MP was 1.89 × 10^−13 for Bahrainis, allowing a reliable level of discrimination in forensic cases. As shown in Table 1, the deletion frequencies (DIP−) ranged from 0.291 (HLD64) to 0.658 (HLD77), with a mean above 0.4, and the insertion frequencies (DIP+) ranged from 0.342 (HLD77) to 0.709 (HLD64). Linkage disequilibrium tests (P < 0.000115 after Bonferroni correction) revealed no allelic association in any of the possible pairwise combinations of the 30 indels, indicating that the markers are independent (Table 2S).
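Under the independence indicated by the LD tests, the combined values follow from the per-locus ones by the product rule: the combined MP is the product of the per-locus MPs, CPD = 1 − combined MP, and CPE = 1 − Π(1 − PE). The sketch below uses hypothetical per-locus values (not those of Table 1) and also checks that the reported CPD and combined MP are mutually consistent.

```python
import math


def combined_mp(per_locus_mp):
    """Combined match probability across independent loci."""
    return math.prod(per_locus_mp)


def combined_pd(per_locus_mp):
    """Combined power of discrimination: 1 minus the combined match probability."""
    return 1.0 - combined_mp(per_locus_mp)


def combined_pe(per_locus_pe):
    """Combined power of exclusion across independent loci."""
    return 1.0 - math.prod(1.0 - pe for pe in per_locus_pe)


# Hypothetical per-locus values for 30 loci (placeholders, not the Table 1 values).
example_mp = [0.38] * 30
example_pe = [0.15] * 30
print(f"combined MP = {combined_mp(example_mp):.3e}")
print(f"CPD = {combined_pd(example_mp):.16f}")
print(f"CPE = {combined_pe(example_pe):.5f}")

# Consistency check on the figures reported in the text.
reported_combined_mp = 1.89e-13
implied_cpd = 1.0 - reported_combined_mp
print(f"CPD implied by the reported combined MP: {implied_cpd:.15f}")
```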

Interpopulation diversity

Determining the genetic structure of populations is becoming increasingly important in genetic studies19. To reveal genetic similarities and divergences between the Bahraini population and previously reported populations, we constructed a phylogenetic tree (Fig. 1) from allele frequency data (the deletion and insertion values for each marker) using the neighbor-joining (NJ) method in MEGA X: Molecular Evolutionary Genetics Analysis. We also used the matrix of Fst genetic distances to generate a multidimensional scaling (MDS) plot (Fig. 2).

Figure 1

Phylogenetic tree constructed using Nei’s DA distances for the 30 indels estimated among 11 populations.

Figure 2

Multidimensional scaling (MDS) plot constructed from pairwise FST distances among the 11 populations (the Bahraini population and 10 others) analyzed with the 30 indel markers in the Investigator DIPplex Kit (stress: 0.0252, RSQ: 0.9982).

We compared the population of Bahrain with 10 other populations: Kuwait11, UAE12, Iraq13, Iran14, Turkey13, Slovenia13, Lithuania13, Bangladesh15, Indonesia15, and Japan15. Fst values for the allele frequency distributions between the Bahraini population and the published groups are shown in Table 2.

Table 2 Nei genetic distance matrix between Bahraini population and other populations showing the Fst values.

The Bahraini and Kuwaiti populations showed the closest genetic relatedness, followed by the Emirati population. The remaining populations were genetically more distant from the Bahraini population. The MDS plot, constructed using IBM SPSS Statistics v21.0 Software, gave results that agreed with the phylogenetic tree.

The Bahraini, Kuwaiti and Emirati populations clustered together in the north-east quadrant, the Iranian, Iraqi and Turkish populations formed the adjacent cluster to the south, and the Slovenian and Lithuanian populations fell in the south-east quadrant (Fig. 2), in good accordance with their geographic regions.

Discussion

The forensic utility of 30 insertion-deletion polymorphism (indel) markers in a sample from the Bahraini population was successfully evaluated in this paper using the Qiagen Investigator DIPplex Kit. The deviation from HWE at the HLD88 locus could result from the high diversity of the studied population or from the high polymorphism of the HLD88 locus, which is also supported by the PD and MP parameters. Earlier studies of autosomal STRs20 indicated that the Bahraini population structure reflects a high level of endogamy, accounting for 20–50% of all marriages, compared with other populations in the region21. Another explanation is the Wahlund effect within the communities, that is, an excess of homozygotes due to population substructure22.

As for LD, after applying the Bonferroni correction, marker pairs located on the same chromosome showed only minor departures from independence. Therefore, the studied indels at different loci can be treated as independent when calculating matching probabilities. We compared the Bahraini population data with those of other populations using the available data for the accessible loci. Regarding interpopulation diversity, the phylogenetic tree constructed from the data of the 11 populations was consistent with other population data from the region, based on the Fst values obtained.

Fst values were obtained for the different populations to measure population differentiation due to genetic structure. The Bahraini population shows results comparable to those of its neighboring countries (Kuwait and UAE) based on the 30 indel markers, which indicates greater gene flow among these populations than with more distant populations, resulting in similar patterns of allele frequency distribution. Once more studies of Arab populations in the region become available, it may be possible to develop a greater understanding of the genetic associations between the different populations of the Arabian Peninsula.

This study expands the population database relevant to the application of genetic markers in forensic studies and can complement STR population genetic studies in many challenging forensic cases. To conclude, this is the first study to report the allele frequencies and forensic statistical parameters of the Bahraini population for the 30 insertion and deletion polymorphisms included in the Investigator DIPplex Kit. Interpopulation comparisons showed that differences were high among populations worldwide, suggesting that the DIPplex Kit may perform well in intercontinental forensic population analysis. The 30 indel markers, which combine a straightforward genotyping procedure, a low mutation rate and a high level of information, show great potential in forensic investigations, especially in cases where degraded or low-quality samples give partial or null profiles with conventional STR markers, or in paternity cases where an additional set of markers is needed to increase the power of the evidence.