Background

The Kingdom of Saudi Arabia (KSA) is the largest Arab country, occupying about 80% of the Arabian Peninsula. Before the foundation of modern Saudi Arabia, it consisted of four distinct regions: Hejaz, Najd, Al-Ahsa, and Asir (Al-Rasheed 2013). Tribes in the KSA are descendants of the peninsula’s original ethnic stock; therefore, a certain degree of ethnic heterogeneity is evident among both the sedentary and the nomadic populations of modern KSA.

Genetic variation in the KSA was contributed early on by nomadic or Bedouin tribes and clans (Gordon 2005) and by small groups of Persians, Turks, black Africans, and other ethnicities originating from sub-Saharan Africa living along the Red Sea coast (Bowen 2014). In addition, the annual pilgrimage (Hajj) to Mecca has long brought to the KSA hundreds of thousands of migrants, representing various ethnic groups from Arab (Jordan, Iraq, Yemen), Asian, and Far Eastern countries, who overstayed and settled in and around Makkah, Jeddah, and Medina (Fig. 1) (Ochsenwald and Philby 2016). Nevertheless, the majority of the native Saudi population subgroups in the northern, central (excluding Riyadh), western (excluding Jeddah and Makkah), southern, and eastern regions remained genetically distinct because of their adherence to consanguineous marriage (El-Hazmi et al. 1995).

Fig. 1 Map of Saudi Arabia showing neighboring Arab populations

Y chromosome polymorphisms have been studied widely in the context of human migrations, forensic applications, and paternity analysis (Jobling and Tyler-Smith 2000; Quintana-Murci et al. 2001). Y-STR markers are inherited without recombination down the paternal line, with little mutation and gene conversion (Rozen et al. 2003; Trombetta et al. 2010). These markers not only provide information on male lineage relationships (Lowery et al. 2013) but also help in studying local population structure and demographic history (Roewer et al. 2005). Y-STR typing has become an important tool in forensic investigations because of its discrimination power and the marked genetic variation that produces highly informative Y chromosome STR haplotypes. Because non-recombining Y chromosomal markers are more sensitive to founder effects and genetic drift, Y-STRs are very powerful in detecting genetic differences between populations (Heraclides et al. 2017; Iacovacci et al. 2017; Li et al. 2016).

Studies of Y chromosome genetic lineages and population genetic structure in Saudi Arabia are limited (Abu-Amero et al. 2009; Alshamali et al. 2009; Khurbani et al. 2018, 2019). In this paper, we present an analysis of Y chromosome haplotypes in 125 native Saudi males from different geographic regions of Saudi Arabia, typed using the AmpFℓSTR® YFiler® Amplification kit (Life Technologies, USA). We also compared our Y chromosome STR haplotypes with previously published Y chromosome haplotype data from Saudi Arabia and seven neighboring Arab populations (Fig. 1). It is hoped that the findings of this study will add to the existing knowledge about the population genetics and distribution of Y-STR haplotypes in Saudi Arabia.

Materials and methodology

Sample collection

Approval of the Institutional Ethical Committee to conduct this study was obtained well in advance. Buccal swabs were collected, with fully informed consent in accordance with the Declaration of Helsinki, from 125 parentally unrelated, healthy, native (up to three generations) Saudi males from all the regions of Saudi Arabia (Fig. 2), including Riyadh and Al Qassim in the central; Tabuk, Al Jawf, Al-Hudud Al Shimaliyah, and Hail in the northern; Madinah and Makkah in the western; Asir, Jizan, and Najran in the southern; and Dammam, Al-Khobar, and Jubail in the eastern provinces. Their three-generation ethnicity was established from their national identification (ID) cards. Information regarding their birthplaces was provided by the donors. All buccal swab donors were adults and came from different walks of life, including teachers, businessmen, policemen, and university students. They were recruited from universities, schools, police stations, and shopping centers. None of the donors had undergone bone marrow transplant, radiotherapy, frequent blood transfusion, or chemotherapy in the recent past. Most of them were married, and none of the participants had any known Y chromosome abnormality.

Fig. 2 Map of Saudi Arabia showing 5 geographic regions with corresponding sample sizes collected from those regions

DNA extraction

Genomic DNA was extracted from buccal swabs using Chelex® 100 as described by Walsh et al. (1991) and quantified in the 7500 Real-Time PCR System using Quantifiler® Duo DNA Quantification Kit (Applied Biosystems, USA) to regulate the input quantity of DNA for PCR amplification of respective Y-STR loci.

PCR and capillary electrophoresis

Extracted DNA was amplified for 17 Y chromosomal STR loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a/b, DYS438, DYS439, DYS437, DYS448, DYS458, DYS456, DYS635, and Y-GATA-H4) by multiplex assay using the AmpFℓSTR® YFiler® Amplification kit (Life Technologies, USA) in a HID Veriti® 96-Well Thermal Cycler. Amplified Y-STR fragments were size separated by capillary electrophoresis (CE) using the 3130 Genetic Analyzer® (Life Technologies, USA) following the manufacturer’s protocol. GeneScan 500 LIZ was used as an internal size standard. The size of the amplified fragments was determined by GeneMapper ID-X Version 1.4 (Applied Biosystems, USA). Allele designation was based on comparison with the allelic ladder provided in the AmpFℓSTR® YFiler® Amplification kit (Life Technologies, USA). Amplified fragment analysis and Y-STR typing were carried out according to the quality assurance standards recommended by the Scientific Working Group on DNA Analysis Methods (SWGDAM, 2014).

Statistical analysis

An online tool, “STR Analysis for Forensics (STRAF)”, developed by Gouy and Zieger (2017), was used to calculate the Y-STR allele frequency, gene diversity (GD), haplotype diversity (HD), and discrimination capacity (DC). Using the online Y Chromosome Haplotype Reference Database (YHRD) tool (Willuweit and Roewer 2015; https://yhrd.org/), population pairwise genetic distances (RST) and associated probability values (significance threshold p < 0.05) were calculated using the AMOVA (analysis of molecular variance) tool (Roewer et al. 1996) and visualized in a multidimensional scaling (MDS) plot for the following neighboring Arab populations (sample sizes follow each name): Jordan (Qahtanit), 114; Jordan (Adnanit), 50; Iraq, 124; Kuwait, 285; UAE, 191; Bahrain, 156; Yemen, 128; and Egypt (Qena), 52 (Table 1). The DYS389I allele value was subtracted from that of DYS389II, as recommended by YHRD, before AMOVA and MDS were computed.
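The summary statistics named above can be illustrated with a minimal Python sketch. It assumes the commonly used Nei estimator for diversity, n/(n − 1) × (1 − Σp²), and takes DC as the number of distinct haplotypes divided by the sample size; it is not STRAF’s or YHRD’s actual implementation. The function names and the four toy haplotypes are hypothetical and are not data from this study.

```python
from collections import Counter

def allele_frequencies(alleles):
    """Relative frequency of each allele observed at one locus."""
    n = len(alleles)
    return {allele: count / n for allele, count in Counter(alleles).items()}

def nei_diversity(items):
    """Nei's unbiased diversity: n/(n-1) * (1 - sum of squared frequencies).
    Gives gene diversity (GD) for alleles, haplotype diversity (HD) for haplotypes."""
    n = len(items)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_sq = sum((count / n) ** 2 for count in Counter(items).values())
    return n / (n - 1) * (1 - sum_sq)

def discrimination_capacity(haplotypes):
    """Assumed definition: distinct haplotypes divided by sampled individuals."""
    return len(set(haplotypes)) / len(haplotypes)

def adjust_dys389(haplotype):
    """Subtract DYS389I from DYS389II before distance-based analyses."""
    adjusted = dict(haplotype)
    adjusted["DYS389II"] = haplotype["DYS389II"] - haplotype["DYS389I"]
    return adjusted

# Hypothetical toy sample (three loci only), NOT data from the study.
sample = [
    {"DYS19": 14, "DYS389I": 13, "DYS389II": 30},
    {"DYS19": 14, "DYS389I": 13, "DYS389II": 30},
    {"DYS19": 15, "DYS389I": 12, "DYS389II": 29},
    {"DYS19": 14, "DYS389I": 14, "DYS389II": 31},
]
loci = ["DYS19", "DYS389I", "DYS389II"]
haplotypes = [tuple(h[locus] for locus in loci) for h in sample]

hd = nei_diversity(haplotypes)
dc = discrimination_capacity(haplotypes)
gd_dys19 = nei_diversity([h["DYS19"] for h in sample])
freq_dys19 = allele_frequencies([h["DYS19"] for h in sample])

print(f"HD = {hd:.3f}, DC = {dc:.2%}, GD(DYS19) = {gd_dys19:.3f}")
print("DYS19 allele frequencies:", freq_dys19)
print("Adjusted first haplotype:", adjust_dys389(sample[0]))
```

The same diversity function serves both GD and HD because the two differ only in what is counted: alleles at a single locus or complete multi-locus haplotypes.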

Table 1 Neighboring Arab populations’ reported Y-STR haplotypes and corresponding YHRD accession numbers

Results and discussion

Y-STR profiling is considered a vital tool for the forensic investigation of cases such as sexual assault (Maiquilla et al. 2011), missing persons (Coble et al. 2009), and kinship (Barra et al. 2015). Other applications include population genetics, anthropology, and epidemiological studies investigating the risk of prostate cancer (Paracchini et al. 2003; Hameed et al. 2015). Because of Saudi Arabia’s crucial geographical location in the Arabian Peninsula and in the Gulf of Oman, several authors have studied Y chromosome diversity in the native Saudi population using Y-STR technology (Cadenas et al. 2008; Abu-Amero et al. 2009; Alshamali et al. 2009; Khurbani et al. 2018, 2019). The current report presents population data for 17 Y-STR loci among 125 adult, native Saudi male volunteers recruited from different geographic regions of Saudi Arabia (Fig. 2).

The quality of the study sample greatly affects the outcome of population genetics studies. For example, Shringarpure and Xing (2014) reported that the accuracy of population stratification and the recovery of individual ancestry are greatly affected by sampling bias in the data collection process. Other studies have shown that sample selection bias can affect population structure analysis of genotype data, the genetic ancestry of individuals, and the evolutionary history of a population (Rosenberg et al. 2002; Patterson et al. 2006). Most of the studies carried out in the Saudi population (Cadenas et al. 2008; Abu-Amero et al. 2009; Alshamali et al. 2009; Khurbani et al. 2018, 2019) are based on samples collected from Saudi blood banks, hospitals, or forensic casework, or from native Saudis living abroad. The ethnicity of such donors is mostly self-declared and not subjected to any further verification; the samples therefore lack reliable ethnic or demographic authenticity, which may affect, to some extent, the estimated population genetic parameters.

The present study is the first from Saudi Arabia in which samples were collected through a well-designed questionnaire administered by a trained field worker, ensuring the acquisition of accurate ethnic data up to three generations to confirm the actual geographic descent. Moreover, the geographic location of each participant was recorded not on the basis of his current place of residence (as done in most of the previous studies) but on the basis of the birthplace of the volunteer’s great-grandfather. Therefore, slight differences in certain population genetic parameters are expected in the present study.

The distribution of Y-STR haplotypes in a sample of 125 native, unrelated Saudi individuals was analyzed, and 102 different Y-STR alleles and 106 Y-haplotypes were observed. Ninety-one (85.8%) of the 106 haplotypes were unique, while the remaining 15 (14.2%) were shared; 12/125 (9.6%) haplotypes were shared by two individuals each and 3/125 (2.4%) haplotypes were shared by more than two individuals. The most frequent haplotype was H23 (14,10,30,23,13,11,12,13/18,10,11,14,20,14,19,21,11), which was shared by four (3.2%) individuals (Table 2). Although the Arabian Peninsula is a region where numerous migrations between Africa and Asia have taken place since ancient times, our results showed only a moderate degree of haplotype diversity in the Saudi Arabian population, most probably because of the practice of consanguinity and the moderate sample size.
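The haplotype counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (125 individuals, 106 haplotypes, 91 unique, H23 carried by four men); the variable names are illustrative.

```python
# Counts as reported in the text.
n_individuals = 125
n_haplotypes = 106
n_unique = 91          # haplotypes carried by exactly one individual
h23_carriers = 4       # carriers of the most frequent haplotype, H23

# Haplotypes seen in more than one individual.
n_shared = n_haplotypes - n_unique

# Shares of the 106 distinct haplotypes, rounded to one decimal place.
pct_unique = round(100 * n_unique / n_haplotypes, 1)
pct_shared = round(100 * n_shared / n_haplotypes, 1)

# Each unique haplotype accounts for one man, so the rest carry shared ones.
individuals_in_shared = n_individuals - n_unique
mean_carriers_per_shared = individuals_in_shared / n_shared

pct_h23 = round(100 * h23_carriers / n_individuals, 1)

print(f"shared haplotypes: {n_shared}")
print(f"unique: {pct_unique}%  shared: {pct_shared}%")
print(f"individuals carrying a shared haplotype: {individuals_in_shared}")
print(f"mean carriers per shared haplotype: {mean_carriers_per_shared:.2f}")
print(f"H23 frequency: {pct_h23}%")
```

Because 34 individuals are spread over 15 shared haplotypes, the mean number of carriers per shared haplotype exceeds two, so at least some shared haplotypes must occur in three or more men.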

Table 2 Y chromosome haplotype analysis of 17 Y-STR markers in the male Saudi population (n = 125)

Table 3 shows the distribution of Y-STR alleles, their corresponding allele frequencies, gene diversity (GD), haplotype diversity (HD), and FST or genetic distance. The maximum number of Y-STR alleles (n = 11) was seen at the locus DYS385b, followed by DYS385a (n = 10) and DYS635 (n = 8), indicating their high degree of polymorphism. The least polymorphic Y-STR loci were DYS389I, DYS391, DYS437, and Y-GATA-H4, each with 4 alleles. The maximum HD (0.817) was observed at the locus DYS458, followed by DYS385b (0.787) and DYS392 (0.684). The locus DYS437 showed the lowest HD (0.155). The discrimination capacity (DC) calculated for the 17 Y-STR loci in the Saudi male population was 85.85%. In a recent report, Khurbani et al. (2018) studied a sample of 597 Saudi individuals from 5 geographic regions of Saudi Arabia using the 27-locus Yfiler® Plus kit and reported a DC of 95.3%. However, when they studied the same sample using the 17-locus Yfiler® kit, the DC declined to 74.7%, which is considerably lower than the value reported in the present study (85.85%) using the same kit. This may be due to the ethnic authenticity of our sample compared with that of Khurbani et al. (2018), in which 15% of the Saudi volunteers were recruited from the UK.

Table 3 Frequency distribution of 17 Y-STR haplotypes among the native male Saudi population (n = 125)

In the present study, the Y-STR locus DYS385b showed the highest gene diversity (GD) (0.807), followed by DYS458 (0.800) and DYS385a (0.686). The locus with the lowest GD was DYS437 (0.222), followed by DYS392 (0.299) and DYS389I (0.355) (Table 3). The diversity of the Y chromosome is affected by factors such as the effective male population size, genetic drift, male behavior, marriage systems, and male patterns of migration (Jobling and Tyler-Smith 2003). Its range of polymorphism and associated mutational properties make the Y chromosome the best candidate to answer many forensic, anthropological, population genetic, and evolutionary questions (de Knijff 2000; Jobling and Tyler-Smith 2003). Previous studies suggest that Saudi Arabia has a strategic position between Asian and African populations (Luis et al. 2004). The genetic structure of Saudi Arabia has been modulated by gene flow from its Asian and African surroundings (Abu-Amero et al. 2009).

A total of four null alleles appeared in our study: one each in the haplotypes H30, H31, and H64 at the locus DYS458 and one in H75 at the locus DYS456 (Table 2). A previous study has shown that the Y-STR locus DYS458 has the highest mutation rate (0.00814), followed by the locus DYS456 (0.00735) (Chandler 2006). In addition, in a worldwide collaborative study, 137 null alleles were identified at 17 of the 23 Y-STR loci. The occurrence of null alleles has been associated with the mutation rate of the locus in question.

DYS385a/b was also observed to be the most informative marker, with 21 complete alleles. In addition, it showed a microvariant allele, 17.1, at DYS385a, indicating a one-base-pair deletion within or far from the repeat regions (Butler 2011). Such partial repeat variants, occurring at a low frequency, may be useful in understanding Y chromosome diversity and recent migrations.

The haplotypes observed in our studied regions of Saudi Arabia were compared with the published haplotype data of seven neighboring Arab populations using the YHRD database. In the present study, the Egyptian (Qena) and Iraqi populations were the closest to the Saudi population and were genetically equidistant from it (RST 0.0018). Yemenites were slightly more distant (RST 0.0022) from Saudi Arabia, as reported by Abu-Amero et al. (2009) and Alshamali et al. (2009), but still closer than the Abu Dhabi (RST 0.0028) and Kuwaiti (RST 0.0028) populations, which is consistent with Triki-Fendri et al. (2010). Bahrain, although geographically the nearest country to the Kingdom of Saudi Arabia, was genetically the most distant (RST 0.0155) from the Saudi population (Table 4, Fig. 3). The Arab Qahtanits in Jordan were also highly distant from Saudi Arabia (RST 0.0146), followed by the Adnanit Jordanians (RST 0.0106). Al-Zahery et al. (2011) described a common ancestral origin of the Marsh Iraqi Arabs from the South Arabian Peninsula. Moreover, 70% of Jordan’s population is of Palestinian origin (González et al. 2008; Flores et al. 2005), which may explain its high genetic distance.
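To make the ordering of these distances easier to follow, the RST values quoted in the paragraph above can be ranked with a short Python sketch. Only the values stated in the text are used; the dictionary keys are illustrative labels, and nothing here recomputes RST from haplotype data.

```python
# Pairwise RST distances from the Saudi sample, as quoted in the text.
reported_rst = {
    "Egypt (Qena)": 0.0018,
    "Iraq": 0.0018,
    "Yemen": 0.0022,
    "UAE (Abu Dhabi)": 0.0028,
    "Kuwait": 0.0028,
    "Jordan (Adnanit)": 0.0106,
    "Jordan (Qahtanit)": 0.0146,
    "Bahrain": 0.0155,
}

# Sort from genetically closest to most distant; ties keep insertion order.
ranked = sorted(reported_rst.items(), key=lambda pair: pair[1])

for population, rst in ranked:
    print(f"{population:<18} RST = {rst:.4f}")

# Populations tied for the smallest distance.
smallest = ranked[0][1]
closest = [population for population, rst in ranked if rst == smallest]
print("Closest to the Saudi sample:", ", ".join(closest))
```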

Table 4 Matrix of the pairwise FST genetic distances between native Saudi population and eight neighboring Arab populations (below diagonal) obtained for 10,100 permutations (s.e. ≤ 0.0038)

Fig. 3 Multidimensional scaling (MDS) based on pairwise RST genetic distances of native Saudi populations and seven other neighboring Arab populations (Egypt, Iraq, Jordan, UAE, Yemen, Kuwait, and Bahrain)

Conclusion

By providing population data on the genetic variation at 17 Y-STR loci in a sample of the native Saudi male population (n = 125), we have attempted to develop an understanding of the genetic relationship between Saudi Arabia and the neighboring Arab populations. Our results show that the Saudi population is genetically closer to the Iraqi, Qena (Egypt), and Yemen (Sana) populations than to the Kuwaiti, Abu Dhabi (UAE), Bahraini, and Jordanian populations. According to our findings, the Saudi population lacks patrilineal homogeneity across the entire region, being homogeneous in some places and partly heterogeneous in others (data not presented here). This may be due to the highly conserved social culture, the practice of consanguineous marriage in certain regions, and religious or historical migration to Makkah, Medina, and Jeddah. Unfortunately, because of the limited sample size from the different geographic regions of Saudi Arabia, independent forensic and population statistics could not be calculated. Further studies are therefore needed to establish precise patrilineal inheritance in the Saudi population and to explore its relationship with neighboring Arab countries.