Introduction

Asian cultivated rice (Oryza sativa L.), with its indica and japonica subspecies, is among the most agronomically and nutritionally important crops worldwide. The indica genotypes are tropical rice cultivars that are grown in lowland conditions. In contrast, the japonica genotypes can be either tropical cultivars adapted to rainfed upland conditions or temperate cultivars adapted to lowland conditions. Based on literary evidence [1], it is believed that rice was introduced into the Kurdistan Region of Iraq around the 12th century BC.

Rice is the leading food consumed in Iraq and Kurdistan. In the south and middle of Iraq, amber (long-grain) rice is cultivated, and in northern Iraq (the Kurdistan Region), both long- and short-grain rice are cultivated [2]. Approximately 70% of rice is imported, and only 30% is produced locally. Improving the quality and quantity of rice production is an important goal for farmers in the Kurdistan region (Erbil, Sulaymaniyah and Duhok). In Kurdistan, there are no published data on genetic diversity in rice. The only available information is the local names given by farmers based on variation in paddy and grain traits.

Rice consumption in Iraq was estimated at 45.7 kg of milled rice per person in 2017, according to Helgilibrary [3]. In the middle and south of Iraq, planting starts in June–July and harvesting takes place in September–October [4]. In the Kurdistan Region, the planting season begins in April or May, and the plants are harvested in October [5].

According to Jeong et al. [6], average rice imports have doubled worldwide, and Iraq is among the top ten importers of milled rice. The total global rice trade is expected to grow by 1.37% annually over the next 10 years because of high demand from importing countries [6]. There is a lack of data on rice production in the Kurdistan region. According to the assessment of Ewaid et al. [7], the average rice production from 2007 to 2016 was 271,173 tons across all provinces of Iraq, excluding the Kurdistan region.

Researchers worldwide have conducted numerous successful studies characterizing the genetic diversity of rice using different molecular markers. Random amplified polymorphic DNA (RAPD) markers were used first [8, 9] and were later combined with simple sequence repeat (SSR) markers [10,11,12]. Amplified fragment length polymorphism (AFLP) was also used to analyze the genetic variability of rice [13, 14], including in conjunction with SSR [15]. Additionally, single-nucleotide polymorphism (SNP) markers [16,17,18] were used along with SSR [19, 20]. Among these DNA markers, SSR has been considered the best marker for studying the genetic diversity of rice over the last two decades [21,22,23].

There are no published data on the genetic diversity of rice or on rice breeding programs in the region. The only activity is the selection by farmers of plants that perform well under local climate conditions. This study will be the first step toward building a breeding program and a diversity analysis of the rice accessions grown in the region.

Materials and methods

Plant materials

During 2019, samples of approximately 100 rice accessions were collected from farmers' fields and from different research institutes in Iraq and the Kurdistan region. The samples were classified based on morphological characteristics, such as seed color, grain size, presence or absence of awns, life cycle, and geographic location (Table S1). A collection of 62 rice accessions (50 from farmers, 9 from the Directory of Agricultural Research in Sulaymaniyah, and 3 from the Al-Mishkhab Research Center) was selected and planted in 2020 for molecular investigation (Fig. 1). Seeds were soaked for 3 days in the lab and, on May 20, transferred to the field at the Preamagrun-Gaba village location (Latitude 35° 43′ 14.35′′ N, Longitude 45° 04′ 32.02′′); the plants were harvested in late October. Cultural practices, including irrigation, weed control and fertilization, were carried out during the season. Fresh leaves from each accession were collected from 25-day-old seedlings.

Fig. 1 Iraq map showing provinces and locations of rice sampling points

DNA Isolation

Fresh, clean leaves from each accession were ground in liquid nitrogen with a pestle and mortar. A total of 150 mg of each ground sample was used for genomic DNA extraction using the Quick-DNA™ Plant/Seed Miniprep Kit, Catalogue No. D6020 (Zymo Research, Irvine, CA, USA). The quantity and quality of the DNA were determined with a NanoDrop ND-2000/2000c spectrophotometer (ThermoScientific, USA) and checked on a 1% agarose gel in 1X TBE buffer. The gel was viewed using a Labnet gel documentation system (LabNet International Inc., Edison, NJ, USA).

PCR amplification of SSR primers

A total of 37 polymorphic SSR markers were selected from the Gramene database [24], based on their polymorphism information content (PIC) values in previous research (Table S2) and on primer screening, and were used to genotype all 62 rice accessions. The total PCR volume was optimized to 20 µl and included 2 µl of approximately 15 ng DNA template, 6.0 µl of AccuPower® PCR PreMix master mix (Bioneer, Korea), 1.5 µl of each primer (forward and reverse), and 9.0 µl of double-distilled water. Amplification was carried out in a thermocycler (Applied Biosystems) under the following conditions: 5 min at 95 °C; 35 cycles of 50 s at 95 °C, 50 s at the annealing temperature (Table S2), and 50 s at 72 °C; followed by 7 min at 72 °C. The amplified products were visualized on ethidium bromide-stained 3% MetaPhor agarose gels in 1X TBE alongside a 100 bp DNA ladder (Add Bio). The gels were viewed using a Labnet gel documentation system (LabNet International Inc., Edison, NJ, USA), and the band sizes for each SSR primer were estimated from the gel images by comparing the sample bands with the DNA ladder. The SSR data were analyzed using POPGENE v1.32 [25] to determine the allele frequency, observed number of alleles (Na), effective number of alleles (Ne) and gene diversity per locus.
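The per-locus statistics named above can be illustrated with a minimal Python sketch. It assumes the commonly used definitions: Ne as the reciprocal of the sum of squared allele frequencies, gene diversity as Nei's expected heterozygosity without sample-size correction, and PIC following Botstein et al. The estimators implemented in POPGENE may differ slightly. The function name and the example allele calls are hypothetical and are not data from this study.

```python
from collections import Counter
from itertools import combinations


def locus_stats(alleles):
    """Return Na, Ne, gene diversity and PIC for one SSR locus.

    `alleles` is a flat list of allele calls (for example band sizes in bp),
    one entry per allele copy scored at the locus.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]

    homozygosity = sum(p * p for p in freqs)   # sum of p_i^2
    na = len(freqs)                            # observed number of alleles
    ne = 1.0 / homozygosity                    # effective number of alleles
    gene_diversity = 1.0 - homozygosity        # expected heterozygosity (uncorrected)
    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    pic = gene_diversity - sum(
        2 * (p * p) * (q * q) for p, q in combinations(freqs, 2)
    )
    return {"Na": na, "Ne": ne, "gene_diversity": gene_diversity, "PIC": pic}


# Hypothetical locus (assumption, not study data): 62 homozygous accessions,
# 59 carrying a 150 bp band and 3 carrying a 160 bp band, two copies each.
example_calls = [150] * (59 * 2) + [160] * (3 * 2)
print(locus_stats(example_calls))
```

A locus dominated by one allele gives Ne close to 1 and a low PIC, whereas several alleles at similar frequencies push both values up, which is the pattern the PIC-based marker selection relies on.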

Structure analysis

To detect subpopulations among the genotypes, STRUCTURE 2.3.4 software [26] was used. The run parameters were set to a burn-in period of 100,000 iterations followed by 100,000 Markov chain Monte Carlo (MCMC) replications. The K value was set from 2 to 10, and 10 replicate runs were performed for each value of K. Structure Harvester [27] was used to select the best value of K (number of subpopulations).
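Structure Harvester is commonly used to apply the Evanno ΔK criterion to replicate STRUCTURE runs. The sketch below assumes that criterion was the basis for choosing K here, and computes ΔK from the mean log-likelihood L(K) across replicates divided by the standard deviation of L(K). The function name and all log-likelihood values are hypothetical, chosen only to show the calculation.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Compute delta K from replicate STRUCTURE log-likelihoods.

    `lnp` maps each K to a list of L(K) = ln P(D|K) values, one per run.
    Delta K is defined only where both K-1 and K+1 were also tested.
    """
    mean_l = {k: mean(values) for k, values in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp:
            # Absolute second-order rate of change of the mean likelihood
            second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
            sd = stdev(lnp[k])
            if sd > 0:  # skip K values with no spread between replicates
                delta[k] = second_diff / sd
    return delta


# Hypothetical replicate log-likelihoods (assumption, not study output)
runs = {
    2: [-5200.0, -5210.0, -5190.0],
    3: [-4300.0, -4302.0, -4298.0],
    4: [-4250.0, -4270.0, -4230.0],
    5: [-4220.0, -4260.0, -4180.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because ΔK needs both neighbours of each K, testing K from 2 to 10 yields ΔK values only for K = 3 to 9.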

Genetic diversity

A dendrogram of the rice genotypes was generated using the unweighted pair group method with arithmetic mean (UPGMA) in PowerMarker v3.25 [28] and then visualized using MEGA X [29]. GenAlEx v6.5 [30] was used to perform principal coordinates analysis (PCoA) and analysis of molecular variance (AMOVA).
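For readers unfamiliar with UPGMA, the following is a minimal pure-Python sketch of the clustering step, not the PowerMarker implementation. It repeatedly merges the two closest clusters and recomputes distances as size-weighted averages. The function name, accession labels and distance values are hypothetical.

```python
def upgma(names, dist):
    """Cluster items with UPGMA.

    `names` is a list of labels and `dist` a symmetric matrix (list of
    lists) of pairwise distances in the same order. Returns a nested-tuple
    tree and the list of node heights in merge order.
    """
    n = len(names)
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    heights = []

    while len(clusters) > 1:
        # Find the closest pair of active clusters
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        heights.append(d_ab / 2.0)  # node sits at half the merge distance

        # Size-weighted average distance from the new cluster to the rest
        new_d = {}
        for c in clusters:
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)

        # Remove distances involving the merged clusters, add the new ones
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1

    tree, _ = next(iter(clusters.values()))
    return tree, heights


# Hypothetical distances among four accessions (assumption, not study data)
labels = ["A1", "A2", "A3", "A4"]
matrix = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.8, 0.9],
    [0.8, 0.8, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(upgma(labels, matrix))
```

In this toy matrix the two close pairs merge first and are joined last, the same kind of topology that separates well-differentiated groups in a dendrogram.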

Results

Allele frequency

Sixty-two rice accessions were investigated using 37 polymorphic SSR markers. All genotypes were collected from Iraq: 50 from farmers in the Kurdistan region (Erbil, Sulaymaniyah and Duhok), nine from the Directory of Agricultural Research in Sulaymaniyah, and three from the Al-Mishkhab Rice Research Station near Najaf. The 37 polymorphic SSR markers, whose sequence details are available in Table S2, were selected for genotyping this collection after primer screening. The allele frequency results (Table 1) show a total Na of 152, with an average of 4.1 alleles per locus. The maximum Na of 7 alleles was observed for primers RM20, RM257, and RM294. The minimum of 2 alleles was observed for RM23, RM171, RM172, RM321, and RM7376. The sum of Ne was 75.166, with an average of 2.03 alleles. The highest values of Ne were 4.544 and 3.827 alleles for RM204 and RM250, respectively, and the lowest value was 1.105 for RM171. Gene diversity ranged from 0.095 (RM171) to 0.780 (RM204), with an average of 0.442. The average PIC of the SSR markers was 0.404. The highest PIC values of 0.748, 0.694, 0.665 and 0.659 were recorded for loci RM204, RM250, RM507 and RM294, respectively, while the lowest PIC value was 0.090 for locus RM171.

Table 1 SSR markers, chromosome locations and allele frequency across 62 accessions

Structure analysis

The population genetic structure of the 62 rice genotypes was obtained from the STRUCTURE program using the SSR genotypic data. The study presents crucial evidence for defining the population structure of rice genotypes from the Kurdistan region. The best K (number of subpopulations) estimated by Structure Harvester was 3 (K = 3), as presented in Fig. 2. Consequently, the K = 3 subpopulation results from STRUCTURE were chosen and are illustrated in Fig. 2. The blue cluster (C1) comprises 45 short-grain rice genotypes, the red cluster (C2) comprises eight short-grain rice genotypes, and the green cluster (C3) comprises nine long-grain rice genotypes. All genotypes assigned to these subpopulations were considered pure because their membership scores exceeded 0.80, as shown in Fig. 2.

Fig. 2 a The number of estimated subgroups (K), ranging from 2 to 10, by Structure Harvester, and b population structure of 62 rice genotypes for K = 3 revealed by 37 SSR markers and the STRUCTURE program

Genetic diversity

A distance (dissimilarity) matrix was obtained from PowerMarker [28], and the tree was generated with the UPGMA clustering method and visualized using MEGA X [29]. As shown in Fig. 3, the circular UPGMA tree has three main clusters, blue (45), red (8), and green (9), which match the STRUCTURE results. The present study thus resolves rice genotypes from Iraq into three clusters (C1, C2, and C3) using 37 SSR markers, and the phylogenetic tree confirms the STRUCTURE result with the same three clear clusters. However, C1 and C2 fell within the same major branch (Fig. 3), whereas C3 separated independently. In addition, all accessions in clusters C1 and C2 are short-grained rice, but the C2 grains are colored, while the C3 accessions are long-grained. Furthermore, from C1 and C2, only one genotype (R49) is registered in the International Rice Genebank Collection, under the name Bazian (IRGC 9506). The Bazian accession is the short-grain rice preferred by local consumers. In addition, R42 and R54, which are locally known as Tahalf or Alliance rice, were introduced to the region by the Food and Agriculture Organization (FAO) in the late 1990s. From C3, the genotypes R43 and R44 are registered under the name Ambar (IRGC 9505). All three varieties from the Al-Mishkhab research center clustered within the C3 indica subpopulation (R43-Ambar-furat, R44-Ambar-muaazra, and R45-Yasamin).

The genotypic data were rearranged based on the results obtained from STRUCTURE and PowerMarker. Then, PCoA and AMOVA were performed using GenAlEx. The PCoA showed clear separation between the rice subpopulations (Fig. 3), which were distinctly distributed on the first two coordinates (1 vs. 2). Subpopulation C1 fell in quadrant 1, subpopulation C2 in quadrant 4, and subpopulation C3 in quadrant 2. Additionally, subpopulations C1 and C2 were more closely related to each other than either was to C3, which agrees with the phylogenetic tree. The first 3 axes explained 77.31% of the cumulative variation (Table 2).

Fig. 3 a Circular dendrogram estimated through UPGMA, and b principal coordinates analysis of 62 genotypes of rice based on 37 SSR markers

Table 2 Percentage of variation explained by the first 3 axes

Table 3 shows the results of AMOVA based on 37 SSR markers (input as an allelic distance matrix for F-statistics analysis). For the three subpopulations, 71% of the molecular variance was due to variation among populations and 29% to variation within individuals (Table 3). The estimated variance among individuals (within single populations) was zero (0). The fixation index (FST) of 0.726 was significant (P value = 0.001), and the gene flow (Nm) value was 0.095. When the source of variance among individuals within populations was suppressed, the result changed only slightly, to 72% (among populations) and 28% (within populations). Finally, the AMOVA results based on PhiPT (Table 4) revealed a larger change in the percentage of molecular variance, to 85% (among populations) and 15% (within populations).

According to the pairwise population FST results (Table 5, below the diagonal), FST values of 0.685, 0.745, and 0.751 were recorded between subpopulations C1 and C2, C1 and C3, and C2 and C3, respectively. The Nm results (Table 5, above the diagonal) showed that the highest Nm (0.115) occurred between C1 and C2 and the lowest (0.083) between C2 and C3.
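The link between the pairwise FST and Nm values can be sketched as follows, assuming Nm was derived from FST with the usual island-model approximation Nm = ((1 / FST) − 1) / 4. The paper does not state the formula, so this is an assumption, and the function name is hypothetical. The FST values are those reported above.

```python
def gene_flow_nm(fst):
    """Estimate gene flow as Nm = ((1 / FST) - 1) / 4 (island-model approximation)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


# Pairwise FST values as reported in the text (Table 5)
pairwise_fst = {
    ("C1", "C2"): 0.685,
    ("C1", "C3"): 0.745,
    ("C2", "C3"): 0.751,
}
for pair, fst in pairwise_fst.items():
    print(pair, round(gene_flow_nm(fst), 3))
```

Under this assumption, the C1 and C2 pair and the C2 and C3 pair round to 0.115 and 0.083, consistent with the reported highest and lowest Nm values. Any FST above 0.2 gives Nm below 1 with this formula.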

Table 3 Analyses of molecular variance for the studied rice accessions
Table 4 Analyses of molecular variance (PhiPT) for the studied rice accessions
Table 5 The pairwise FST (below diagonal), Nm (above diagonal) among rice subpopulations

Discussion

Evaluation of genetic diversity is an important factor for rice germplasm collection and breeding programs. In this study, 37 SSR markers were selected for genotyping 62 accessions of rice. The allele frequency values in the present study are similar to those found by [31, 32, 34], who reported 2 to 7 alleles per locus in their genetic studies of rice from South Asia and India. The average gene diversity (0.442) in this study agrees with previous results [23, 32]. Most researchers investigating the genetic diversity of rice using SSR markers have used PIC as an indicator of the capability of markers to detect polymorphisms. Singh et al. [20] demonstrated that the PIC value depends on many factors, such as the diversity of the germplasm, population size, genotyping method and the location of the marker loci in the genome. Accordingly, different PIC values have been reported: 0.240 [34], 0.416 [32], 0.420 [19], 0.483 [23], 0.560 [40], 0.570 [41], 0.630 [31], 0.704 [42], and 0.738 [43].

SSR has been considered the best marker for studying the genetic diversity and characterization of rice germplasm over the last two decades [21,22,23]. The reliability of a population's genetic structure analysis depends on the number of molecular markers used in the investigation. Zhang et al. [23] reported that 72 SSR markers are sufficient for population structure analysis of 150 rice varieties. By simple proportion (72/150 × 62 ≈ 30), about 30 SSR markers would be sufficient for investigating the population structure of the 62 rice genotypes in the present study. However, we used 37 SSR markers to make the results more reliable.

The short-grain rice accessions grouped under C1 and C2, and the long-grain accessions grouped under C3, based on the circular dendrogram (Fig. 3). The results suggest that clusters C1 and C2 are japonica subpopulations and that cluster C3 is an indica subpopulation. These results are in agreement with those obtained by [21] in the eastern Himalayan region of northeast India, in Thailand [40], in Egypt [41], and by [32], who indicated two major (japonica and indica) subpopulations. In China, [43] identified three major groups: indica, temperate japonica, and tropical japonica.

In the present study, the allelic form of the SSR data was used, which is the most standard way to obtain AMOVA results with GenAlEx. A high level of genetic variation among populations (71%) was found among the studied rice accessions (Table 3). When the allelic form of SSR is used and there are differences among individuals within populations, there is no need to suppress this source of variance. However, the estimated variance among individuals (within single populations) was set to zero (0) (Table 3) because the estimate was slightly negative, meaning that no significantly large differences among individuals were found. In that case, the within-individual analysis could be suppressed (Table 3). In addition, some researchers use PhiPT, in which each diploid genotype is treated as a unit, because they want to know how different the genotypes are from one individual to the next rather than tracking individual alleles; this is reflected in the degrees of freedom (df) in the AMOVA table. In our case, when we used PhiPT to obtain AMOVA results, the percentages of molecular variance changed: the variation among populations increased to 85%, and the variation within populations decreased to 15% (Table 4). Overall, all AMOVA methods revealed high genetic differentiation among the populations and low differentiation within populations. Similarly, Ab Razak et al. [16] reported 69% variance among the two populations and 29% variance among individuals in Malaysian rice varieties based on 916 SNP markers. Others reported considerable variance among individuals: 66% [21], 74% [41], 86% [32].

Levels of genetic differentiation based on FST values were defined by Wright [44]: low differentiation (FST = 0.00–0.05), moderate differentiation (FST = 0.05–0.15) and a high level of differentiation (FST > 0.30) [45]. In this study, a high level of genetic differentiation (FST = 0.726, p < 0.001) was indicated between subpopulations C1, C2 and C3 (Table 3). The lowest pairwise FST value of 0.685 was recorded between subpopulations C1 and C2, and the highest of 0.751 was recorded between subpopulations C2 and C3. Similarly, Verma et al. [46], Gouda et al. [47] and Suvi et al. [48] reported very high genetic differentiation, with FST values of 0.827, 0.490 and 0.407 among subpopulations, respectively. The Nm value (0.095) indicates very low or limited gene flow between subpopulations (C1, C2 and C3) (Table 4). According to [49], an Nm value of less than one indicates limited gene exchange between populations.

The PCoA, AMOVA, FST and Nm results were the key sources for identifying the problem of very low gene flow among the rice accessions in the region. It is possible that most of the individuals within a population are very closely related and that most farmers do not exchange seeds. The government does not distribute seeds to farmers. Therefore, each year, farmers keep part of their seed for planting the following year.

Conclusions

In conclusion, the rice accessions in the region were divided into indica and japonica subpopulations based on a step-by-step analysis with the STRUCTURE, PowerMarker, MEGA X, and GenAlEx software. Additionally, SSR markers proved effective for clarifying the genetic background of rice genotypes in the region and showed that most rice accessions within each subpopulation are very closely related despite having different local names. The findings of the present study are a good starting point for a rice breeding program and for the domestication of new species of rice in the region.