Genetic relatedness among Ethiopian Oryza longistaminata populations and other AA genome Oryza species

Oryza longistaminata is the only AA-genome Oryza species that is perennial via rhizome production. This undomesticated rice species, which is native to Africa, is hypothesized to be a good candidate for expanding the cultivated rice gene pool. However, its phylogenetic relationships with other Oryza members are still unresolved, and it is underutilized as a genetic resource in the breeding of cultivated rice (Oryza sativa L.). This study therefore genotyped 361 O. longistaminata, 35 cultivated rice, 1 Japonica weedy-type, 25 AA genome and 8 CC genome wild rice accessions by using 67 SSR markers. Genotypic grouping confirmed the distinctness of O. longistaminata from other rice accessions and the sub-differentiation of this population influenced by eco-geographical conditions. The higher genetic diversity within the O. longistaminata population also implies its candidacy as a donor of diverse traits of interest.


Introduction
The genus Oryza is considered an economically important plant group in the grass family Poaceae (Khush 1997). Rice (Oryza sativa L.) feeds over half of the world's human population (Nachimuthu et al. 2015). During domestication, the majority of the alleles of the ancestral progenitors of cultivated rice were lost (Sun et al. 2001). According to Ali et al. (2010), rice productivity is affected by its narrow gene pool. Hence, diverse sources of genes must be explored (Joshi et al. 2000; Huang et al. 2019). Wild Oryza species can be important resources in rice improvement programmes (Brar and Khush 1997); however, it is often difficult to transfer agronomically important traits to cultivated rice (Ren et al. 2003). Thus, precise genetic manipulation of desirable traits requires the prior assessment of kinship relatedness (Nachimuthu et al. 2015).
Due to the relative ease of interspecific hybridization, species that share the AA genome of O. sativa are the most accessible genetic resources (Lu et al. 2000; Ren et al. 2003; Chen et al. 2017; Xu et al. 2020). The AA genome clade includes Oryza rufipogon, Oryza nivara, Oryza barthii, Oryza glumaepatula, Oryza meridionalis, Oryza glaberrima, O. longistaminata and O. sativa. The African wild rice species O. longistaminata Chev. et Roehr. is the only member of the AA gene pool that is perennial via rhizome production (Vaughan et al. 2005).
O. longistaminata is strongly rhizomatous (Ghesquiere 1985; Vaughan 1994). In addition to its high rate of self-incompatibility, morphological features such as high pollen production along with long and exserted stigmas presumably facilitate outcrossing (Ghesquiere 1987; Jones et al. 1996). This African wild rice is resistant to bacterial blight, drought, and the nematode Meloidogyne graminicola (Second et al. 1977; Taillebois 1983; Khush et al. 1990; Warda 1997; Jones et al. 1996). For this reason, O. longistaminata was suggested for further utilization and application in conservation schemes (Melaku et al. 2013; Wambugu et al. 2013).
Most DNA marker-based genetic relationship studies of the genus Oryza or taxa in the AA gene pool have supported the taxonomic classification within the genus Oryza based on morphological traits (Duan et al. 2007). For instance, its self-incompatibility, rhizomatous character, and unique ligular characteristics make the African O. longistaminata unique in its AA genome group (Ghesquiere 1985; Vaughan 1994). However, the different marker types used in evolutionary studies of the AA genome Oryza are still limited and provide ambiguous and controversial relationships (Ren et al. 2003; Ge et al. 2004). This is likely due to marker types, sampling methods and polytomy in the clade (Duan et al. 2007). Thus, an adequate number of accessions and SSR primer pairs must be utilized to achieve a reliable analysis of genetic structure and an improved resolution over related species.

Plant material
A total of 430 accessions, comprising 35 cultivated rice accessions from the Yunnan University Genebank (Table S1), 25 accessions of 5 AA genome species, 9 accessions from 5 species of the O. officinalis complex, one Japonica weedy-type accession from the International Rice Research Institute collections (Table S2) and 360 accessions from 12 Ethiopian populations of African wild rice (O. longistaminata) (Table S3), were included in this study. Based on their geographic location, mode of pollination and species type, the accessions were grouped and analysed as twenty-six independent groups.

Genomic DNA extraction and polymerase chain reaction (PCR)
Total genomic DNA was extracted from fresh leaves by using the CTAB protocol as described by Doyle and Doyle (1987). The quality of the extracted DNA was determined by electrophoresis in a 1% agarose gel, and quantification was carried out using a spectrophotometer. The extracted DNA samples were diluted to 20 ng/µl using Tris-EDTA (TE) buffer. A total of 67 SSR markers (Table S4) covering 12 rice chromosomes were selected for the analysis based on the reports from Panaud et al. (1996), Chen et al. (1997) and Temnykh et al. (2000).
PCR was performed in a 10 µl reaction mixture containing 4 µl of genomic DNA, 0.5 µl of each of two primers (at a concentration of 10 µM), 1.75 µl of a 10× Taq buffer (with MgCl2 at a final concentration of 2 mM), 0.5 µl of a 2.5 mM dNTP mixture, 0.1 µl of 2.5 U/µl Taq DNA polymerase and 3 µl of double-distilled water. The PCR amplification protocol consisted of 5 min of preheating at 94 °C followed by 34 cycles of denaturation at 94 °C for 45 s, annealing at 55 °C to 67 °C (depending on the primer pair) for 30 s, extension at 72 °C for 30 s, and a final extension at 72 °C for 7 min.
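The nominal final concentrations implied by these stock volumes follow from the dilution relation C1·V1 = C2·V2. The short sketch below works them out, assuming the stated nominal reaction volume of 10 µl and the 20 ng/µl template dilution; the function and variable names are illustrative only.

```python
# Nominal final concentrations for one reaction, using C1 * V1 = C2 * V2.
REACTION_VOLUME_UL = 10.0  # nominal reaction volume stated in the text (assumed exact)


def final_concentration(stock_conc, stock_volume_ul, total_volume_ul=REACTION_VOLUME_UL):
    """Return the diluted concentration, in the same unit as stock_conc."""
    return stock_conc * stock_volume_ul / total_volume_ul


primer_um = final_concentration(10.0, 0.5)   # each primer: 10 µM stock, 0.5 µl
dntp_mm = final_concentration(2.5, 0.5)      # dNTP mixture: 2.5 mM stock, 0.5 µl
template_ng = 20.0 * 4.0                     # 4 µl of DNA diluted to 20 ng/µl
taq_units = 2.5 * 0.1                        # 0.1 µl of 2.5 U/µl polymerase

print(f"primer: {primer_um} µM each, dNTPs: {dntp_mm} mM")
print(f"template: {template_ng} ng, Taq: {taq_units} U")
```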

Polyacrylamide gel electrophoresis and SSR allele scoring
The amplified PCR products were subjected to electrophoresis in an 8.0% polyacrylamide gel and detected through silver staining as described by Panaud et al. (1996), and their size was determined using a standard 50 bp DNA ladder (M1800, Solarbio). Based on the expected PCR product size indicated in the GRAMENE marker database (https://www.gramene.org), the sizes of the most intensely amplified bands were identified as the alleles of the SSR loci. Differently sized amplified bands were scored as genotypes. The bands were recorded as (11, 22, 33, ...) to represent homozygous genotypes or (12, 13, 23, ...) to indicate heterozygous genotypes, and '?' was used to denote missing data (Fig. S1).
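The scoring scheme can be illustrated with a small sketch. It assumes, purely for illustration, that the alleles of a locus are numbered in order of ascending fragment size; the fragment sizes, the `encode_genotype` helper and the numbering rule are hypothetical and are not taken from the scoring sheets of this study.

```python
def encode_genotype(band_sizes, allele_index):
    """Encode the scored bands of one sample at one SSR locus.

    band_sizes   : fragment sizes (bp) of the scored bands; empty means no call
    allele_index : dict mapping fragment size -> allele number (1, 2, 3, ...)
    """
    if not band_sizes:
        return "?"  # missing data
    alleles = sorted(allele_index[size] for size in set(band_sizes))
    if len(alleles) == 1:
        return f"{alleles[0]}{alleles[0]}"  # homozygous, e.g. "11"
    if len(alleles) == 2:
        return f"{alleles[0]}{alleles[1]}"  # heterozygous, e.g. "13"
    raise ValueError("more than two bands scored for a diploid locus")


# Hypothetical locus with three allele sizes, numbered by ascending size.
sizes = [150, 156, 162]
allele_index = {size: number for number, size in enumerate(sizes, start=1)}

print(encode_genotype([150], allele_index))       # one band
print(encode_genotype([162, 150], allele_index))  # two bands
print(encode_genotype([], allele_index))          # no amplification
```

Concatenated digits become ambiguous once a locus has more than nine alleles, so a separator between the two allele numbers would be needed in that case.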

SSR data analysis
The number of alleles per locus, expected and observed heterozygosity and F statistics such as genetic differentiation (Fst), Wright's fixation index (Fis) and the total inbreeding coefficient (Fit) were calculated using GenAlEx 6.502 (Peakall and Smouse 2012). The major allelic frequency and polymorphic information content (PIC) of each marker were computed using PowerMarker version 3.25 (Liu and Muse 2005). Analysis of molecular variance (AMOVA) was performed for 426 accessions, excluding the Japonica weedy-type accession, O. latifolia, O. officinalis and O. longistaminata from Niger (each of which was represented by only one accession). Both AMOVA and principal component analysis (PCA) were also conducted in GenAlEx 6.502.
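For orientation, the two per-locus diversity indices can be written directly in terms of allele frequencies p_i: expected heterozygosity He = 1 − Σ p_i², and PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². The sketch below implements these textbook forms; the allele frequencies are invented for illustration, and the cited programs may apply sample-size corrections that are not reproduced here.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies of one locus."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical allele frequencies for a three-allele locus.
example_freqs = [0.6, 0.3, 0.1]
print(round(expected_heterozygosity(example_freqs), 4))
print(round(pic(example_freqs), 4))
```

With this definition a marker is conventionally called highly informative when PIC exceeds 0.5, which requires at least three alleles because a two-allele locus cannot exceed 0.375.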
Based on their computed pairwise genetic distance values, a dendrogram was drawn for the 360 O. longistaminata accessions by using DARwin V6 software (Perrier and Jacquemoud-Collet 2006). The unweighted pair group method with arithmetic mean (UPGMA) was applied to the 12 O. longistaminata populations using Poptree2 software (https://www.ualberta.ca/~fyeh/fyeh).
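As a reminder of what UPGMA does with a pairwise distance matrix, the following is a minimal, generic sketch of the algorithm. It is not the implementation used by the cited software, and the three-group distance matrix is invented for illustration.

```python
def upgma(labels, matrix):
    """Build a UPGMA tree from a symmetric matrix of pairwise distances.

    Returns nested tuples of the form (left, right, node_height).
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the two clusters separated by the smallest distance.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # The distance to every remaining cluster is the size-weighted
        # (arithmetic) mean of the distances from the two merged clusters.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Discard distances that involve either merged cluster.
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j, d_ij / 2.0), n_i + n_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# Hypothetical distances among three groups.
example_tree = upgma(["A", "B", "C"],
                     [[0.0, 2.0, 6.0],
                      [2.0, 0.0, 6.0],
                      [6.0, 6.0, 0.0]])
print(example_tree)
```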
The population structure was inferred using the model-based program STRUCTURE v. 2.3 (Pritchard et al. 2000). Three replicate runs were performed for each K value from 1 to 26, each with a burn-in period of 150,000 iterations followed by 150,000 MCMC repeats. The optimal number of subpopulations (K) was determined via the ΔK approach described by Evanno et al. (2005) using STRUCTURE HARVESTER v0.6.8 (Earl and von Holdt 2011).
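The ΔK statistic divides the absolute second-order change in the log-likelihood, |L(K+1) − 2L(K) + L(K−1)|, by the standard deviation of L(K) across replicate runs. The sketch below shows one common reading of that formula, computed from replicate means; the log-likelihood values are invented, and the function is an illustration rather than the code of the cited tool.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Compute Delta K from replicate log-likelihoods.

    lnp : dict mapping K -> list of Ln P(D) values from replicate runs.
    Returns a dict K -> Delta K for every K that has both neighbours.
    """
    means = {k: mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
            spread = stdev(lnp[k])
            if spread > 0:  # Delta K is undefined when replicates are identical
                result[k] = second_diff / spread
    return result


# Hypothetical Ln P(D) values, three replicate runs per K.
example_lnp = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-600, -601, -599],
    4: [-590, -595, -585],
    5: [-585, -580, -590],
}
example_delta = delta_k(example_lnp)
best_k = max(example_delta, key=example_delta.get)
print(example_delta, best_k)
```

Because the statistic needs both neighbouring K values, it cannot be evaluated at the smallest or largest K tested.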

SSR genotyping
From the 430 wild, cultivated and weedy rice accessions, a total of 440 alleles, with a range of 2 (RM22 and RM171) to 19 (RM225) alleles per locus (average 6.57), were amplified (Table S5). The major allele frequencies over the 67 SSR loci varied from 0.337 (RM1) to 0.953 (RM60), with an overall mean of 0.656. Generally, from all SSR markers used in this study, 663 genotypes were inferred, with an average of 9.89 per locus.
As shown in Table 1, 42 heterozygous loci of the assayed SSR markers presented observed heterozygosity (Ho) and expected heterozygosity (He) values ranging from 0.04 to 0.52 and from 0.32 to 0.76, respectively. Based on their polymorphic information content (PIC), 26 (38.8%) of the SSR markers used in this study were highly informative (PIC > 0.5).

Extent of genetic diversity
At the group level, genetic parameters such as the observed number of alleles (Na), effective number of alleles (Ne), mean observed and expected proportions of heterozygous loci and Wright's fixation index (Fis) were calculated (

Genetic differentiation and variance analysis
For all 67 SSR loci, Fst = 0.347, Fis = 0.603 and Fit = 0.741 (Table 2). All three F statistic values were highly significant (P < 0.001). In particular, the highly significant Fst value indicated strong differentiation among the assessed groups. The indirect estimate of gene flow derived from Fst was Nm = 0.471 (Table 2). Based on the twenty-two study groups, AMOVA attributed 35% of the variance to differences among groups, 39% to differences among accessions and 26% to variation within accessions (Table 3).
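These values can be cross-checked with two standard relations: the indirect gene flow estimate Nm = (1/Fst − 1)/4 and the identity (1 − Fit) = (1 − Fis)(1 − Fst). The sketch below applies both to the rounded values quoted above; it assumes these textbook formulas and is not taken from the software output.

```python
def gene_flow_from_fst(fst):
    """Indirect estimate of gene flow: Nm = (1 / Fst - 1) / 4."""
    return (1.0 / fst - 1.0) / 4.0


def fit_from_fis_fst(fis, fst):
    """Total inbreeding coefficient from (1 - Fit) = (1 - Fis) * (1 - Fst)."""
    return 1.0 - (1.0 - fis) * (1.0 - fst)


nm = gene_flow_from_fst(0.347)
fit = fit_from_fis_fst(0.603, 0.347)
print(round(nm, 3), round(fit, 3))
```

Both results agree with the reported Nm and Fit to within the rounding of the inputs.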

Principal component analysis
A scatter plot of the first two principal components of all 430 accessions showed five clusters (Fig. 1). The first and second axes accounted for 8.96% and 7.26% of the molecular variance, respectively. When only the 12 groups of O. longistaminata from Ethiopia were considered, a clear geographic pattern emerged. A total of 180 O. longistaminata accessions of the six Gambella groups were located in the right half of Quadrant I and left half of Quadrant II. However, the O. longistaminata accessions of the six Amhara groups resided in Quadrant III. Although very few Amhara1 and Amhara2 accessions were positioned at the very bottom of Quadrant II, all 30 accessions from Amhara6 formed their own distinct cluster near the middle of Quadrant III. PCA also illustrated the distinction between the two rice subspecies. Almost all Indica varieties along with both self- and open-pollinated O. rufipogon varieties were located between Quadrant I and Quadrant IV. However, all of the Japonica varieties were found in the centre of Quadrant IV, and the space between the Indica and Japonica varieties was occupied by the relatively overlapping outgroups and some other wild AA genome Oryza species.

Cluster analysis
Clustering analysis based on the UPGMA and 1000 bootstrap replicates separated the 26 Oryza groups into three main clusters, consisting of the Japonica weedy-type  (Fig. 3). As subcluster A of Fig. 2, in the main cluster of O. longistaminata accessions, the Amhara groups were subsequently partitioned from their Gambella counterparts. In the other main cluster, the outgroups widely deviated from Oryza groups based on Dstcorrected genetic distance the other accession. Further splitting within this group indicated the distinction of Japonica types and a higher association of Indica with the wild AA genome group. Regarding population admixture, almost all accessions resided in their specific populations. However, admixture was observed for some accessions of Gambella6 with Gambella5, Gambella2 with Gambella4 and Amhara2 with Amhara1. Most interestingly, no O. longistaminata accessions were found to exhibit admixture in any other cluster (Fig. 3).

Population structure
STRUCTURE analysis suggested that the optimal number of subpopulations for all individuals in the study was three (Fig. 4A). At K = 3, all Gambella O. longistaminata groups comprised one cluster, all Amhara O. longistaminata groups constituted the second, and all the wild and cultivated AA genome Oryza species along with the outgroups were clustered into a third group (Fig. 4B). In the third cluster, no distinction was detected between the CC genome outgroup and the AA genome Oryza species.

SSR genotyping and genetic diversity
In this study, a total of 440 alleles, showing an average of 6.6 alleles per locus, were generated. Using an equivalent number of SSR markers and 192 cultivated rice germplasm lines, Nachimuthu et al. (2015) reported 3 alleles per locus.
According to Brar and Khush (2003), domestication practices have caused significant genetic diversity loss of approximately 60%, while O. longistaminata collections from a wide agro-ecological area of Ethiopia show high allelic diversity (Melaku et al. 2013). Therefore, the higher number of alleles identified in this study could be attributed to the higher number of studied accessions of O. longistaminata. On average, a gene diversity value of 0.46 and observed heterozygosity of 0.14 were obtained from the assessed 67 SSR primers (Table 1). The differences from other studies may be caused by differences in the sampling schemes, the number of SSR markers used, the repeat size of the SSR markers and their positions across the genome, which complicates the comparison of such genetic diversity indices (Deu et al. 2008; Chen et al. 2018).

Genetic differentiation and partitioning of genetic variation
For all 67 SSR loci, highly significant F statistic values (Fst = 0.347, Fis = 0.603 and Fit = 0.741) are reported (Table 2). A higher Fst value or broader differentiation among the rice groups was expected, as genetic differentiation is not evenly distributed across the sampled area (Cao et al. 2006), and the assessed rice accessions represented different genome groups distributed all around the world (Tables 1, 2, 3). In contrast, AMOVA implicated a higher component of variance (39%) among the accessions (Table 3). Perhaps this is due to the representation of most outgroups by very few accessions. With the exception of these outgroups, all AA genome groups showed lower  (Table 1). However, the computed Wright's fixation index (Fis = 0.603) was lower than the Fis = 0.75 obtained for 360 O. longistaminata accessions from Ethiopia (Melaku et al. 2013) and the Fis = 0.958 obtained for 900 cultivated, wild and weedy rice individuals (Cao et al. 2006). Such a low fixation index for both wild and cultivated AA genome Oryza species indicated relatively less significant deviation from Hardy-Weinberg expectations.

Population structure and genetic relationships
The two-dimensional PCA explained 8.96% and 7.26% of the total genetic variation. A similar pattern of molecular variance for the first two principal coordinates was reported by Zhang et al. (2011). To the right of coordinate 1, a wide open space separated all O. longistaminata accessions from the rest of the rice species (Fig. 1). Furthermore, almost all of the Ethiopian O. longistaminata populations were clustered on the basis of their geographic locations. Near the left corner of Quadrant IV, only 3 accessions from Gambella6 showed relative proximity to the O. longistaminata accession from Niger. This distribution highlights the genetic distinction of the Ethiopian O. longistaminata accessions.
The UPGMA clustering analysis merged O. longistaminata populations with their AA genome wild relatives in one main cluster and separated the CC genome Oryza species and the weedy rice in two independent clusters (Fig. 2). However, the hierarchical clustering of all accessions showed 2 main groups, with all O. longistaminata accessions sorted into one group and the remaining Oryza members in the other group (Fig. 3). The slight clustering difference between Figs. 2 and 3 could be associated with the generation of more balanced and fewer clusters by using Ward's minimum variance compared with UPGMA (Odong et al. 2011). However, all main clusters, including that of O. longistaminata, were subjected to further division on the basis of their respective geographic locations and genome types. As shown in Fig. 3, very few O. longistaminata accessions were admixed.
All distance-based clustering methods, with minor deviations, showed a generally similar trend. The model-based structure analysis was also in accordance with the principal coordinate analysis, the clustering pattern of Ward's minimum variance and the UPGMA analysis. In this study, population structure results showed three main clusters resulting from the sorting of weedy rice, AA genome and CC genome Oryza members in one cluster and the Amhara and Gambella O. longistaminata groups from Ethiopia in two independent clusters (Fig. 4B). Interestingly, the CC genome Oryza species and the AA genome wild and cultivated rice populations were structured together. However, this lack of distinction is unlikely to reflect genetic similarity among the members of the cluster; rather, the small number of accessions in the outgroups made their separation less distinct.
From the STRUCTURE analysis, it was very clear that the Amhara and Gambella O. longistaminata groups of Ethiopia represent two different gene pools (Fig. 4B). The distinct population structuring between the two regions may be a function of both geographical and climatic factors. The impact of climatic differences can be illustrated by the 1990-2007 G.C. climatic data, which indicate a 25.7 °C mean temperature and 896 mm annual rainfall for the Gambella region and a 19.2 °C mean temperature and 1468 mm annual rainfall for the Amhara region (Melaku 2011). Differences between the populations of the two ecogeographic regions may also be associated with diverse modes of adaptation to variable altitudinal levels, soil types and ecological conditions (Nachimuthu et al. 2015).

Conclusion
Sixty-seven microsatellite markers were used to assess the genetic relatedness among Ethiopian O. longistaminata accessions and other AA genome wild and cultivated rice species. The genotypic grouping through structure analysis, distance-based clustering and principal component analysis equally confirmed the distinctness of O. longistaminata from other AA genome Oryza species. The further subgrouping of O. longistaminata is in accord with geographical proximity.