Genetic diversity and population structure of some Nigerian accessions of Bambara groundnut (Vigna subterranea (L.) Verdc.) using DArT SNP markers

Bambara groundnut is one of the crops with inadequate molecular research to show its full potential. Previous studies showed morphological diversity but provided inadequate information to confirm genetic variation. To reveal this genetic potential, deoxyribonucleic acid (DNA) of the selected accessions was extracted from leaf samples of 3-week-old plants using the Dellaporta Miniprep for Plant DNA Isolation procedure. The high-quality DNA was sequenced using Diversity Arrays Technology (DArT) markers to unlock diversity among Bambara groundnut of Nigerian origin. Cluster analysis (neighbor-joining clustering) of the single nucleotide polymorphisms (SNPs) was used to generate sub-populations showing relatedness and differences. Seven sub-populations were generated with 5927 (50.13%) high-quality DArT markers out of the 11,821 SNPs generated. This revealed that high genetic diversity exists among the selected Bambara groundnut accessions in Nigeria, and that DArT markers were highly efficient in classifying the accessions based on molecular expression. The study also identified markers responsible for genetic variation that could facilitate the characterization of larger collections and the further utilization of genetic resources, most importantly for the improvement of Bambara groundnut.


Introduction
Bambara groundnut (Vigna subterranea [L.] Verdc., Syn: Voandzeia subterranea [L.] Thouars) is an under-utilized grain legume grown in Africa for food security (Ntundu et al. 2006). The crop is an important legume in Africa after cowpea (Vigna unguiculata [L.] Walp.) (Atoyebi et al. 2017). Molecular research provides new insight into the population structure of African Bambara groundnut germplasm, which will help in the conservation strategy and management of the crop (Uba et al. 2021). Published research on African orphan crops has emphasized the need for more molecular research to harness their hidden potential (Mayes et al. 2013; Ntundu et al. 2006). Bambara groundnut seed is regarded as a balanced food because, compared with most food legumes, it is rich in iron and its protein contains high levels of lysine and methionine (Adu-Dapaah and Sangwan 2004; Massawe et al. 2005). This promising crop is a weapon against the food and nutritional insecurity ravaging Africa.
The current world population is over 7 billion; Africa accounts for over 1.2 billion, and Nigeria, the most populous African country, accounts for over 200 million, with average population growth above 1% yearly (World population prospect, 2019). The logical consequence is increasing food demand from a growing number of people on decreasing arable land. If this is not checked, everyone becomes vulnerable to food and nutritional insecurity. Compared with cereal crops, Bambara groundnut can be considered one of the crops with inadequate molecular research to show its full potential for improvement (Ntundu et al. 2006; Massawe et al. 2005). Most available research is based on agronomic characterization (Mayes et al. 2013; Goli 1997), while some studies report tolerance to abiotic stresses (Kouassi and Zoro 2010). Bambara groundnut is cleistogamous, highly inbreeding and has 11 pairs of chromosomes (2n = 2x = 22). Numerous Bambara groundnut accessions in the genebanks need to be evaluated for possible improvement, to identify better-performing traits and the genes associated with them in addition to those already identified. Amplified fragment length polymorphism (AFLP), simple sequence repeat (SSR) and random amplified polymorphic DNA (RAPD) markers have been reported to reveal genetic diversity, but high-throughput markers are needed to study genetic diversity, especially in larger, diverse germplasm collections (Atoyebi et al. 2017; Mohammed et al. 2014; Olukolu et al. 2012; Amadou et al. 2001). Diversity Arrays Technology (DArT) was originally developed to detect polymorphism at the recognition sites of methylation-sensitive restriction enzymes (Wenzl et al. 2004).
A combination of DArT approaches with next-generation sequencing technologies has resulted in low-cost, high-throughput genotyping by sequencing, facilitating detection of single nucleotide polymorphism (SNP) variation in a genome (Kilian et al. 2012). SNPs, as co-dominant markers in DArT, will help to unlock genetic diversity and clarify the population structure of some Nigerian accessions of Bambara groundnut. This work focuses on the use of DArTseq to identify genetic diversity and significant genes associated with agro-morphological variation in Bambara groundnut, to facilitate the utilization of existing genetic resources for enhanced adaptability to climate change.

Materials and methods
The study was carried out for three years at the International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria, and at the IITA research station located at the Institute of Agricultural Research and Training (IAR&T), Ikenne, Nigeria. The Ibadan site lies at 7.38°N latitude and 3.94°E longitude, at an elevation of 181 m above sea level. Its average annual temperature is 26.5 °C, and about 1311 mm of precipitation falls annually, with 81% mean relative humidity. Ikenne lies at 6.87°N latitude and 3.71°E longitude, 235.2 m above sea level, and has an annual rainfall of 1200 mm, 65% mean relative humidity and a mean temperature of 21.4 °C. Seeds of one hundred accessions of Bambara groundnut were collected from the Genetic Resources Centre (GRC) of IITA in 2017 and used for the field experiments (Table 1). The accessions had been collected from the regions of Nigeria over the years and conserved in the IITA genebank for germplasm exchange.

Experimental design
The experiments were carried out in two locations (Ibadan and Ikenne) for three years (2017/2018, 2018/2019 and 2019/2020). They were laid out in a randomized complete block design (RCBD) with three replications; the total block size was 21 m × 50 m and each plot was 1 m × 2.5 m. Inter- and intra-row spacing were 1.00 m and 0.25 m, respectively. Data were collected on 28 traits (Table 2).

DNA extraction and genotyping
Three weeks after planting, leaf samples of each accession were collected, and the plant stands from which samples were taken were tagged. DNA extraction was done at the Bioscience Centre, International Institute of Tropical Agriculture, Ibadan, following the Dellaporta Miniprep for Plant DNA Isolation procedure (Weigel and Glazebrook 2009). DNA was quantified at 260 nm with a NanoDrop spectrophotometer (NanoDrop 8000, Thermo Scientific), and its quality was checked using the 260/230 and 260/280 absorbance ratios and agarose gel electrophoresis. The samples were sent to Diversity Arrays Technologies commercial service Ltd., Australia (www.diversityarrays.com) for high-throughput single nucleotide polymorphism (SNP) genotyping. The obtained DArT marker set was filtered on the basis of individual marker-related statistics: markers with inappropriate quality control parameters (call rate ≤ 80% and missing data ≥ 20%) were removed in TASSEL 5.0 software (Bradbury et al. 2007). The informativeness of the markers was determined using the polymorphic information content (PIC). The remaining 5927 high-quality markers (50.13% of the 11,821 SNPs) were retained and used in the analysis. DArT markers were coded as 0/1 (presence and absence) and used as separate entries in TASSEL 5.0 to construct neighbor-joining phylograms. The genome-wide association study (GWAS) was conducted on the phenotypic data means for the three years (2017-2020), using the GLM and MLM-PCA batch commands in TASSEL 5.0 (Bradbury et al. 2007). Marker-trait associations (MTAs) were analyzed for the quantitative traits using a general linear model with a correction for population stratification (structure) based on principal component analysis (PCA) (Price et al. 2006). The significance of associations between SNPs and traits was based on the threshold P < 1.68 × 10⁻⁴, calculated by dividing 1 by the total number of SNPs (5927) in the analysis (Li et al. 2016).
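The filtering and informativeness steps described above can be illustrated with a minimal sketch. It is not the TASSEL 5.0 procedure used in the study: it assumes a hypothetical genotype matrix with markers in rows and accessions in columns, scored 0/1/2 as copies of the alternate allele with NaN for missing calls, and it uses the standard biallelic PIC formula. The function names are illustrative only.

```python
import numpy as np

def call_rate(genotypes):
    """Fraction of non-missing calls per marker (rows = markers)."""
    return 1.0 - np.isnan(genotypes).mean(axis=1)

def filter_markers(genotypes, min_call_rate=0.80):
    """Drop markers whose call rate is at or below the cutoff (<= 80% removed)."""
    keep = call_rate(genotypes) > min_call_rate
    return genotypes[keep], keep

def pic_biallelic(genotypes):
    """PIC for biallelic markers: 1 - p^2 - q^2 - 2 p^2 q^2."""
    q = np.nanmean(genotypes, axis=1) / 2.0  # alternate-allele frequency
    p = 1.0 - q                              # reference-allele frequency
    return 1.0 - p**2 - q**2 - 2.0 * p**2 * q**2

# Toy data (assumed, not from the study): 3 markers x 5 accessions
toy = np.array([
    [0, 2, 2, np.nan, np.nan],   # call rate 0.6 -> removed
    [1, 1, 1, 1, 1],             # call rate 1.0 -> kept
    [0, 0, 0, 0, 0],             # call rate 1.0 -> kept, monomorphic
], dtype=float)

filtered, kept = filter_markers(toy)
print(kept, pic_biallelic(filtered))
```

For a biallelic marker the PIC reaches its maximum of 0.375 when both alleles have frequency 0.5, and it is 0 for a monomorphic marker.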

2-Dimensional principal component analysis (2DPCA matrix quadrant)
The 2DPCA matrix quadrants (PC 1 and PC 2) were also used to reveal the distribution of the accessions based on the expression of their morphological traits. Classification by 2-dimensional principal component analysis operates on a 2D image matrix, so the image matrix does not need to be transformed into a vector prior to feature extraction. An image covariance matrix is constructed and its eigenvectors are derived for feature extraction. The first two highest-contributing principal components (PC 1 and PC 2) were used so that the traits contributing most to variation were captured, and an image of subgroup membership of the genetic resources was constructed to reveal variation (Yanai and Ishii 2010).
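As a rough illustration of placing accessions in the quadrants of the first two principal components, the sketch below applies ordinary PCA (via singular value decomposition) to a standardized accession-by-trait matrix. This is a simplification, not the 2DPCA procedure of Yanai and Ishii (2010); the random input data, the function names and the quadrant numbering are all assumptions.

```python
import numpy as np

def first_two_pcs(traits):
    """traits: accessions x traits matrix. Returns PC1/PC2 scores and
    the proportion of variance each of the two components explains."""
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :2] * s[:2]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:2]

def quadrant(scores):
    """Quadrant 1-4 from the signs of PC1 and PC2
    (1 = +/+, 2 = -/+, 3 = -/-, 4 = +/-)."""
    pc1, pc2 = scores[:, 0], scores[:, 1]
    return np.where(pc1 >= 0,
                    np.where(pc2 >= 0, 1, 4),
                    np.where(pc2 >= 0, 2, 3))

# Hypothetical data: 20 accessions scored for 5 traits
rng = np.random.default_rng(0)
toy_traits = rng.normal(size=(20, 5))
scores, explained = first_two_pcs(toy_traits)
print(quadrant(scores), explained)
```

Standardizing each trait before the decomposition keeps traits measured on large scales (for example yield per hectare) from dominating those measured on small scales (for example seed width).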

Population stratification
DArTseq grouped the 100 accessions of Bambara groundnut into seven clusters (Table 3 and Fig. 1). The seven SNP-based clusters, which can enrich the selection of accessions, are shown with different colours denoting subgroup membership, indicating genetic variation. Cluster I had twenty-six accessions with similar SNPs, cluster II had nineteen accessions, cluster III had six accessions, cluster IV had nine accessions, cluster V had fourteen accessions, cluster VI had twelve accessions and cluster VII had twelve accessions, while the remaining 7 accessions had mixed allelic patterns that could not be assigned to any of the clusters, or possibly did not produce enough information to pass the quality check.
The accessions were clustered according to relatedness and differences in their genetic constitution. Accessions in the same cluster reflect genetic relationships, and the most diverse accessions were outgrouped.
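Clustering of this kind starts from a pairwise dissimilarity matrix between accessions. The sketch below computes a simple-matching dissimilarity from 0/1 marker scores. The distance measure is an assumption made for illustration, since the one applied by TASSEL in this study is not stated here, and the toy scores are invented.

```python
import numpy as np

def simple_matching_distance(markers):
    """markers: accessions x markers matrix of 0/1 scores (NaN = missing).
    Returns a symmetric matrix holding, for each pair of accessions, the
    proportion of mismatching scores among markers scored in both."""
    n = markers.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(markers[i]) & ~np.isnan(markers[j])
            if both.any():
                d = np.mean(markers[i, both] != markers[j, both])
            else:
                d = np.nan  # no shared markers: distance undefined
            dist[i, j] = dist[j, i] = d
    return dist

# Toy scores (assumed): 3 accessions x 4 markers
toy_markers = np.array([
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, np.nan],
], dtype=float)
print(simple_matching_distance(toy_markers))
```

A matrix like this is the input that a neighbor-joining algorithm would use to build the tree from which clusters are read.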

Allele frequencies and proportion based on single nucleotide polymorphism (SNP) genotyping
The selected accessions showed allelic variation across multiple loci, and these differences in allele frequencies produced the genetic variation in the population. However, the observed proportions of alleles and diploids were small and the allele frequencies were very similar (Table 4). DNA quality ranged between 1.21 and 2.22.
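Per-marker allele frequencies of the kind summarized above can be computed as in the following sketch. The 0/1/2 genotype coding (copies of the alternate allele, NaN for missing) and the toy values are assumptions, not the study's data format.

```python
import numpy as np

def allele_frequencies(genotypes):
    """genotypes: markers x accessions, 0/1/2 copies of the alternate allele.
    Returns reference frequency, alternate frequency and minor allele frequency."""
    alt = np.nanmean(genotypes, axis=1) / 2.0
    ref = 1.0 - alt
    return ref, alt, np.minimum(ref, alt)

def observed_heterozygosity(genotypes):
    """Share of called genotypes that are heterozygous (coded 1) per marker."""
    called = ~np.isnan(genotypes)
    het = (genotypes == 1) & called
    return het.sum(axis=1) / called.sum(axis=1)

# Toy genotypes (assumed): 2 markers x 4 accessions
toy_geno = np.array([
    [0, 0, 2, 2],
    [0, 1, np.nan, 0],
], dtype=float)
ref, alt, maf = allele_frequencies(toy_geno)
print(maf, observed_heterozygosity(toy_geno))
```

In a highly inbreeding crop, observed heterozygosity is expected to be low, so most of the variation shows up as differences in allele frequency between accessions rather than within them.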

Marker-trait association
The filtered DArTseq genotyping produced 5927 SNPs from 98 lines. Of the 5927 filtered SNPs, 4095 were unaligned on the 'Mung bean genome' and 1832 (30.90%) aligned. Among the aligned markers, chromosome 1 had 158 (8.62%), chromosome 2 had 119 (6.49%), chromosome 3 had 120 (6.55%), chromosome 4 had 86 (4.69%), chromosome 5 had 203 (11.08%), chromosome 6 had 148 (8.07%), chromosome 7 had 203 (11.08%), chromosome 8 had 199 (10.86%), chromosome 9 had 98 (5.34%), chromosome 10 had 132 (7.20%) and chromosome 11 had 134 (7.31%), while the remaining markers were on scaffolds. The general linear model (GLM) revealed 1504 significant markers, the mixed linear model (MLM) revealed 611 and MDS revealed 3589, at a threshold of −log10(P) = 4 for 21 traits. The total number of unfiltered SNPs was 11,821, of which 3387 (28.65%) aligned on the 'Mung bean genome'. Genome-wide association studies indicated significant SNP markers that revealed quantitative trait loci (QTLs) associated with phenotypic traits. Many of the quantitative traits used in this study were associated with more than one marker. Table 5 shows that marker 24383534|F|0-6:G>A-6:G>A was associated with chaff weight, plant height, shelled harvest per plot and shelling percentage. Marker 37320448|F|0-15:G>A-15:G>A was also associated with chaff weight, plant height, shelled harvest per plot and shelling percentage.
The numbers of significant associated markers per trait were: 100-seed weight, 11; number of flowers per peduncle, 1; number of days to flowering, 5; number of pods per plot, 9; number of seeds per pod, 32; pod length, 45; pod width, 3; petiole length, 4; plant height, 4; seed length, 7; seed width, 25; seed weight per plot, 5; shelled harvest per plot, 13; shelling percentage per plot, 13; terminal leaflet length, 119; terminal leaflet width, 179; yield per hectare, 3; and yield per plant, 131. Among the associated traits, terminal leaflet width had the highest number of associated markers, followed by yield per plant and terminal leaflet length. Figures 3, 4, 5, 6, 7 and 8 show Manhattan plots of −log10(P) versus chromosomal position of SNP markers associated with some of the phenotypic traits. The red line represents the significance threshold (−log10(P) = 4), and points above the threshold indicate loci significantly associated with variation.
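The methods define the significance cutoff as 1 divided by the number of SNPs in the analysis. A minimal sketch of that arithmetic, and of its position on the −log10 scale used in Manhattan plots, is given below. The marker names and P values in the usage example are invented for illustration.

```python
import math

def significance_threshold(n_markers):
    """Cutoff P = 1 / n_markers, returned with its -log10 value."""
    p = 1.0 / n_markers
    return p, -math.log10(p)

def significant(p_values, n_markers):
    """Names of markers whose P value falls below the 1 / n_markers cutoff."""
    cutoff, _ = significance_threshold(n_markers)
    return [name for name, p in p_values.items() if p < cutoff]

cutoff, neg_log = significance_threshold(5927)
print(f"P cutoff = {cutoff:.3e}, -log10(P) = {neg_log:.2f}")

# Hypothetical marker names and P values
toy_p = {"marker_a": 1e-5, "marker_b": 0.01}
print(significant(toy_p, 5927))
```

Expressing the cutoff on the −log10 scale makes it directly comparable with the horizontal threshold line drawn on a Manhattan plot.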

Discussion
DArTseq based on single nucleotide polymorphisms (SNPs) showed seven subpopulations, indicating wide genetic variation. This reiterates the suitability of DArT high-throughput sequencing for diversity studies. Olukolu et al. (2012) also reported genetic diversity among Bambara groundnut accessions using DArT markers. The accessions formed different subpopulations, indicating high genetic variation in the selected population. DArTseq also placed some accessions in the same subpopulation; this might be due to related ancestry or to generational out-crossing, and hence a related genetic constitution. Cluster I had the highest number of accessions (26), indicating that the highest genetic relationship existed in that subpopulation. Cluster III had the lowest number of accessions (6), indicating that the lowest genetic relationship existed in that subpopulation. Bernardo (2020) likewise reported that molecular data are a veritable tool for unlocking diversity. This indicates that accessions clustered in the same group are genetically related. The diversity among the selected accessions, shown by the different subpopulations (clusters I-VII) and the variation expressed, also indicates dissimilarities in the genetic makeup of the population. The consistency in the grouping of the accessions into clusters indicates that the DNA bases in the genetic makeup form the basis of heredity and are therefore responsible for the transfer of traits that sustains diversity in Bambara groundnut. Allele percentages showed ≥ 1% genetic variation in the population data: cytosine (C), guanine (G), adenine (A) and thymine (T) ranged from 23 to 24% in the SNPs (24% each for C and G, 23% each for A and T), which indicates a large source of sequence variation in Bambara groundnut.
Hence, SNPs are a confirmatory means of revealing genetic diversity. Cytosine and guanine are the most common bases in the regions of the genome as well as in the genes, with over 24% of the genetic makeup. This indicates that the differing allele frequencies are responsible for genetic variation. This is supported by Matthew et al. (2004), who reported that SNPs with a minor allele frequency (MAF) greater than 5% in populations are responsible for genetic variation. Genome-wide association studies (GWAS) identified stable loci for 19 phenotypic traits, indicating that these loci may be veritable tools for developing new cultivars with high yield and yield stability. The significant associated markers indicate a high level of genetic variance in Bambara groundnut. This supports the findings of Choudhary et al. (2013), Huynh et al. (2013) and Aliyu et al. (2016) that molecular genetic diversity analyses aid breeding decisions in crop species. The differing significant marker sites indicate variation in the haplotypes and identified exons. Stadler (2009) and Somta et al. (2011) also reported a high level of allelic diversity in Bambara groundnut. However, the association of some significant markers with more than one trait confirms pleiotropy or linkage of the genes. The alignment on the 'Mung bean genome' of the polymorphic markers significantly associated with phenotypic traits indicated that chromosomes 5, 7 and 8 had the strongest associations with the phenotypic traits and are crucial to consider in the genetic variation of Bambara groundnut; together they accounted for 33.02% of the variation identified across the chromosomes. A similar result was reported by Gahlaut et al. (2019) in wheat.
The alignment of the Bambara groundnut sequence on the 'Mung bean genome' also indicated that 1832 SNPs (30.90%) aligned. Nevertheless, some of the markers were less informative and could not be assigned to chromosome locations in the sequence, and were hence tagged as scaffolds. This reiterates the result of Ho et al. (2017) that 48% of Bambara groundnut sequence marker tags were mapped on the 'Mung bean genome'. GWAS also indicated that many of the phenotypic traits are associated with more than one marker at different or the same chromosomal positions, meaning that the phenotypic traits are polygenic: they are determined by the interaction between several alleles. This further reiterates that the analysis compares, for each marker, the trend in the quantitative traits with the trend in the genotypes to identify loci causing variation in the genome. Further observation showed that some of the markers were associated with more than one trait, i.e. a single marker coding for two or more phenotypic traits, which indicates the pleiotropic nature of genes. This is supported by Wang et al. (2008) and Chen (2011), who corroborated that genetic trait correlation can be due to pleiotropy or linkage disequilibrium. Although the discussion of genetic trait correlation is controversial, Falconer (1960) concluded that pleiotropy is a more likely reason for trait association than linkage. In contrast, Mather and Jinks (1971) argued that associations between traits are due to linkage. Observation in this study at the level of sequence polymorphism suggests that genetic trait association is due to pleiotropy. This is also supported by Sibov et al. (2003), Zygier et al. (2005), Chen et al. (2008) and Wang et al. (2008). This study further revealed that some traits are associated with the same polymorphic markers and chromosome positions, meaning that the traits are correlated.

Conclusion
This study confirmed that SNP-based DArT genotyping is effective for detecting the genetic diversity and population structure of Bambara groundnut accessions. The DArT markers used also revealed a wide range of genetic diversity, making the selected population a potential source for Bambara groundnut improvement. Selection of highly contributing morphological traits in Bambara groundnut showed differences between populations. The differing subpopulations identified could be used in breeding improved cultivars with enhanced responsiveness to climate change. These findings could be further reaffirmed by using other high-throughput markers in future research.