Introduction

Bambara groundnut (Vigna subterranea [L.] Verdc., syn. Voandzeia subterranea [L.] Thouars) is an under-utilized grain legume grown in Africa for food security (Ntundu et al. 2006). It is an important legume in Africa, ranking after cowpea (Vigna unguiculata [L.] Walp.) (Atoyebi et al. 2017). Molecular research provides new insight into the population structure of African Bambara groundnut germplasm, which will help in the conservation strategy and management of the crop (Uba et al. 2021). Published work on African orphan crops has emphasized the need for more molecular research to harness their hidden potential (Mayes et al. 2013; Ntundu et al. 2006). Bambara groundnut seed is regarded as a balanced food because, compared with most food legumes, it is rich in iron and its protein contains high levels of lysine and methionine (Adu-Dapaah and Sangwan, 2004; Massawe et al. 2005). This promising crop is a weapon against the food and nutritional insecurity ravaging Africa. The current world population is over 7 billion; Africa accounts for over 1.2 billion, and Nigeria, the most populous African country, accounts for over 200 million, with average population growth above 1% yearly (World Population Prospects, 2019). The logical consequence is an increasing demand for food from a decreasing area of arable land. If this is not checked, everyone becomes vulnerable to food and nutritional insecurity. Compared with cereal crops, Bambara groundnut is one of the crops for which molecular research is inadequate to show its full potential for improvement (Ntundu et al. 2006; Massawe et al. 2005). Most available research is based on agronomic characterization (Mayes et al. 2013; Goli 1997), while some studies report tolerance to abiotic stresses (Kouassi and Zoro 2010). Bambara groundnut is cleistogamous and highly inbreeding, and has 11 pairs of chromosomes (2n = 2x = 22).
Numerous Bambara groundnut accessions in genebanks need to be evaluated for possible improvement, to identify better-performing traits and the genes associated with them in addition to those already identified. Amplified fragment length polymorphism (AFLP), simple sequence repeat (SSR) and random amplified polymorphic DNA (RAPD) markers have been reported to reveal genetic diversity, but high-throughput markers are needed, especially when studying larger and more diverse germplasm collections (Atoyebi et al. 2017; Mohammed et al. 2014; Olukolu et al. 2012; Amadou et al. 2001). Diversity Arrays Technology (DArT) was originally developed to detect polymorphism at the recognition sites of methylation-sensitive restriction enzymes (Wenzl et al. 2004). The combination of DArT approaches with next-generation sequencing technologies has resulted in low-cost, high-throughput genotyping by sequencing, facilitating the detection of single nucleotide polymorphisms (SNPs) in a genome (Kilian et al. 2012). SNPs, as co-dominant markers in DArT, will help to unlock genetic diversity and to understand the population structure of some Nigerian accessions of Bambara groundnut. This work uses DArTseq to identify genetic diversity and significant genes associated with agro-morphological variation in Bambara groundnut, to facilitate the utilization of existing genetic resources for enhanced adaptability to climate change.

Materials and methods

The study was carried out for three years at the International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria, and at the IITA research station located at the Institute of Agricultural Research and Training (IAR&T), Ikenne, Nigeria. The Ibadan site is located at latitude 7.38°N and longitude 3.94°E, at an elevation of 181 m above sea level. Its average annual temperature is 26.5 °C, about 1311 mm of precipitation falls annually, and mean relative humidity is 81%. Ikenne is located at latitude 6.87°N and longitude 3.71°E, 235.2 m above sea level, and has an annual rainfall of 1200 mm, 65% mean relative humidity and a mean temperature of 21.4 °C. Seeds of one hundred accessions of Bambara groundnut were collected from the Genetic Resources Centre (GRC) of IITA in 2017 and used for the field experiments (Table 1). The accessions had been collected from the regions of Nigeria over the years and conserved in the IITA genebank for germplasm exchange.

Table 1 List of the 100 Bambara groundnut accessions used for the experiment

Experimental design

The experiments were carried out in two locations (Ibadan and Ikenne) for three years (2017/2018, 2018/2019 and 2019/2020). They were laid out in a randomized complete block design (RCBD) with three replications. Total block size was 21 m × 50 m and each plot was 1 m × 2.5 m. Inter-row and intra-row spacing were 1.00 m and 0.25 m, respectively. Data were collected on 28 traits (Table 2).
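The layout described above can be illustrated with a minimal sketch of an RCBD randomization: every accession appears exactly once in each of the three blocks, in an independently shuffled plot order. The accession labels, the function name `rcbd_layout` and the fixed seed are hypothetical and serve only to make the sketch reproducible; this is not the randomization actually used in the field.

```python
import random


def rcbd_layout(accessions, n_blocks=3, seed=2017):
    """Return one independently shuffled plot order per block (replication)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    layout = []
    for _ in range(n_blocks):
        order = list(accessions)
        rng.shuffle(order)  # each accession occurs exactly once in the block
        layout.append(order)
    return layout


# Hypothetical labels; the real accession names are those listed in Table 1.
accessions = [f"ACC-{i:03d}" for i in range(1, 101)]
layout = rcbd_layout(accessions)

for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: first plots -> {order[:3]}")
```

Shuffling within each block separately is what distinguishes an RCBD from a completely randomized design: block-to-block differences in the field can then be separated from accession effects.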

Table 2 Quantitative traits observed on Bambara groundnut accessions

DNA extraction and genotyping

Three weeks after planting, leaf samples of each accession were collected and the plant stands from which samples were taken were tagged. DNA extraction was done at the Bioscience Centre, International Institute of Tropical Agriculture, Ibadan, following the Dellaporta miniprep procedure for plant DNA isolation (Weigel and Glazebrook 2009). DNA was quantified at 260 nm with a NanoDrop 8000 spectrophotometer (Thermo Scientific), and its quality was checked using the 260/230 and 260/280 absorbance ratios and agarose gel electrophoresis. The samples were sent to Diversity Arrays Technologies commercial service Ltd., Australia (www.diversityarrays.com) for high-throughput single nucleotide polymorphism (SNP) genotyping. The DArT marker set obtained was filtered on individual marker statistics in TASSEL 5.0 software (Bradbury et al. 2007) by removing markers with a call rate ≤ 80% (missing data ≥ 20%). The informativeness of each marker was determined using the polymorphic information content (PIC). After filtering, 5927 high-quality SNPs (50.13%) of the 11,821 SNPs were retained and used in the analysis. DArT markers were coded as 0/1 (presence and absence) and entered in TASSEL 5.0 to construct neighbor-joining phylograms. The genome-wide association study (GWAS) was conducted on the phenotypic means across the three years (2017–2020), using the GLM and MLM-PCA batch commands in TASSEL 5.0 (Bradbury et al. 2007). Marker-trait associations (MTAs) were analyzed for the quantitative traits using a general linear model with a correction for population stratification (structure) based on principal component analysis (PCA) (Price et al. 2006). Associations between SNPs and traits were declared significant at the threshold P < 1.68 × 10⁻⁴, calculated by dividing 1 by the total number of SNPs (5927) in the analysis (Li et al. 2016).
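The filtering, PIC and threshold steps described above can be sketched as follows. This is an illustration only, not the TASSEL implementation: the genotype coding (count of alternate alleles 0/1/2, `None` for a missing call), the marker names and the toy data are hypothetical, and the PIC function assumes the common Botstein-style formula for a biallelic marker, which the text does not specify.

```python
import math


def call_rate(calls):
    """Fraction of samples with a non-missing genotype call."""
    return sum(c is not None for c in calls) / len(calls)


def pic_biallelic(calls):
    """Botstein-style PIC for a biallelic SNP (assumed formula).

    calls holds alternate-allele counts (0, 1 or 2); None marks missing data.
    """
    observed = [c for c in calls if c is not None]
    q = sum(observed) / (2 * len(observed))  # alternate allele frequency
    p = 1.0 - q                              # reference allele frequency
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


def significance_threshold(n_markers):
    """Threshold rule described in the text: 1 / number of markers."""
    return 1.0 / n_markers


# Hypothetical toy markers scored on ten samples.
markers = {
    "snp_a": [0, 1, 2, 0, None, 1, 2, 0, 1, 2],        # call rate 0.9
    "snp_b": [0, None, None, 2, None, 1, 0, 0, 1, 2],  # call rate 0.7
}

# Markers with call rate <= 80% are removed, so only call rate > 0.80 is kept.
kept = {name: calls for name, calls in markers.items() if call_rate(calls) > 0.80}

threshold = significance_threshold(5927)
print(sorted(kept))
print(f"threshold = {threshold:.3e}, -log10 = {-math.log10(threshold):.2f}")
for name, calls in kept.items():
    print(name, round(pic_biallelic(calls), 3))
```

For a biallelic marker the PIC under this formula peaks at 0.375 when both alleles have frequency 0.5, which is a convenient check on the function.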

Analysis of population structure

A clustering approach was used to estimate the number of subpopulations with correlated allele frequencies from the marker data, using the neighbor-joining method in TASSEL 5.0 software (Bradbury et al. 2007). Each genotype was assigned to a subpopulation based on its membership probability.
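Neighbor-joining starts from a matrix of pairwise distances between accessions. The sketch below shows only that first step, using the share of mismatching 0/1 marker scores as the distance; the distance measure, the accession labels and the marker profiles are assumptions for illustration, and TASSEL's own distance calculation may differ. The tree-building step itself is not shown.

```python
def mismatch_distance(a, b):
    """Share of jointly scored markers at which two 0/1 profiles differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float("nan")  # no marker scored in both accessions
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(profiles):
    """Symmetric matrix of pairwise distances, rows and columns in name order."""
    names = sorted(profiles)
    matrix = [[mismatch_distance(profiles[r], profiles[c]) for c in names]
              for r in names]
    return names, matrix


# Hypothetical presence/absence profiles; None marks a missing score.
profiles = {
    "acc_1": [1, 0, 1, 1, 0, None],
    "acc_2": [1, 0, 1, 0, 0, 1],
    "acc_3": [0, 1, 0, 0, 1, 1],
}

names, dist = distance_matrix(profiles)
for name, row in zip(names, dist):
    print(name, [round(d, 2) for d in row])
```

Accessions with small mutual distances are the ones a neighbor-joining tree would place on nearby branches, which is the basis of the cluster membership reported later.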

2-Dimensional principal component analysis (2DPCA matrix quadrant)

2DPCA matrix quadrants of PC 1 and PC 2 were also constructed to reveal the distribution of the accessions based on the expression of their morphological traits. Classification by 2-dimensional principal component analysis operates on a 2D image matrix, so the matrix does not need to be transformed into a vector before feature extraction. The covariance matrix is constructed and its eigenvectors are derived for feature extraction. The first two principal components (PC 1 and PC 2), which contribute most to the variation, were used so that the most influential traits were captured, and an image of the subgroup membership of the genetic resources was constructed to reveal variation (Yanai and Ishii 2010).
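As a rough illustration of how accessions end up in quadrants of the PC 1 by PC 2 plane, the sketch below runs an ordinary PCA on a small trait matrix (standardize, build the covariance matrix, extract the two leading eigenvectors by power iteration) and labels each accession by the signs of its two scores. It is a plain PCA stand-in rather than the 2DPCA procedure of the cited reference; the trait values, the standardization step and the quadrant numbering (Q1 = +/+, Q2 = −/+, Q3 = −/−, Q4 = +/−) are assumptions.

```python
import math
import random


def standardize(rows):
    """Centre each trait to mean 0 and scale it to unit sample variance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
           for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def covariance(rows):
    """Covariance matrix of already-centred rows (accessions x traits)."""
    n, k = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(k)]
            for i in range(k)]


def top_eigenpair(matrix, iterations=500, seed=0):
    """Largest eigenvalue and its unit eigenvector by power iteration."""
    rng = random.Random(seed)
    vec = [rng.random() for _ in matrix]
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    value = sum(v * sum(m * w for m, w in zip(row, vec))
                for v, row in zip(vec, matrix))
    return value, vec


def first_two_components(rows):
    """Eigenpairs of PC 1 and PC 2 plus the accession scores on both."""
    z = standardize(rows)
    cov = covariance(z)
    l1, v1 = top_eigenpair(cov)
    k = len(cov)
    # Deflation removes PC 1 so the next power iteration finds PC 2.
    deflated = [[cov[i][j] - l1 * v1[i] * v1[j] for j in range(k)]
                for i in range(k)]
    l2, v2 = top_eigenpair(deflated)
    scores = [(sum(a * b for a, b in zip(row, v1)),
               sum(a * b for a, b in zip(row, v2))) for row in z]
    return (l1, v1), (l2, v2), scores


def quadrant(pc1, pc2):
    """Assumed numbering: Q1 (+,+), Q2 (-,+), Q3 (-,-), Q4 (+,-)."""
    if pc1 >= 0:
        return "Q1" if pc2 >= 0 else "Q4"
    return "Q2" if pc2 >= 0 else "Q3"


# Hypothetical trait means (three traits) for six accessions.
traits = [
    [20.1, 5.2, 30.0], [22.4, 5.9, 28.5], [18.7, 4.8, 33.1],
    [25.0, 6.3, 27.2], [19.5, 5.0, 31.8], [23.3, 6.1, 29.9],
]

(l1, v1), (l2, v2), scores = first_two_components(traits)
labels = [quadrant(a, b) for a, b in scores]
print(labels)
```

Because the sign of an eigenvector is arbitrary, quadrant labels can mirror between runs or software packages; which accessions share a quadrant is the stable information.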

Results

Population stratification

DArTseq based on SNPs grouped the 100 accessions of Bambara groundnut into seven clusters (Table 3 and Fig. 1), with a different colour denoting each subgroup, indicating genetic variation. Cluster I had twenty-six accessions with similar SNPs, cluster II had nineteen, cluster III had six, cluster IV had nine, cluster V had fourteen, cluster VI had twelve and cluster VII had twelve, while the remaining 7 accessions had mixed allelic patterns that could not be assigned to any of the clusters, or probably did not produce enough information to pass the quality check. The accessions were clustered according to relatedness and differences in their genetic constitution. Accessions in the same cluster are genetically related, and the most diverse accessions were out-grouped.

Table 3 Clustering of Bambara Groundnut accessions based on Single Nucleotide Polymorphism (SNP) genotyping
Fig. 1 Cluster analysis of Bambara groundnut collection based on 5927 DArTseq SNPs

Comparison between population structure as revealed by DArT markers and 2DPCA in three years

Figures 1 and 2 and Table 3 show which accessions were grouped in the same or in different clusters genetically (DArTseq) and morphologically (2DPCA quadrants). This comparison is necessary to establish whether the morphological variation observed in the selected accessions agrees with the classification based on molecular variation or occurred by chance. In cluster I based on DArTseq, accessions TVSu-333, TVSu-336, TVSu-2100, TVSu-2105, TVSu-2109, TVSu-589, TVSu-2106 and TVSu-670 were classified in the same 2DPCA matrix quadrant (Q2) based on the contribution of traits to morphological diversity, meaning that these accessions are not only morphologically but also genetically related. In cluster IV, accessions TVSu-14, TVSu-261, TVSu-12, TVSu-1242, TVSu-1252 and TVSu-365 were classified in the same 2DPCA matrix quadrant (Q1) based on morphological grouping, and in cluster II, accessions TVSu-262 and TVSu-256 were also morphologically clustered in quadrant 4 (Q4). Accessions from clusters I, II, III, V and VI (TVSu-269, TVSu-127, TVSu-263, TVSu-173, TVSu-181, TVSu-178, TVSu-1222, TVSu-659, TVSu-2108 and TVSu-355) were grouped in quadrant 3 (Q3), which indicates interrelationship in the population. This suggests that the expression of morphological traits may be genetic, and indicates genetic similarity among accessions grouped in the same cluster and quadrant.
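The agreement described above amounts to a cross-tabulation of DArTseq cluster against 2DPCA quadrant. A minimal sketch using only the accessions whose cluster and quadrant are both stated in the text follows; the Q3 accessions are left out because the text does not say which of clusters I, II, III, V and VI each one belongs to.

```python
from collections import Counter

# (DArTseq cluster, 2DPCA quadrant) -> accessions named in the text
agreement = {
    ("I", "Q2"): ["TVSu-333", "TVSu-336", "TVSu-2100", "TVSu-2105",
                  "TVSu-2109", "TVSu-589", "TVSu-2106", "TVSu-670"],
    ("IV", "Q1"): ["TVSu-14", "TVSu-261", "TVSu-12", "TVSu-1242",
                   "TVSu-1252", "TVSu-365"],
    ("II", "Q4"): ["TVSu-262", "TVSu-256"],
}

# One record per accession, then count accessions per (cluster, quadrant) cell.
records = [(accession, cluster, quad)
           for (cluster, quad), names in agreement.items()
           for accession in names]
table = Counter((cluster, quad) for _, cluster, quad in records)

for (cluster, quad), n in sorted(table.items()):
    print(f"cluster {cluster:>2} x {quad}: {n} accessions")
```

With the full assignments from Table 3 and Fig. 2, the same table would show how strongly genetic clusters and morphological quadrants coincide across all accessions.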

Fig. 2 2D images of the first two principal components showing overall variation among the hundred accessions of Bambara groundnut across the locations and years (2017/2018, 2018/2019 and 2019/2020)

Allele frequencies and proportion based on Single nucleotide polymorphism (SNPs) genotyping

The selected accessions showed allelic variation across multiple loci, and these differences in allele frequencies produced the genetic variation in the population. However, the observed proportions of alleles and diploids were small and the allele frequencies were very similar (Table 4). DNA quality ranged between 1.21 and 2.22.
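Allele frequencies of the kind summarized above can be computed per SNP from diploid genotype calls. The sketch below is illustrative only: the genotype strings, the function names and the toy data are hypothetical and do not reproduce the values in Table 4.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Frequency of each allele among non-missing diploid calls like 'G/A'."""
    alleles = Counter()
    for call in genotypes:
        if call is not None:
            alleles.update(call.split("/"))
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele; 0.0 for a monomorphic marker."""
    freqs = allele_frequencies(genotypes)
    return min(freqs.values()) if len(freqs) > 1 else 0.0


def heterozygous_proportion(genotypes):
    """Share of non-missing calls that carry two different alleles."""
    called = [call.split("/") for call in genotypes if call is not None]
    return sum(a != b for a, b in called) / len(called)


# Hypothetical calls for one SNP across six accessions (None = missing).
snp_calls = ["G/G", "G/A", "A/A", "G/G", None, "G/G"]

freqs = allele_frequencies(snp_calls)
maf = minor_allele_frequency(snp_calls)
het = heterozygous_proportion(snp_calls)
print(freqs, maf, het)
```

In a highly inbreeding crop such as Bambara groundnut, a low heterozygous proportion is the expected pattern.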

Table 4 Gene summary based on Single Nucleotide Polymorphism genotyping

Marker-trait association

The filtered DArTseq genotyping produced 5927 SNPs from 98 lines. Of the 11,821 unfiltered SNPs, 3387 (28.65%) aligned to the mung bean genome. Of the 5927 filtered SNPs, 1832 (30.90%) aligned to the mung bean genome and 4095 did not. Among the aligned markers, chromosome 1 had 158 (8.62%), chromosome 2 had 119 (6.49%), chromosome 3 had 120 (6.55%), chromosome 4 had 86 (4.69%), chromosome 5 had 203 (11.08%), chromosome 6 had 148 (8.07%), chromosome 7 had 203 (11.08%), chromosome 8 had 199 (10.86%), chromosome 9 had 98 (5.34%), chromosome 10 had 132 (7.20%) and chromosome 11 had 134 (7.31%), while the remaining aligned markers were on scaffolds. The general linear model (GLM) identified 1504 significant markers, the mixed linear model (MLM) 611 and MDS 3589 at a −log10(P) threshold of 4 for 21 traits. The genome-wide association study identified significant SNP markers that revealed quantitative trait loci (QTLs) associated with phenotypic traits, and many of the quantitative traits were associated with more than one marker. Table 5 shows that marker 24383534|F|0-6:G>A-6:G>A was associated with chaff weight, plant height, shelled harvest per plot and shelling percentage. Marker 37320448|F|0-15:G>A-15:G>A was also associated with chaff weight, plant height, shelled harvest per plot and shelling percentage. Marker 24383752|F|0-61:G>T-61:G>T was associated with 100-seed weight, number of pods per plot, shelled harvest per plot and shelling percentage. Marker 24346601|F|0-67:T>C-67:T>C was associated only with number of days to flowering. Marker 4183841|F|0-29:A>G-29:A>G was associated only with number of flowers per peduncle.
Marker 24385974|F|0-26:C>G-26:C>G was associated with number of seeds per pod, terminal leaflet length and terminal leaflet width. Marker 24346965|F|0-27:C>T-27:C>T was associated with pod length and terminal leaflet width. Marker 24384331|F|0-20:G>A-20:G>A was associated only with pod width. Marker 27640259|F|0-9:A>G-9:A>G was associated with petiole length, terminal leaflet length and terminal leaflet width. Marker 27641016|F|0-24:G>T-24:G>T was associated with plant height, shelled harvest per plot, shelling percentage, terminal leaflet length, terminal leaflet width and petiole length. Marker 27641679|F|0-9:C>T-9:C>T was associated only with seed length. Marker 24383723|F|0-24:C>T-24:C>T was associated only with seed width. Marker 4181685|F|0-19:T>A-19:T>A was associated with seed weight per plot and yield per hectare. Marker 24346321|F|0-14:G>A-14:G>A was associated with terminal leaflet length and terminal leaflet width. Marker 24346252|F|0-35:A>G-35:A>G was associated only with yield per plant. MLM revealed 611 significant markers associated with phenotypic traits. Nineteen of the 21 phenotypic traits had significant markers associated with them in the analysis.
The number of significant markers per trait was as follows: chaff weight per plot, 2; 100-seed weight, 11; number of flowers per peduncle, 1; number of days to flowering, 5; number of pods per plot, 9; number of seeds per pod, 32; pod length, 45; pod width, 3; petiole length, 4; plant height, 4; seed length, 7; seed width, 25; seed weight per plot, 5; shelled harvest per plot, 13; shelling percentage per plot, 13; terminal leaflet length, 119; terminal leaflet width, 179; yield per hectare, 3; and yield per plant, 131. Terminal leaflet width had the highest number of associated markers, followed by yield per plant and terminal leaflet length. Figures 3, 4, 5, 6, 7 and 8 show Manhattan plots of −log10(P) versus the chromosomal position of SNP markers associated with some of the phenotypic traits. The red line represents the significance threshold (−log10(P) = 4); markers above the threshold indicate loci significantly associated with the variation.
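The counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below recomputes the share of aligned markers per chromosome and the total of the per-trait MLM counts from the numbers given in the text. Truncating the percentages to two decimals instead of rounding them is an assumption, chosen because it reproduces the quoted values; the variable and function names are illustrative.

```python
# Aligned markers per chromosome, as reported in the text.
aligned = {1: 158, 2: 119, 3: 120, 4: 86, 5: 203, 6: 148,
           7: 203, 8: 199, 9: 98, 10: 132, 11: 134}
TOTAL_ALIGNED = 1832


def truncated_percent(part, whole):
    """Percentage truncated (not rounded) to two decimals."""
    return (part * 10000 // whole) / 100


share = {chrom: truncated_percent(n, TOTAL_ALIGNED) for chrom, n in aligned.items()}
on_scaffolds = TOTAL_ALIGNED - sum(aligned.values())
top_three = truncated_percent(aligned[5] + aligned[7] + aligned[8], TOTAL_ALIGNED)

# Significant MLM markers per trait, as reported in the text.
mlm_markers = {
    "chaff weight per plot": 2, "100-seed weight": 11,
    "number of flowers per peduncle": 1, "number of days to flowering": 5,
    "number of pods per plot": 9, "number of seeds per pod": 32,
    "pod length": 45, "pod width": 3, "petiole length": 4, "plant height": 4,
    "seed length": 7, "seed width": 25, "seed weight per plot": 5,
    "shelled harvest per plot": 13, "shelling percentage per plot": 13,
    "terminal leaflet length": 119, "terminal leaflet width": 179,
    "yield per hectare": 3, "yield per plant": 131,
}
ranked = sorted(mlm_markers, key=mlm_markers.get, reverse=True)

print("share on chromosomes 5, 7 and 8:", top_three)
print("aligned markers on scaffolds:", on_scaffolds)
print("total MLM markers:", sum(mlm_markers.values()))
print("top three traits:", ranked[:3])
```

The per-trait counts add up to the 611 significant MLM markers, and chromosomes 5, 7 and 8 together carry about a third of the aligned markers.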

Table 5 DArTseq SNP markers having significant association with more than one morphological trait
Fig. 3 Manhattan plot of −log10(P) versus chromosomal position of SNP markers associated with plant height (PHT). Significance threshold: −log10(P) = 4

Fig. 4 Manhattan plot of −log10(P) versus chromosomal position of SNP markers associated with terminal leaflet length (TLL). Significance threshold: −log10(P) = 4

Fig. 5 Manhattan plot of −log10(P) versus chromosomal position of SNP markers associated with terminal leaflet width (TLW). Significance threshold: −log10(P) = 4

Fig. 6 Manhattan plot of −log10(P) versus chromosomal position of SNP markers associated with petiole length (PetL mm). Significance threshold: −log10(P) = 4

Fig. 7 Manhattan plot of −log10(P) versus chromosomal position of SNP markers associated with yield per plant (Yplant g). Significance threshold: −log10(P) = 4

Fig. 8 Manhattan plot of −log10(P) versus chromosomal position of SNP markers associated with number of pods per plot (NPdspar). Significance threshold: −log10(P) = 4

Discussion

DArTseq based on single nucleotide polymorphisms (SNPs) showed seven subpopulations, indicating wide genetic variation. This reiterates the suitability of DArT high-throughput sequencing for diversity studies. Olukolu et al. (2012) also reported genetic diversity among Bambara groundnut accessions using DArT markers. The accessions formed different subpopulations, indicating high genetic variation in the selected population. Some accessions fell in the same subpopulation; this might be due to related ancestry or to out-crossing over generations, hence their related genetic constitution. Cluster I contained the highest number of accessions (26), indicating that the highest genetic relationship existed in that subpopulation, whereas cluster III contained the lowest number (6), indicating that the lowest genetic relationship existed in that subpopulation. Bernardo (2020) also reported that molecular data are a veritable tool for unlocking diversity. Accessions clustered in the same group are therefore genetically related. The diversity among the selected accessions, shown by the different subpopulations (clusters I–VII), also indicates dissimilarities in the genetic makeup of the population. The consistency in the grouping of the accessions indicates that the DNA bases in the genetic makeup form the basis of heredity and are hence responsible for the transfer of traits that sustains diversity in Bambara groundnut. Allele percentages showed ≥ 1% genetic variation in the population data: cytosine (C) and guanine (G) each accounted for 24% and adenine (A) and thymine (T) each for 23% of the SNPs, which indicates a large source of sequence variation in Bambara groundnut.
Hence, SNPs are a reliable means of revealing genetic diversity. Cytosine and guanine are the most common bases in the regions of the genome as well as in the genes, with over 24% of the genetic makeup. This indicates that the differing allele frequencies are responsible for genetic variation. This is supported by Matthew et al. (2004), who reported that SNPs with a minor allele frequency (MAF) greater than 5% in populations are responsible for genetic variation. The genome-wide association study (GWAS) identified stable loci for 19 phenotypic traits, which indicates that these loci may be veritable tools for developing new cultivars with high yield and yield stability. The associated significant markers indicate a high level of genetic variance in Bambara groundnut. This supports the findings of Choudhary et al. (2013), Huynh et al. (2013) and Aliyu et al. (2016) that molecular genetic diversity analyses aid breeding decisions in crop species. The differing significant marker sites indicate variation in the haplotypes and identified exons. Stadler (2009) and Somta et al. (2011) also reported a high level of allelic diversity in Bambara groundnut. Moreover, the association of some significant markers with more than one trait indicates pleiotropy or linkage of the genes. The alignment of the polymorphic sequence markers to the mung bean genome showed that, among the significant markers associated with phenotypic traits, chromosomes 5, 7 and 8 had the strongest associations with the phenotypic traits. These three chromosomes accounted for 33.02% of the aligned markers and are crucial to consider in the genetic variation of Bambara groundnut. A similar result was reported by Gahlaut et al. (2019) in wheat.
The alignment of the Bambara groundnut sequences to the mung bean genome also showed that 1832 SNPs (30.90%) aligned. However, some of the markers were less informative and could not be assigned to chromosome locations, and were therefore tagged as scaffolds. This echoes the result of Ho et al. (2017) that 48% of Bambara groundnut sequence marker tags mapped to the mung bean genome. GWAS also indicated that many of the phenotypic traits are associated with more than one marker, at the same or at different chromosomal positions, meaning that the phenotypic traits are polygenic: they are determined by the interaction between several alleles. This is consistent with the way each marker test compares the trend in a quantitative trait with the trend in the genotypes to identify loci causing variation in the genome. Some of the markers were associated with more than one trait, i.e. a single marker was associated with two or more phenotypic traits, which indicates the pleiotropic nature of genes. This is supported by Wang et al. (2008) and Chen (2011), who corroborated that genetic correlation between traits can be due to pleiotropy or linkage disequilibrium. Although the cause of genetic correlation between traits is debated, Falconer (1960) concluded that pleiotropy is a more likely reason for trait association than linkage. In contrast, Mather and Jinks (1971) argued that associations between traits are due to linkage. The observations in this study at the level of polymorphic sequence markers suggest that the association between traits is due to pleiotropy. This is also supported by Sibov et al. (2003), Zygier et al. (2005), Chen et al. (2008) and Wang et al. (2008). This study further revealed that some traits are associated with the same polymorphic sequence markers and chromosome positions, meaning that the traits are correlated.
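The pattern of shared markers discussed above can be tabulated directly from the marker-trait associations listed in the Results. In the sketch below each marker is identified by the leading numeric part of its name, and the trait lists are taken from the text; the data structure and variable names are illustrative. Sharing a marker shows only that two traits map to the same locus: on its own it cannot separate pleiotropy from close linkage.

```python
from collections import Counter
from itertools import combinations

# Marker (leading numeric part of its name) -> traits reported in the Results.
associations = {
    "24383534": ["chaff weight", "plant height", "shelled harvest per plot",
                 "shelling percentage"],
    "37320448": ["chaff weight", "plant height", "shelled harvest per plot",
                 "shelling percentage"],
    "24383752": ["100-seed weight", "number of pods per plot",
                 "shelled harvest per plot", "shelling percentage"],
    "24346601": ["number of days to flowering"],
    "4183841": ["number of flowers per peduncle"],
    "24385974": ["number of seeds per pod", "terminal leaflet length",
                 "terminal leaflet width"],
    "24346965": ["pod length", "terminal leaflet width"],
    "24384331": ["pod width"],
    "27640259": ["petiole length", "terminal leaflet length",
                 "terminal leaflet width"],
    "27641016": ["plant height", "shelled harvest per plot", "shelling percentage",
                 "terminal leaflet length", "terminal leaflet width",
                 "petiole length"],
    "27641679": ["seed length"],
    "24383723": ["seed width"],
    "4181685": ["seed weight per plot", "yield per hectare"],
    "24346321": ["terminal leaflet length", "terminal leaflet width"],
    "24346252": ["yield per plant"],
}

# Markers associated with more than one trait.
multi_trait = {m: t for m, t in associations.items() if len(t) > 1}

# Number of markers shared by each pair of traits.
shared = Counter()
for traits in associations.values():
    for pair in combinations(sorted(traits), 2):
        shared[pair] += 1

widest = max(associations, key=lambda m: len(associations[m]))
print(len(multi_trait), "markers are associated with more than one trait")
print("marker with most traits:", widest, len(associations[widest]))
for pair, n in shared.most_common(3):
    print(pair, n)
```

Trait pairs that share several markers, such as terminal leaflet length with terminal leaflet width, are the ones the text describes as correlated.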

Conclusion

This study confirmed that DArT genotyping based on SNPs is effective for detecting the genetic diversity and population structure of Bambara groundnut accessions. The DArT markers also revealed a wide range of genetic diversity, making the selected population a potential source for Bambara groundnut improvement. Selection of the morphological traits that contribute most to variation showed differences between populations. The distinct subpopulations identified could be used in breeding improved cultivars with enhanced responsiveness to climate change. These findings could be further confirmed with other high-throughput markers in future research.