Introduction

Cowpea (Vigna unguiculata (L.) Walp.; 2n = 2x = 22) is a major legume crop produced worldwide under low-input production systems in arid and semi-arid agroecologies. It is an important annual pulse crop farmed in Latin America, South Asia, and Africa’s arid tropical climates (Boukar et al. 2019). The crop is mostly cultivated for its grain, which has high protein (20–32%) and carbohydrate (50–60%) content. Both the grains and the leaves are also rich in vitamin C, iron, and zinc (Gonçalves et al. 2016). According to Jayathilake et al. (2018), cowpea, which is known as “poor man’s meat” in many developing countries, plays a significant role in human nutrition. The crop is also a key component of traditional cropping systems because it fixes atmospheric nitrogen via symbiosis with rhizobia, which enhances soil fertility; this is particularly important in smallholder farming systems where little or no fertilizer is applied (Bado et al. 2006).

Global production of cowpea is projected to reach 12.3 million tons by 2030 (Boukar et al. 2016). It is a versatile crop that provides revenue to millions of smallholder farmers as well as traders who sell healthy grain and leaves as fresh vegetables. Its production and consumption are concentrated in Sub-Saharan Africa, especially West Africa and East Africa, where its nutritional value and drought tolerance traits place it in a unique position in the continent’s attempts to develop nutrition-sensitive food systems (Boukar et al. 2016; Muñoz-Amatriaín et al. 2017). This is especially crucial for lowering malnutrition in pregnant or breastfeeding mothers and children under the age of five (Gomes et al. 2019).

Despite its African origins, the domestication center of cowpea remains unknown (Carvalho et al. 2017); however, it is thought to be in East or West Africa, where there is substantial morphological and genetic diversity, followed by a sub-domestication zone in India (Xiong et al. 2016). According to Bado et al. (2006), two more subspecies were selected/domesticated in India, the yard-long bean (Vigna unguiculata subsp. sesquipedalis (L.) Verdc.) and the catjang, and accessions from Europe often cluster with those from West Africa.

Cowpea breeding began in the 1960s, primarily in African nations such as Nigeria, Senegal, Uganda, and Tanzania (Hall 2012). Cowpea breeding has encountered various challenges, including a narrow cowpea gene pool due to a genetic bottleneck during domestication, reduced genetic variability due to “founder effects,” and limited germplasm interchange (Carvalho et al. 2017). Furthermore, the crop has received little attention and has remained an orphan crop in terms of research in a number of countries (Ketema et al. 2020). Efforts have been made in Uganda to enhance cowpea genetic diversity by introducing exotic genetic materials from the International Institute of Tropical Agriculture (IITA), University of California Riverside, and other African national breeding programs, including Ghana, Kenya, and Nigeria, with most of them being breeding lines.

According to Boukar et al. (2019) and Fatokun et al. (2018), recent advances in genomics and next-generation sequencing (NGS) technologies have resulted in the generation of high-, medium-, and low-density molecular markers that are frequently utilized for analyzing genetic diversity in germplasm, genetic fingerprinting, QTL mapping, and gene discovery. In cowpea, several marker technologies have been deployed, including random amplified polymorphic DNA (RAPD) (Olufisayo et al. 2016), restriction fragment length polymorphism (RFLP) (Boukar et al. 2016), amplified fragment length polymorphism (AFLP) (Olufisayo et al. 2016; Wamalwa et al. 2016), simple sequence repeat (SSR) (Ragul et al. 2018), and single nucleotide polymorphisms (SNPs) (Chidebe et al. 2018).

Among the molecular markers, SNPs have been more widely used for analyzing the genetic diversity of populations than other markers such as AFLP and SSR (Varshney et al. 2007) because of their abundance in the genomes of plants and other organisms and their suitability for high-throughput genotyping (Mammadov et al. 2012). Among the plethora of SNP genotyping platforms, Diversity Arrays Technology sequencing (DArT-Seq) has been widely used by breeders to generate thousands of SNP markers and to assess the genetic diversity of panels and breeding populations in many crops, including maize (Ayesiga et al. 2022), yams (Agre et al. 2019; Bhattacharjee et al. 2020; Amponsah et al. 2023), and cassava (Xia et al. 2005; Adu et al. 2021). DArT-Seq is a hybridization-based DNA sequencing technology that is high-throughput, highly reproducible, and low-cost. In addition, DArT-Seq does not require prior sequence information for detecting SNPs associated with loci/alleles for traits of interest (Nadeem et al. 2018).

The objective of this study was to determine the level of genetic diversity and population structure of cowpea parental lines assembled for breeding in Uganda. Assessing genetic diversity within cowpea germplasm assembled from different geographical locations will help in the selection of a core cowpea germplasm for breeding purposes. This will help guide future development of improved cowpea varieties for Uganda and the region.

Materials and Methods

Genetic Materials and Leaf Sampling

A total of sixty-two (62) genotypes of Vigna unguiculata (L.) Walp., originating from Uganda (23 genotypes; 37.1%), Ghana (3 genotypes; 4.8%), Nigeria (31 genotypes; 50.0%), Kenya (2 genotypes; 3.2%), and the United States (3 genotypes; 4.8%), were sampled (Table 1) from a collection available at the Makerere Regional Centre for Crop Improvement (MaRCCI), Makerere University. The bulk of the genotypes (77.4%) were breeding lines actively utilized in breeding efforts in their various countries of origin, while 11.3% of the genotypes were landraces.

Table 1 List of genotypes used for the study

The 62 genotypes were planted in a completely randomized design at MaRCCI in the screen house. Cowpea leaf samples were collected using a method described by KBS-9370–001 for leaf sample collection for DNA extraction. Using a leaf puncher, leaf samples from three (3) tagged plants were collected onto each of the 96-well plates five weeks after planting. The leaf samples were then oven-dried at 80 °C and sent for genotyping at SEQART AFRICA, based at the International Livestock Research Institute (ILRI), Nairobi.

Cowpea DNA Extraction and DArTseq Genotyping

Cowpea genomic DNA was extracted from each leaf sample using a NucleoMag Plant DNA extraction kit (Mag-Bind® Plant DNA DS 96 Kit). The quality and quantity of the DNA from each leaf sample were assessed using a NanoDrop 2000c spectrophotometer (Thermo Scientific, Waltham, MA, USA) and by visualization on a 0.8% agarose gel before the concentration of genomic DNA was adjusted to 100 ng/µl. DNA library construction was performed using the genomic complexity reduction technology (Kilian et al. 2016), which uses a combination of PstI and MseI enzymes for DNA digestion. Following the ligation of barcoded adapters and common adapters, adapter-ligated fragments were amplified using PCR. The libraries were purified and quantified for cluster creation using an automated clonal amplification system (cBOT Illumina) and then sequenced using the Illumina Hiseq2500, an NGS platform (Kilian et al. 2016). The DArTseq markers were scored using DArTsoft14, which is an in-house marker scoring pipeline (Kilian et al. 2016). SilicoDArT markers and SNP markers were both scored as binary for the presence/absence (1 or 0, respectively) of the restriction fragment with the marker sequence in the genomic representation of the sample.

Filtering, Genetic Diversity Analyses, and Determination of Population Stratification

TASSEL (v5.2.52) was used for SNP data quality verification and filtering as explained by Bradbury et al. (2007). SNP markers with more than 20% missing data, a minor allele frequency (MAF) of <0.05, or an unknown physical position on the chromosomes were removed. The k-nearest neighbor genotype imputation approach in TASSEL (v5.2.52) was then used to impute the remaining missing SNP data (Bradbury et al. 2007). A total of 2,746 SNPs were selected and utilized in subsequent analyses. TASSEL (v5.2.52) was used to summarize SNP marker information and minor allele frequencies, and PowerMarker (v3.25) was used to estimate polymorphism information content (PIC) (Liu and Muse 2005). The “adegenet” package in R was used to calculate observed and expected heterozygosity (R Core Team 2020).
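The filtering thresholds and per-marker summary statistics described above can be illustrated with a minimal Python sketch. It is not the TASSEL, PowerMarker, or adegenet implementation; it assumes biallelic SNPs coded as 0, 1, or 2 copies of the alternate allele (None for a missing call), and the function names `marker_stats` and `keep_marker` are hypothetical. PIC is computed with the standard formula PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², which for two alleles reduces to He − 2p²q².

```python
def marker_stats(calls):
    """Summarize one biallelic SNP.

    calls: list of genotype calls, each 0, 1, or 2 copies of the
    alternate allele, or None for a missing call (assumed coding).
    """
    observed = [g for g in calls if g is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if not observed:
        return {"missing_rate": 1.0, "maf": 0.0, "ho": 0.0, "he": 0.0, "pic": 0.0}
    p = sum(observed) / (2 * len(observed))   # alternate allele frequency
    q = 1 - p
    he = 1 - p * p - q * q                    # expected heterozygosity
    return {
        "missing_rate": missing_rate,
        "maf": min(p, q),                     # minor allele frequency
        "ho": observed.count(1) / len(observed),  # observed heterozygosity
        "he": he,
        "pic": he - 2 * p * p * q * q,        # biallelic PIC
    }


def keep_marker(stats, max_missing=0.20, min_maf=0.05):
    """Apply the two numeric thresholds quoted in the text."""
    return stats["missing_rate"] <= max_missing and stats["maf"] >= min_maf


if __name__ == "__main__":
    example = [0, 0, 1, 2, 2, None, 1, 0, 2, 1]
    stats = marker_stats(example)
    print(stats, keep_marker(stats))
```

The third filter in the text (unknown physical position) depends on marker annotation rather than on genotype calls, so it is left out of this sketch.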

The Bayesian model-based clustering implemented in the STRUCTURE program (v2.3.4) was used to estimate the number of hypothetical subpopulations (K) (Evanno et al. 2005; Porras-Hurtado et al. 2013). The STRUCTURE analysis used a burn-in period of 10,000 Markov-chain Monte Carlo iterations and an admixture model based on Hardy–Weinberg equilibrium and its associated allele frequencies.

Ten population numbers (K ranging from 1 to 10) were evaluated, and each K was run ten times independently. The STRUCTURE outputs were further analysed using STRUCTURE HARVESTER (Earl and vonHoldt 2012), which allowed the optimum K value to be identified as a distinct peak in the change of likelihood (ΔK). To assign the sixty-two genotypes to the population groups given by STRUCTURE HARVESTER, a membership threshold (qi) of 0.6 was used, and genotypes that fell below the threshold for all groups were deemed admixed genetic material.
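As an illustration of the two steps described above, the following Python sketch computes ΔK from replicate log-likelihoods and then applies the 0.6 membership threshold. It is a simplified sketch, not the STRUCTURE HARVESTER code: it assumes consecutive K values with at least two replicate runs each, uses the means of the replicate log-likelihoods in the second-order difference, and the function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for interior K values.

    lnp_by_k: dict mapping K to a list of log-likelihoods, one per
    replicate run. delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)),
    with L taken as the mean over replicates (assumption).
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # delta K is undefined when replicates are identical
        second_diff = abs(means[next_k] - 2 * means[k] + means[prev_k])
        delta[k] = second_diff / sd
    return delta


def assign_population(q, threshold=0.6):
    """Assign a genotype to the population with the largest membership
    coefficient, or label it 'admixed' if no coefficient reaches the threshold."""
    best = max(range(len(q)), key=lambda i: q[i])
    return best + 1 if q[best] >= threshold else "admixed"


if __name__ == "__main__":
    # Made-up log-likelihoods for illustration only.
    runs = {1: [-1000, -1002], 2: [-800, -802], 3: [-700, -702], 4: [-690, -692]}
    print(evanno_delta_k(runs))
    print(assign_population([0.7, 0.2, 0.1]), assign_population([0.5, 0.3, 0.2]))
```

Because ΔK relies on a second-order difference, it cannot be evaluated at the smallest or largest K tested.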

In addition, to complement the STRUCTURE analysis, a discriminant analysis of principal components (DAPC) was performed in R using the “adegenet” package (R Core Team 2020). The optimal number of clusters was inferred using K-means analysis by varying the possible number of clusters from 1 to 10. DAPC scatter plots were prepared for the clusters identified by K-means using the first ten principal components.
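The idea of scanning candidate cluster numbers with K-means can be sketched in plain Python as follows. This is not the adegenet implementation: it works directly on coordinate tuples rather than on principal components, uses a deterministic farthest-point initialisation chosen here for reproducibility, and scores each k with a BIC-style criterion n·ln(WSS/n) + k·ln(n), which is an assumption about the form of the criterion. All function names are hypothetical.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Pick k starting centers: the first point, then repeatedly the
    point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(nxt)
    return [list(c) for c in centers]


def kmeans(points, k, max_iter=100):
    """Return (labels, within-cluster sum of squares)."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


def bic_score(n, wss, k):
    """BIC-style score (assumed form); lower is better."""
    return n * math.log(max(wss, 1e-12) / n) + k * math.log(n)


def best_k(points, k_values):
    scores = {k: bic_score(len(points), kmeans(points, k)[1], k) for k in k_values}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
           (20, 0), (20, 1), (21, 0)]
    print(best_k(pts, range(1, 5)))
```

With tightly grouped data, a criterion of this kind can keep decreasing slowly beyond the true number of groups, so in practice the curve is usually inspected for an elbow rather than relying on the strict minimum.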

Furthermore, GenAlEx 6.503 (Peakall and Smouse 2012) was used to perform an analysis of molecular variance (AMOVA) (Reyes-Valdés et al. 2013) to examine population divergence among the genetic groupings found by the cluster analysis. The marker datasets were numerically coded as A = 1, C = 2, G = 3, and T = 4 before AMOVA (Blyton and Flanagan 2006) was performed. AMOVA with 999 permutations was used with the numerically coded marker data. The genetic differences were divided into two categories: variance across populations (PhiPR) and variation within populations (PhiPT). Finally, phylogenetic connections among genotypes were constructed in PowerMarker (v3.25) software using a Euclidean distance matrix (Liu and Muse 2005). Molecular Evolutionary Genetics Analysis (MEGA-X v10.1.8) software was used to show the generated tree (Kumar et al. 2018).
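The variance partitioning behind AMOVA can be illustrated with a short Python sketch for a single level of population grouping. It is not the GenAlEx implementation: it assumes genotypes are numeric vectors, uses squared Euclidean distances between individuals, sets a negative among-population variance component to zero, and omits the permutation test. The function name `phi_pt` and the example data are hypothetical.

```python
from itertools import combinations


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def phi_pt(genotypes, populations):
    """Estimate PhiPT, the proportion of total variance found among populations.

    genotypes: list of numeric vectors, one per individual.
    populations: list of population labels, same order as genotypes.
    """
    n_total = len(genotypes)
    groups = {}
    for idx, pop in enumerate(populations):
        groups.setdefault(pop, []).append(idx)
    n_pops = len(groups)

    def sum_of_squares(indices):
        # Sum of squared pairwise distances divided by group size.
        pairs = combinations(indices, 2)
        return sum(squared_distance(genotypes[i], genotypes[j]) for i, j in pairs) / len(indices)

    ss_total = sum_of_squares(range(n_total))
    ss_within = sum(sum_of_squares(members) for members in groups.values())
    ss_among = ss_total - ss_within

    ms_among = ss_among / (n_pops - 1)
    ms_within = ss_within / (n_total - n_pops)

    # Average group size corrected for unequal sample sizes.
    n0 = (n_total - sum(len(m) ** 2 for m in groups.values()) / n_total) / (n_pops - 1)

    var_within = ms_within
    var_among = max((ms_among - ms_within) / n0, 0.0)
    total = var_among + var_within
    return var_among / total if total > 0 else 0.0


if __name__ == "__main__":
    # Toy one-marker data for illustration only.
    print(phi_pt([[0], [1], [10], [11]], ["A", "A", "B", "B"]))
```

A significance test would repeat the calculation after randomly shuffling the population labels (999 permutations in this study) and compare the observed value with the permuted distribution.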

Results

Quality, Diversity, and Functional Characterization of DArTseq-SNPs on Cowpea Chromosome

The 62 cowpea genotypes yielded 7,304 unfiltered SNPs, of which 2,746 good-quality SNPs (37.6%) spread over the 11 chromosomes of Vigna unguiculata were retained for further analysis after filtering (Fig. 1 and Table 2). Chromosome 3 had the highest number of SNPs (338 SNPs; 12.3%), followed by chromosome 4 (332 SNPs; 12.1%) and chromosome 7 (331 SNPs; 12.1%), while chromosome 2 (164 SNPs; 6.0%) had the fewest SNPs (Fig. 1 and Table 2). The polymorphism information content (PIC), used as an indicator of SNP quality, varied from 0.47 to 0.51 with a mean value of 0.49. The expected heterozygosity ranged from 0.48 to 0.56, with a mean value of 0.53. On the other hand, the observed heterozygosity varied from 0.04 to 0.07, with a mean of 0.06. Likewise, the minor allele frequency (MAF) ranged from 0.21 to 0.29, with an average of 0.25 (Table 2).

Fig. 1 Distribution of 2,746 SNPs across 11 cowpea chromosomes

Table 2 Quality and summary statistics of DArTseq-SNPs on cowpea chromosomes

Population Differentiation and Genetic Relation Between Cowpea Genotypes

The analysis of molecular variance (AMOVA) found that 97% of the variability was within each genetic group and just 3% was among the five populations/genetic groupings based on geographic origin (Table 3). Cluster analysis classified the 62 cowpea genotypes into two major clusters (clusters A and B), with sub-clusters within the main clusters (Fig. 2). In cluster A, only one genotype (AC20C042) originated from Nigeria, whereas the remaining six genotypes were derived from Uganda. The largest cluster, however, was cluster B, which included genotypes from all geographical regions represented. Cluster B was divided into two subclusters, with one Nigerian genotype standing out (Fig. 2). There was no link between the origin of the cowpea genotypes and the cluster grouping.

Fig. 2 Hierarchical clustering of 62 cowpea genotypes based on 2,746 SNPs

Table 3 Analysis of molecular variance within and among the 62 genotypes assessed with 2,746 SNP markers

Population Structure and Discriminant Analysis of Principal Components of Cowpea Genotypes

The STRUCTURE estimates of the populations at each K value and of the membership coefficients (qi) were highly informative. Simulations (logarithm probability relative to standard deviation, ΔK) computed using SNP markers revealed strong peaks at K = 3 and K = 6, with K = 6 indicating the optimal number of sub-populations.

At K = 3, POP I, POP II, and POP III (Table S1) had 37 (59.7%), 11 (17.7%), and 14 (22.6%) genotypes, respectively. Most genotypes detected in POP I were from Uganda (26 genotypes; 70.3%), followed by Nigeria (8 genotypes; 21.6%) and Kenya (one genotype; 2.7%) (Fig. 3B and Table S1). Most of the genotypes in POP II (7 genotypes; 63.6%) originated from Nigeria, with the remaining sources each represented by only one genotype. Similarly, genotypes derived from Nigeria dominated POP III (12 genotypes; 85.7%), with the remaining sources represented by only one genotype each.

Fig. 3 Population structure in the cowpea germplasm. A Likelihood of ΔK showing the best K value (K = 2). B Population structure obtained for 62 cowpea genotypes based on 2,746 SNPs

At K = 6, POP I was the smallest group (5 genotypes; 8.1%), and all of its genotypes originated from Nigeria. POP II had the most genotypes at K = 6, with 16 genotypes (25.8%). Nigeria (8 genotypes) and Uganda (7 genotypes) contributed the most genotypes to this population, with Kenya contributing only one. This population was followed by POP IV, which had a total of 15 genotypes (24.2%), the majority of which came from Nigeria (7 genotypes) and Uganda (7 genotypes). POP III and POP VI each had 9 genotypes (14.5%), the bulk of which were from Nigeria, with Ghana and Kenya each having only one genotype (Fig. 3B and Table S1). At the highest ΔK peak (K = 6), the expected heterozygosity ranged from 0.06 for POP II to 0.38 for POP III (Table S1). The measure of population differentiation due to genetic structure (Fst) was lowest for POP III (Fst = 0.01) and highest for POP II (Fst = 0.96) (Table 4).

Table 4 Pairwise comparison (Fst) and expected heterozygosity of the K = 6 subpopulations using the model-based clusters

Discriminant Analysis of Principal Components (DAPC)

DAPC was further conducted to determine the sub-clusters at K = 2 (Fig. 4A), K = 3 (Fig. 4B), K = 4 (Fig. 4C), and K = 5 (Fig. 4D). The DAPC cluster groupings and the probabilities of cluster membership of the cowpea genotypes at K = 2, 3, 4, and 5 are summarized in Table S2. Based on the probabilities of cluster membership assignment, the DAPC groupings at both K = 3 and K = 5 represented a good fit.

Fig. 4 Discriminant analysis of principal components (DAPC). A Discriminant analysis of principal components with K = 2. B Discriminant analysis of principal components with K = 3. C Discriminant analysis of principal components with K = 4. D Discriminant analysis of principal components with K = 5. The axes represent the first two linear discriminants. Each color represents a cluster and each dot represents an individual

The DAPC biplot, together with the plot of individual densities on the first discriminant function, demonstrated an unambiguous division of the cowpea genotypes into three clusters with minimal admixed individuals at K = 3. At K = 3, cluster one had 37 genotypes (59.7%), the bulk of which (21 genotypes; 56.8%) were from Uganda, followed by Nigeria (11 genotypes; 29.7%), Ghana (2 genotypes; 5.4%), Kenya (2 genotypes; 5.4%), and the United States (one genotype; 2.7%) (Fig. 4B). Cluster two included just 8 genotypes (12.9%). Cluster three contained 17 genotypes (27.4%), the bulk of which were from Nigeria (14 genotypes). The results of the DAPC analysis agreed with the conclusions of the STRUCTURE analysis.

Furthermore, the five clusters identified at K = 5 (Fig. 4D), namely cluster one (29 genotypes; 46.8%), cluster two (4 genotypes; 6.5%), cluster three (8 genotypes; 12.9%), cluster four (8 genotypes; 12.9%), and cluster five (13 genotypes; 21.0%), were highly informative. Cluster one, which contained the highest number of genotypes (29 genotypes; 46.8%), consisted primarily of cowpea genotypes sourced from Uganda (12 genotypes; 41.4%) and Nigeria (12 genotypes; 41.4%), with only 2 genotypes (6.9%) each sourced from Ghana and Kenya. However, the cluster groupings and probabilities of cluster membership of genotypes at K = 5 were largely maintained at K = 4, with cluster three comprising only genotypes from Nigeria (Fig. 4D).

Discussion

Genetic diversity is a critical aspect of plant breeding as it provides a pool of variation that can be used to develop improved cultivars with desirable traits (Grzebelus et al. 2014). In the case of cowpea, which is a staple crop in many African countries, understanding the genetic diversity of parental lines is crucial for developing high-yielding and disease-resistant cultivars (Seo et al. 2020). Information on germplasm diversity is key to guiding effective and long-term breeding strategies. In this study, we examined genetic diversity among core parental cowpea genotypes, using informative molecular markers as a critical step toward management, genetic improvement, and conservation of cowpea germplasm in Uganda.

Over 2,000 filtered SNPs were used in this study to determine the degree and distribution of genetic diversity among cowpea genotypes assembled from five different geographical sources. Of the total markers genotyped, only 55% were observed to be polymorphic (MAF ≥ 2%). By comparison, 76% and 63% polymorphism have been reported using inter simple sequence repeat (ISSR) markers within cowpea collections from Nigeria and Brazil (Dias et al. 2015) and indigenous cowpea variants from Africa (Ghalmi et al. 2010), respectively. The high level of polymorphism of ISSR markers may be due to their substantial number of alleles and abundance in the genome (Varshney et al. 2007).

In this study, the observed genetic distance between pairs of parental cowpea genotypes was low, ranging from 0.07 to 0.26 with a mean value of 0.21. Fatokun et al. (2018) previously observed low genetic diversity in an investigation of 370 worldwide cowpea samples. Similarly, Wang et al. (2016) found that cowpea accessions in the USDA germplasm collection have little genetic diversity and short genetic distances. Huynh et al. (2013), on the other hand, reported a wider range of genetic distances, from 0.01 to 0.72, based on common alleles among cowpea landraces from 56 countries. According to Muñoz-Amatriaín et al. (2017), greater genetic diversity is identified when there are more accessions from diverse origins. Moreover, cowpea is a selfing crop; thus, the low genetic diversity could be attributed to its reproductive biology and not only to the narrow genetic background of the genotypes. Only genotypes from five countries (Uganda, Nigeria, Kenya, Ghana, and the USA), with moderate genetic diversity, were utilized in this investigation. The low genetic distance observed between pairs of parental cowpea genotypes can have several implications for Uganda’s cowpea breeding program, including reduced genetic diversity, difficulty in tracking desirable traits, and increased susceptibility to diseases and environmental stress (Talabi et al. 2017). To overcome these challenges, the breeding program could consider incorporating more diverse sources of genetic variation, such as wild relatives, to increase the genetic diversity of the parental lines.

In this study, the PIC ranged from 0.47 to 0.51, with an average of 0.49. Desalegne et al. (2016) observed PIC values ranging from 0.23 to 0.68 when employing SSR markers to assess genetic diversity in Ethiopian cowpea germplasm. The moderate variability among the parental lines might be attributed to the self-pollinated nature of cowpea or to the species’ very modest effective population size, which could reduce the genetic variation that is passed down from one generation to the next (Damarany et al. 2018; Ndjiondjop et al. 2017). This bottleneck could have been induced by a single domestication event in this crop (Gbedevi et al. 2021).

The low level of polymorphism in cowpea breeding materials can lead to a small pool of variation that can be leveraged to create better cultivars. Finding parents that exhibit the favorable qualities required for a particular breeding aim, such as high yield, disease resistance, or adaptability to certain growing conditions, may become more difficult as a result (Acquaah 2012). Cowpea breeding strategies may consider including wild relatives or other diverse sources of variation in the breeding program to optimize the potential for improvement. This would enhance the likelihood of identifying parents with the desired qualities and provide researchers with a wider range of variation to work with (Boukar et al. 2019). Additionally, the favorable agronomic and horticultural characteristics of the original cultivar may be preserved while the required traits are incorporated into elite lines using backcrossing and recurrent selection.

AMOVA revealed a moderate but significant divergence among cowpea accessions from five different geographical locations. Significant variation was observed both among populations and among individuals within populations (p = 0.05). Variation among individuals (97% of total variation) was much larger than variation among populations (3% of total variation). Fatokun et al. (2018) and Chen et al. (2017) reported similar results, with the greatest variability found among accessions rather than among populations. As a result, individual variation rather than geographical alignment accounts for most of the genetic variation seen in cowpea.

The findings of this study on genetic variation across cowpea genotypes might have significant ramifications for Uganda’s cowpea breeding efforts. High genetic variability across genotypes can provide a large pool of variation for developing superior cultivars, but low genetic diversity might restrict the possibility of improvement (Asiedu et al. 1998). These findings can help guide breeding decisions and the selection of parental lines for crosses to accomplish certain breeding objectives. The significant variation among individual genotypes, on the other hand, may be due to germplasm sharing either among breeding programs or among smallholder farmers across geographic areas. Furthermore, the limited degree of differentiation found across areas might be due to strong gene flow between regions, leaving limited opportunity for genetic differentiation along geographical lines, consistent with previous findings (Wamalwa et al. 2016). Because of the low levels of divergence across geographic areas and the high levels of variation within regions, Xiong et al. (2016) indicated that a large random sample would capture the majority of the genetic variation among cowpea genotypes in each region. In addition, excessive germplasm exchange can also lead to a loss of genetic diversity, as farmers may adopt a limited range of varieties and abandon other, potentially valuable, varieties (Teeken et al. 2018). This can reduce the pool of genetic resources available for crop improvement and limit the adaptability of crops to changing environmental conditions. The high amount of germplasm exchange among smallholder farmers across geographic areas can therefore have both positive and negative implications for crop improvement and genetic diversity. To maximize the benefits of germplasm exchange while minimizing potential negative impacts, it is important to promote exchange among farmers while also promoting the conservation of local genetic resources. This can help to ensure that farmers have access to a wide range of genetic resources and that crop improvement efforts are sustainable in the long term.

The grouping patterns of 62 cowpea genotypes from various geographic locations indicated the presence of two separate groups. The observed clustering pattern was not entirely compatible with the genotypes’ geographic origins. The accessions were categorized in the current investigation regardless of where they were collected. The proximity of the accessions in the UPGMA tree revealed a similar genetic composition among several of the genotypes. This might be attributed to seed exchange among breeders and the farmer-to-farmer seed trade system, both of which are common in Sub-Saharan Africa.

SNP-based population structure analysis is useful for preserving and monitoring the genetic diversity necessary for a strong breeding program (Eltaher et al. 2018). The SNP markers utilized in this investigation were informative, indicating that they were appropriate for analyzing genetic diversity across and within the cowpea genotypes studied. The population structure was determined using two methodologies (STRUCTURE and DAPC), with the STRUCTURE analysis revealing the presence of six major populations (K = 6) and the DAPC analysis revealing five significant clusters (K = 5) for the 62 cowpea genotypes.

The high fixation index (Fst) values between clusters confirmed the observed genetic differentiation. This contrasts with studies on cowpea by Fatokun et al. (2018) and cassava by Rabbi et al. (2015), which both showed a low fixation index. The low expected heterozygosity in the current study may be ascribed to the genotypes’ complicated breeding history in their geographical origins. Even though the genetic variation detected in this study was low, the study provides vital information that might be used in breeding and research in Uganda. The germplasm may be evaluated for traits of interest and used to guide genotype selection for genomic prediction and genome-wide association studies.

Conclusion

Our research found significant genetic heterogeneity in cowpea genotypes from Uganda, Nigeria, Ghana, Kenya, and the United States. SNP markers obtained from DArT-Seq were useful in evaluating genetic diversity and relatedness among parental cowpea lines used in the breeding program at MaRCCI, Uganda. The findings of this molecular characterization might be useful in Ugandan cowpea improvement efforts, serving as a guide for selecting accessions with desired features for breeding purposes. The genetic diversity of the parental lines assembled for cowpea breeding in Uganda is a crucial aspect of the breeding program, as it will provide the genetic material for developing improved cultivars with desired traits such as high yield, disease resistance, and adaptability to local growing conditions.