1 Introduction

Bambara groundnut is indigenous to Africa, with an annual production of 0.3 million tonnes. An average yield of 850 kg/ha, depending on variety and environmental conditions, has been reported [51, 52], and a mean unshelled yield of 3 tonnes/ha has been reported for landraces cultivated in Nigeria [51, 52]. Diversity has been established in Bambara groundnut (Vigna subterranea [L.] Verdcourt, syn. Voandzeia subterranea [L.] Thouars) of Nigerian origin [34]. Studies based on morphological characterization of some Nigerian accessions of Bambara groundnut have shown variation in the expression of traits such as plant height, terminal leaflet length, terminal leaflet width, seed yield and number of days to flowering [11, 26]. However, research output on candidate genes associated with Bambara groundnut flowering is limited, and understanding this remains a milestone in legume research [51]. This gap is a limiting factor in producing improved varieties through modern biotechnological techniques. Bambara groundnut (2n = 2x = 22) is still classified as an orphan crop because of limited research; hence, its full potential for production and commercialization has not been explored [53]. Flowers are the reproductive structures of Bambara groundnut that give rise to seeds, which pass genetic material to the next generation. Flowering raises hope for the anticipated yield, because the flowers provide the pollen needed for fertilization and seed production [54, 55]. The emergence of flowers in plants is a gene-environment activity [29]. Researchers have also shown, through differential gene expression analysis, that Bambara groundnut responds to certain environmental conditions with different sets of genes [1, 4, 7, 10, 24, 30]. It is therefore imperative to understand the differentially expressed genes associated with flowering in underutilized legumes, as this gives insight into cell responses arising from the presence or absence of proteins [53]. Differentially expressed genes are essential to naturally occurring chemical compounds that play important roles in various aspects of plant growth and development, especially at the flowering stage [8].

Gene presence and activity vary during plant growth and development, and some genes are active only at the specific stages needed to complete a cycle, such as flowering, maturation and ripening [54]. Advances in molecular biology and quantitative data have produced large reference datasets that link gene effects to phenotypic differences in plants. These resources are available online, but geneticists are more concerned with the accurate identification and interaction of loci affecting phenotypic variation [14]. The functions of these genes are not limited to shoot growth, floral identity, flower colour and leaf emergence; they also include cell elongation, and together they determine the traits associated with plants [32]. Bambara groundnut is predominantly self-fertilizing, and there is a need to investigate the genetic make-up of its cleistogamous flower through the associated genes for crop improvement. This study identified differentially expressed genes associated with Bambara groundnut flowering and the proteins they encode. The findings will guide researchers in identifying candidate genes for desired traits in molecular research on underutilized legumes.

2 Materials and methods

Field evaluation was carried out for three years (2017–2020) at the International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria (N 7° 30′ 5.1264″, E 3° 54′ 35.712″) and at the IITA research station located at the Institute of Agricultural Research and Training (IAR&T), Ikenne, Nigeria. Seeds of one hundred accessions of Bambara groundnut were collected in 2017 from the Genetic Resources Centre (GRC) of IITA, Ibadan (N 7° 30′ 5.1264″, E 3° 54′ 35.712″), under the terms of the International Treaty on Plant Genetic Resources, with the standard material transfer agreement (SMTA) of the multilateral system that regulates the use and exchange of plant genetic materials. The accessions are listed in [34].

3 Experimental design and data collection

The experiment was laid out in a Randomized Complete Block Design (RCBD) with three replications, and the total block size was 21 m × 50 m. Each plot measured 2.5 m × 1 m. Inter-row spacing was 1.0 m and intra-row spacing was 0.25 m. Data were collected on morphological traits: plant height (cm), terminal leaflet width (cm), terminal leaflet length (cm), peduncle length (cm), petiole length (cm), number of leaves/plant, number of seeds/pod, chaff weight (g), seed weight per plot (g), 100-seed weight (g), shell thickness (mm), days to 50% flowering, days to first flowering, number of flowers per peduncle, seed texture, seed eye colour, seed length (mm), seed width (mm), growth pattern, and plant spread.
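The randomization behind an RCBD of this kind can be sketched as follows. This is a minimal illustration, not the authors' field-book procedure: the accession labels, the seed value and the function name `rcbd_layout` are hypothetical, and the only facts taken from the text are the one hundred accessions and the three replications.

```python
import random


def rcbd_layout(accessions, n_blocks=3, seed=None):
    """Return one independently shuffled plot order per block (replication)."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(accessions)  # every block receives every accession once
        rng.shuffle(order)        # randomize plot positions within the block
        layout.append(order)
    return layout


# Hypothetical labels; the real accession names are listed in [34].
accessions = [f"ACC{i:03d}" for i in range(1, 101)]
layout = rcbd_layout(accessions, n_blocks=3, seed=2017)

for block_no, order in enumerate(layout, start=1):
    print(f"Block {block_no}: first five plots -> {order[:5]}")
```

Each block is complete (all accessions appear exactly once), and the order is randomized separately per block, which is what distinguishes an RCBD from a completely randomized design.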

4 Data analysis

4.1 DNA extraction and genotyping

The morphological data and accession groupings have been described previously; the accessions showed variation in the morphological expression of the traits observed and in the hierarchical clustering of the SNP data [34]. Three weeks after planting, leaf samples of each accession were collected, and the plant stands from which the samples were taken were tagged. DNA was extracted at the Bioscience Centre, International Institute of Tropical Agriculture, Ibadan, following the Dellaporta miniprep procedure for plant DNA isolation [44]. The high-quality DNA samples (100 ng/µL) were shipped to DArT Pty Ltd., Canberra, Australia (35° 16′ 55.2036″ S, 149° 7′ 44.3928″ E), for genotyping using the whole-genome profiling service of DArTseq technology. Diversity Arrays Technology (DArT) was used for high-throughput single nucleotide polymorphism (SNP) genotyping. The DArT marker set was filtered on individual marker statistics in TASSEL 5.0 software [6, 39], removing markers that failed quality control (call rate ≤ 80% and missing data ≥ 20%). The informativeness of the markers was determined using the polymorphic information content (PIC). A total of 11,821 DArTseq SNPs were generated, of which 5927 (50.13%) high-quality markers were retained at a call rate of 80% and used in the analysis. Marker-trait associations (MTAs) for the flowering traits (number of days to flowering and number of flowers per peduncle) were analyzed using a genome-wide association study (GWAS) (Figs. 1 and 2). The significance of associations between SNPs and flowering traits was based on the threshold P < 1.68 × 10⁻⁴, calculated by dividing 1 by the total number of SNPs in the analysis (5927) [22]. Four of the 5927 retained DArTseq SNP markers were significantly associated with flowering. In the absence of a Bambara groundnut reference genome, the trimmed sequences of the filtered SNPs were aligned to the Vigna radiata reference genome [21].
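The quality-control and threshold arithmetic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the TASSEL workflow itself: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with `None` for a missing call, the toy marker names and data are invented, and PIC is computed with the standard biallelic formula PIC = 1 − (p² + q²) − 2p²q². The one value taken from the text is the number of retained SNPs (5927), for which 1/5927 ≈ 1.687 × 10⁻⁴ (quoted in the text as 1.68 × 10⁻⁴).

```python
import math

CALL_RATE_CUTOFF = 0.80  # from the text: markers with call rate <= 80% are removed


def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def alt_allele_frequency(genotypes):
    """Alternate-allele frequency among called diploid genotypes (0/1/2 coding)."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / (2 * len(called))


def pic_biallelic(p):
    """Polymorphic information content of a biallelic marker (q = 1 - p)."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


def filter_markers(markers, cutoff=CALL_RATE_CUTOFF):
    """Keep only markers whose call rate is strictly above the cutoff."""
    return {name: g for name, g in markers.items() if call_rate(g) > cutoff}


# Invented toy data: ten samples per marker, None marks a missing call.
toy_markers = {
    "snp_a": [0, 1, 2, 0, None, 1, 0, 2, 1, 0],        # call rate 0.9 -> kept
    "snp_b": [0, None, None, 1, None, 0, 2, 0, 1, 1],  # call rate 0.7 -> removed
    "snp_c": [0, 0, 1, 1, 2, 2, None, None, 0, 1],     # call rate 0.8 -> removed
}
kept = filter_markers(toy_markers)

n_snps = 5927                  # retained markers reported in the text
threshold = 1.0 / n_snps       # significance threshold, 1 / number of SNPs
neg_log10_threshold = -math.log10(threshold)  # height of the Manhattan-plot line

for name, genotypes in kept.items():
    p = alt_allele_frequency(genotypes)
    print(f"{name}: call rate={call_rate(genotypes):.2f}, PIC={pic_biallelic(p):.3f}")
print(f"threshold P = {threshold:.3e}, -log10 = {neg_log10_threshold:.2f}")
```

Whether a marker with a call rate of exactly 80% is kept depends on how the cutoff is read; the sketch follows the "≤ 80% removed" wording.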

Fig. 1

Manhattan plots of −log10(P) vs. chromosomal position of SNP markers associated with number of days to flowering (NoDtoF)

Fig. 2

Manhattan plots of −log10(P) vs. chromosomal position of SNP markers associated with number of flowers per peduncle (NFpP). The red line represents the significance threshold on the −log10(P) scale

4.2 Identification of candidate genes in Bambara groundnut flowering

The significant markers associated with flowering traits in the GWAS were queried against the protein signatures in online sequence databases through the Legume Information System (LIS) to identify the names of candidate expressed proteins controlling flowering in Bambara groundnut [21]. The trimmed nucleotide sequences (60–80 bp) of the significant Bambara groundnut SNPs were submitted for a BLAST search against the Vigna radiata genome in the LIS [21]. After the annotated positions were marked in the genome database, the view was zoomed to 1 Mb to identify surrounding candidate genes. Twenty (20) encoded proteins were identified, and the existing literature was searched to establish whether they regulate flowering in Bambara groundnut [21].
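The window-based search for surrounding genes can be sketched as an interval-overlap query. This is a hedged illustration rather than the LIS browser workflow: the SNP positions used below are hypothetical (the aligned positions are not given in the text), the window is assumed to be centred on the SNP, and the names `Gene` and `genes_in_window` are invented. The gene identifiers and coordinates are the Vigna radiata entries reported in the Results.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int
    end: int
    protein: str


def genes_in_window(genes, chrom, snp_pos, window_bp=1_000_000):
    """Return genes on `chrom` that overlap a window centred on `snp_pos`."""
    half = window_bp // 2
    lo, hi = snp_pos - half, snp_pos + half
    hits = [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]
    return sorted(hits, key=lambda g: g.start)


# Coordinates as reported in the Results for chromosomes 4 and 5.
annotated = [
    Gene("Vradi04g03640", "Vr04", 7184338, 7189633, "Basic-leucine zipper domain"),
    Gene("Vradi04g03560", "Vr04", 7074159, 7080129, "TUP1-like enhancer of split"),
    Gene("Vradi04g03760", "Vr04", 7635166, 7638397, "Zinc finger, ZZ-type"),
    Gene("Vradi04g03850", "Vr04", 7965793, 7996880, "Homeodomain-like"),
    Gene("Vradi04g03610", "Vr04", 7140389, 7141382, "PEBP"),
    Gene("Vradi05g02840", "Vr05", 3602575, 3635224, "Leucine-rich repeat"),
]

# Hypothetical SNP positions on Vr04, chosen only to exercise the query.
wide = genes_in_window(annotated, "Vr04", snp_pos=7_500_000)
narrow = genes_in_window(annotated, "Vr04", snp_pos=7_150_000, window_bp=200_000)

print("1 Mb window:  ", [g.gene_id for g in wide])
print("200 kb window:", [g.gene_id for g in narrow])
```

A gene is counted when any part of it overlaps the window; requiring full containment would be a stricter alternative.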

5 Results

5.1 Marker-trait associations and gene functions

Table 1 shows the markers significantly associated with flowering: four of the 5927 high-quality markers analyzed in this study were associated with Bambara groundnut flowering. These markers, 24385352|F|0–28:T > C-28:T > C; 27641816|F|0–17:C > T-17:C > T; 24384204|F|0–24:C > T-24:C > T and 24346601|F|0–67:T > C-67:T > C, were significant at P < 1.68 × 10⁻⁴ and mapped to chromosomes 7, 11, 4 and 5, respectively. Hence, only about 0.07% (4 of 5927) of the markers analyzed were associated with flowering in Bambara groundnut.

Table 1 Significant markers and nucleotide sequences found on Vigna radiata genome and the encoding proteins of genes for flowering

Marker 24385352|F|0–28:T > C-28:T > C identified candidate (variant) proteins on chromosome 7. Gene Vradi07g07630, within the region Vr07:17881455..17882105, encodes ‘Polyketide cyclase/dehydrase’ (IPR019587), which is associated with flowering in Bambara groundnut. Gene Vradi07g07680 (Vr07:18234488..18236837) encodes ‘Polyketide synthase, enoylreductase’ (IPR020843), which is involved in the biosynthesis of a variety of plant specialized metabolites, and gene Vradi07g07750 (Vr07:18641936..18645271) encodes ‘Transketolase, C-terminal/Pyruvate-ferredoxin oxidoreductase’ (IPR009014), which enhances autotrophic growth. Gene Vradi07g07600 (Vr07:17644695..17646647) encodes ‘Transcription factor MYC/MYB N-terminal’ (IPR025610), which is involved in the biosynthesis of secondary metabolites, including flavonols and lignin, and in the morphogenesis of flowers. Gene Vradi07g07740 (Vr07:18443116..18446924) encodes ‘Deoxyxylulose-5-phosphate synthase’ (IPR009014), which is involved in the transport of molecules across the cell membrane. Gene Vradi07g07700 (Vr07:18282445..18287676) encodes ‘Rhamnogalacturonate lyase’ (IPR008979), which is involved in the slow degradation of the petal cell wall in flowers.

Marker 27641816|F|0–17:C > T-17:C > T identified candidate (variant) proteins on chromosomes 11 and 7 (Table 1). On chromosome 11, gene Vradi11g06830 (Vr11:6871551..6876657) encodes ‘DHHC-type zinc finger protein’, which is involved in the regulation of yield increase. Gene Vradi11g06000 (Vr11:5890615..5896336) encodes ‘Putative S-adenosyl-L-methionine-dependent methyltransferase’ (IPR004159), a member of a large group involved in plant development, including flowering. Gene Vradi11g06250 (Vr11:6211377..6212072) encodes ‘histone H1-3’ (IPR005818), which mediates chromatin folding, proper stomatal functioning, gene expression and cellular differentiation. Gene Vradi11g06490 (Vr11:6517125..6519306) encodes ‘Ribosomal protein L2’ (IPR002171), which is present in mitochondria and involved in the flowering of plants. On chromosome 7, gene Vradi07g00920 (Vr07:1912863..1918248) encodes ‘D-galactoside/L-rhamnose binding SUEL lectin domain’ (IPR000922), also present in Arabidopsis thaliana, which is involved in flower development (GO:0009908). Gene Vradi07g00940 (Vr07:1979503..1989132) encodes ‘Microspherule protein, N-terminal domain’ (IPR025999); the N-terminal domain synthesizes chloroplast proteins in mitochondria in plants. Gene Vradi07g01680 (Vr07:2821523..2823634) encodes ‘Lipase, GDSL’ (IPR001087), which is involved in flower development. Gene Vradi07g01700 (Vr07:2841425..2849603) encodes ‘Histone deacetylase superfamily’ (IPR000286), which regulates flowering and fruit development in plants.

Marker 24384204|F|0–24:C > T-24:C > T identified candidate (variant) proteins on chromosome 4. Gene Vradi04g03640 (Vr04:7184338..7189633) encodes ‘Basic-leucine zipper domain’ (IPR004827), which delays flowering. Gene Vradi04g03560 (Vr04:7074159..7080129) encodes ‘TUP1-like enhancer of split’ (IPR011494), which mediates repression in the outer whorls of the flower. Gene Vradi04g03760 (Vr04:7635166..7638397) encodes ‘Zinc finger, ZZ-type’ (IPR000433), which regulates flowering time. Gene Vradi04g03850 (Vr04:7965793..7996880) encodes ‘Homeodomain-like’ (IPR009057), which regulates flowering time in higher plants. Gene Vradi04g03610 (Vr04:7140389..7141382) encodes ‘Phosphatidylethanolamine-binding protein PEBP’ (IPR008914), a flowering locus protein that controls flowering in plants.

Marker 24346601|F|0–67:T > C-67:T > C identified a candidate (variant) protein on chromosome 5. Gene Vradi05g02840 (Vr05:3602575..3635224) encodes ‘Leucine-rich repeat’ (IPR001611), which is involved in developmental processes, including cell proliferation and flower development.
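As a consistency check on the results above, the reported gene regions can be parsed and tallied per marker. This is a small sketch, not part of the original analysis: the helper name `parse_region` and the shortened marker labels are illustrative, while the gene identifiers and `chromosome:start..end` strings are copied from the text.

```python
import re
from collections import Counter, defaultdict

REGION = re.compile(r"(Vr\d+):\s*(\d+)\.\.(\d+)")


def parse_region(text):
    """Split a 'Vr07:17881455..17882105' style string into (chrom, start, end)."""
    match = REGION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


# (marker label, gene, region) as reported in the Results.
records = [
    ("24385352", "Vradi07g07630", "Vr07:17881455..17882105"),
    ("24385352", "Vradi07g07680", "Vr07:18234488..18236837"),
    ("24385352", "Vradi07g07750", "Vr07:18641936..18645271"),
    ("24385352", "Vradi07g07600", "Vr07:17644695..17646647"),
    ("24385352", "Vradi07g07740", "Vr07:18443116..18446924"),
    ("24385352", "Vradi07g07700", "Vr07:18282445..18287676"),
    ("27641816", "Vradi11g06830", "Vr11: 6871551..6876657"),
    ("27641816", "Vradi11g06000", "Vr11: 5890615..5896336"),
    ("27641816", "Vradi11g06250", "Vr11: 6211377..6212072"),
    ("27641816", "Vradi11g06490", "Vr11:6517125..6519306"),
    ("27641816", "Vradi07g00920", "Vr07:1912863..1918248"),
    ("27641816", "Vradi07g00940", "Vr07:1979503..1989132"),
    ("27641816", "Vradi07g01680", "Vr07:2821523..2823634"),
    ("27641816", "Vradi07g01700", "Vr07:2841425..2849603"),
    ("24384204", "Vradi04g03640", "Vr04:7184338..7189633"),
    ("24384204", "Vradi04g03560", "Vr04:7074159..7080129"),
    ("24384204", "Vradi04g03760", "Vr04:7635166..7638397"),
    ("24384204", "Vradi04g03850", "Vr04:7965793..7996880"),
    ("24384204", "Vradi04g03610", "Vr04:7140389..7141382"),
    ("24346601", "Vradi05g02840", "Vr05:3602575..3635224"),
]

genes_per_marker = Counter(marker for marker, _, _ in records)
chroms_per_marker = defaultdict(set)
for marker, _, region in records:
    chroms_per_marker[marker].add(parse_region(region)[0])

for marker, count in genes_per_marker.items():
    print(marker, count, sorted(chroms_per_marker[marker]))
print("total candidate genes:", len({gene for _, gene, _ in records}))
```

The total agrees with the twenty encoded proteins mentioned in the methods, and the tally makes explicit that one marker has reported genes on two chromosomes.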

6 Discussion

The significant markers revealed proteins controlling flowering in Bambara groundnut and indicated that flowering in this crop is controlled by the interplay of many genes. Polyketide cyclase/dehydrase was reported to control flowering in Arabidopsis thaliana [40]. The presence or absence of mitochondrial ribosomal proteins (L2-L4) was involved in the regulation of flowering in soybean, cotton, tomato and Arabidopsis [19]. The UniProt database confirmed the involvement of the candidate protein ‘D-galactoside/L-rhamnose binding SUEL lectin domain’ in flower development (GO:0009908) in Arabidopsis thaliana [39]. N-terminal domain proteins were reported in Brassica napus, in which the terminal flowering gene negatively regulates flowering time [38]. Zinc finger, ZZ-type, was found to regulate flowering time in Chrysanthemum [47]. Overexpression of terminal flower1 genes in perennial ryegrass delays flowering, as they are expressed in inflorescence shoots and vegetative meristems [16]. Basic-leucine zipper domain delayed flowering in maize [28]. TUP1-like enhancer of split mediated repression in the outer whorls of the flower [17]. GDSL lipase is required for anther and pollen development [14, 43]. Histone deacetylases are essential for gene expression in plant development and are involved in the transcriptional regulation of multiple developmental processes [22]. It was likewise reported in [20] that ‘histone H1-3’ mediates chromatin folding, proper stomatal functioning, gene expression and cellular differentiation. ‘Histone deacetylase superfamily’ regulates flowering and fruit development in Capsicum annuum [49]. Basic leucine zipper domains are involved in fundamental seed development processes from the flowering stage [2]. Tup1 families play important roles in developmental processes, including floral organ identity specification [50]. Zinc finger protein regulates flowering time by regulating gibberellin (GA) biosynthesis under both long and short days [47]. Homeodomain-like gene regulates flowering time in pepper [42]. The phosphatidylethanolamine-binding protein (PEBP) encoded by the Arabidopsis flowering locus T (FT) gene acts as a controller of flowering in plants [27]. Pentatricopeptide repeat protein affects flowering in Arabidopsis thaliana [12]. ‘Putative S-adenosyl-L-methionine-dependent methyltransferase’ belongs to a large group involved in plant development, including flowering in Lonicera japonica [48]. Leucine-rich repeat receptor kinases (LRR-RKs) regulate a wide variety of developmental and defense-related processes, including cell proliferation, stem cell maintenance, hormone perception, host-specific as well as non-host-specific defense responses, wounding responses, and symbiosis [18]. Transketolase enhances autotrophic growth in Rhodopseudomonas palustris [8]. Transcription factors MYC/MYB N-terminal are key factors controlling development in plants [37]. Pentatricopeptide repeat (PPR) superfamily protein was associated with seed size and shape in African yam bean (AYB) [33].

Several type III polyketide synthases (PKSs) have been found in plants, and all of them participate in the biosynthesis of secondary metabolites with roles such as the inhibition of flowering [15]. Type III PKSs are key enzymes in the biosynthesis of a variety of plant specialized metabolites, including flavonoids, stilbenes, and sporopollenin [9]. MYB genes are present in higher plants, and Arabidopsis thaliana is estimated to contain more than a hundred MYB genes, which control gene expression, especially in cellular proliferation and development [25]. Increased dynamics of molecules in the presence of deoxyxylulose 5-phosphate synthase through the methylerythritol 4-phosphate pathway in Arabidopsis were also reported [23]. DHHC-type zinc finger protein genes increased flowering and grain yield in transgenic rice [5]. RGL regulates pollen and flower development in higher plants [41]. Methyltransferases (MTases) are biotechnological tools in crop improvement [3]. Histone dynamics were also reported to produce variants responsible for gene regulation in plants, and different biochemical techniques for histone analysis have been developed to identify specific histones and their regulatory roles [13]. Deoxyxylulose-5-phosphate synthases were also reported to be involved in the transport of molecules through biosynthesis in Pinus massoniana [35]. N-terminal domains synthesizing chloroplast proteins from mitochondria in plants [36] were also reported to be involved in plant developmental stages.

7 Conclusion

This study provides a foundation for understanding the genetic basis of Bambara groundnut flowering, with four significantly associated markers: 24385352|F|0–28:T > C-28:T > C; 27641816|F|0–17:C > T-17:C > T; 24384204|F|0–24:C > T-24:C > T and 24346601|F|0–67:T > C-67:T > C. These markers were located on the Vigna radiata genome and should be recognized as candidate markers. The markers also revealed functional genes that control flowering in Bambara groundnut, encoding histones, Polyketide cyclase/dehydrase, Transcription factor MYC/MYB N-terminal, Rhamnogalacturonate lyase, DHHC-type zinc finger protein, Putative S-adenosyl-L-methionine-dependent methyltransferase, Ribosomal protein L2, D-galactoside/L-rhamnose binding SUEL lectin domain, Lipase GDSL, Histone deacetylase superfamily, Basic-leucine zipper domain, TUP1-like enhancer of split, Zinc finger ZZ-type, Homeodomain-like, Phosphatidylethanolamine-binding protein PEBP and Leucine-rich repeat. This research revealed genes associated with Bambara groundnut flowering and hence provides insight into how the presence or absence of such proteins may be used in the improvement of underutilized legumes. Further efforts should target validating these findings with other high-throughput technologies and databases, to identify functional genes in Bambara groundnut flowering for marker-assisted selection.