Introduction

Common bean was domesticated in two already separated ancestral genepools in Latin America about 7000–8000 years ago (Gepts et al. 1986; Gepts and Debouck 1991; Freyre et al. 1996; Mamidi et al. 2011). The domestication events created two known centres of origin for the divergent Andean and Mesoamerican genepools (Singh et al. 1991a, 1991b, 1991c). The domesticates belonging to the two genepools were then introduced to the eastern Africa region in the seventeenth century and their cultivation started in the nineteenth century (Greenway 1945; Gentry 1969; Gepts and Bliss 1988). After the introduction, farmers began to apply selection pressure and to conserve genotypes with valuable traits (Wortmann et al. 1998; Sperling 2001). Adoption of crop varieties and continuous selection under different cultivation practices and user preferences create and maintain on-farm genetic diversity of landraces and improved varieties (Worthington et al. 2012; Pautasso et al. 2013; Soleri et al. 2013; Wilkus et al. 2018). Landraces are valuable sources of genes for improvement of modern cultivars for resistance or tolerance to production challenges including insect pest, disease, low soil fertility, drought and other effects arising from climate variability (Kimani et al. 2001; Blair et al. 2010; Ddamulira et al. 2014; Assefa et al. 2019; Piechota et al. 2019).

Bean improvement efforts involving national and international research institutions continue to promote introduction of novel genes through exchange of germplasm between different breeding programmes from other continents or within sub-Saharan Africa and this has affected the overall genetic composition of modern varieties (Mamidi et al. 2013; CIAT 2019). Continued existence of Andean and Mesoamerican genepools in eastern Africa either through original or subsequent introductions makes the region a secondary centre of genetic diversity (Wortmann et al. 1998; Sparling et al. 2001). Accessions from different geographical regions within and across countries in eastern Africa reveal minimal gene flow across Andean and Mesoamerican genepools (Asfaw et al. 2009; Fisseha et al. 2016). The limited gene flow between the genepools and races is due to low levels of natural crossing arising from incompatibility among genepools (Kwak and Gepts 2009). A lack of correlation between genetic distance and geographical origin in common bean in Africa is attributed to seed systems (Kwak and Gept 2009), which are largely non-formal and characterized by free movement of seed across regions (Wilkus et al. 2018).

Beans belonging to both Andean and Mesoamerican genepools are in demand in eastern Africa. However, preference is skewed towards beans of Andean origin, with a demand which ranges from 73 to 83% (Gepts et al. 1988; Blair et al. 2010; Bellucci et al. 2014; Wilkus et al. 2018) despite their relatively lower yield compared to Mesoamerican counterparts (Beebe 2012). A high value is placed on specific market classes of beans which is attributed to seed size, seed shape, seed colour or even canning qualities. In Kenya, beans belonging to Nuava Granada race of Andean genepool such as red mottled or calima (locally known as Nyayo beans), red kidneys, rosecoco (pink mottled), cranberry (sugar), large red oblong or round seed types and large seeded yellow beans fetch a premium price (van Rheenen 1979). The two significant Mesoamerican bean market-classes (grain types) are small red (red haricot) and small white (navy/canning beans) which equally draw high commercial value (van Rheenen 1979; Buruchara et al. 2011). The demand for specific market classes of beans has led to realignment of breeding objectives to focus on multiple trait introgression to pyramid genes for grain type, yield accumulation, resistance to major biotic and abiotic constraints, culinary and canning qualities (Hillocks et al. 2006; Buruchara et al. 2011; Hussein 2017).

Bean fly (Diptera: Agromyzidae), alternatively referred to as bean stem maggot (BSM) is a significant insect pest of common bean which causes high yield losses ranging from 30 to 100% (Greahead 1968; Ojwang et al. 2009). Ophiomyia spencerella (Greathead) is the major bean fly species causing significant yield reduction in high altitude agro-ecologies (Songa and Ampofo 1999). Control methods for bean fly include: cultural practices, genetic diversity in the form of landraces, planting of varietal mixtures on-farm (Ampofo and Massomo 1998; Abate et al. 2000; Ssekandi et al. 2016), use of chemical and botanical pesticides and by biological means. Chemical control is effective but its use is limited by prohibitive costs among resource disadvantaged smallholder farmers in sub-Saharan Africa who constitute the bulk of common bean producers. Novel sources of resistance have been identified from landraces (Kornegay and Cardona 1991; Ojwang et al. 2010; Nkata et al. 2021b), however, the use of landraces in breeding is known to introduce genetic backgrounds of genes responsible for reduced crop’s agronomic quality thereby complicating the breeding process. Genetic architecture of common bean resistance to bean fly is predominantly additive (Mushi and Slumpa 1998; Ojwang et al. 2011; Nkhata et al. 2021a). However, there has been a slow progress in improving common bean for resistance to bean fly due to dependence of the national breeding programmes on conventional breeding methods (Nkhata et al. 2019). Efficiency in introgression of quantitative trait loci (QTLs) that confer resistance to bean fly would be enhanced if molecular tools are developed to aid in accelerated genetic gain during selection by means of DNA markers (Nkhata et al. 2019).

The level of genetic variation present in parental genotypes and heritability of target traits are crucial in successful implementation of breeding objectives. Association mapping is useful in identifying molecular markers or QTLs associated with important biotic and abiotic resistance and agronomic traits for use in marker-enabled breeding in common bean (Assefa et al. 2019). The deployment of naturally diverse set of germplasm with broad allelic coverage makes it possible to bypass the tedious process of developing bi-parental or multi-parental mapping populations (Xu et al. 2017; Assefa et al. 2019). With improvements in microarray-based marker technology, Diversity Arrays Technology (DArT) markers have gained popularity and are being deployed in genetic diversity analyses, QTL mapping and construction of high-density maps predominantly because of their reduced cost and high effectiveness (Gupta et al. 2008). Creation of the DArTseq platform through combining complexity reduction of the DArT approach with high-throughput next generation sequencing (NGS) technologies marked the beginning of a new success of sequencing of complexity-reduced representations (Gupta et al. 2008). As a result, DArTseq single nucleotide polymorphism (SNP) genotyping has been effectively employed (Wenzl et al. 2004) to facilitate linkage mapping, genome wide association studies and genetic diversity studies in common bean (Valdisser et al. 2017; Gunjača et al. 2021).

Analysis of population structure among market classes of common bean belonging to both Andean and Mesoamerican genepools using DArTseq maker loci will allow understanding of the organization of genetic diversity of common bean in eastern Africa. Such knowledge would be significant in formulation of future breeding objectives to design demand-led common bean variety ideotypes (Hussein 2017; Kimani 2017; Yao et al. 2017) while simultaneously addressing constraints affecting production in risk prone farming systems (Asfaw et al. 2009). Therefore, the aim of this study was to determine the genetic diversity and population structure alongside genetic architecture of bean fly resistance in a diverse collection of 276 common bean genotypes with respect to predominant market classes.

Materials and methods

Common bean genotypes

We assembled a total of 276 common bean genotypes which included 259 landrace accessions from Genetic Resources Research Institute (GeRRI) gene bank of the Kenya Agricultural and Livestock Research Organization (KALRO), seven bean fly resistant varieties alongside two resistant landraces from East and Central Africa Bean Research Network (ECABREN) (a regional CIAT affiliated organization) and eight local user preferred varieties developed by Egerton University Kenya (EUK) and KALRO (Table 1 and supplemental Table S1). The diversity panel represented varied range of important market classes and non-commercial seed types and comprised of different seed colours (Table 1).

Table 1 Classification of 276 common bean genotypes by seed market class, seed

Phenotypic assessment

Distribution of bean fly species varies with location where Ophiomyia spencerella is the predominant species in the high altitude areas. O. spencerella is easily identified since it is the only species with black pupae. The Egerton University Kenya site was ideal for phenotypic evaluation because it is situated in a high altitude area characterized by relatively cooler temperatures and has a history of high prevalence of O. spencerella populations. All the 276 genotypes were planted at Agronomy Research and Teaching Field at Egerton University in Njoro, Kenya. The trial was planted in an Alpha Lattice Design, with two replicates for two years (cropping seasons) during long rains (LR) 2016 (April to August) and LR 2017. However, only 256 entries were tested in 2017 because some genotypes were heavily damaged by bean fly in year one (2016) and hence had insufficient or no seed for planting. Planting was delayed by three weeks from the onset of rainfall to allow high infestation since delayed planting results in increased bean fly population (Songa and Ampofo 1999). The reaction of common bean to bean fly was assessed under natural infestation in the field by recording the number of dead plants per plot which were removed once every week, beginning at 14 days up to 48 days after planting for computation of plant mortality. Plant mortality was then computed as the cumulative total number of dead plants resulting from bean fly attack per plot over the entire period of data collection as presented below;

$${\text{Plant}}\;{\text{mortality}} = \frac{{{\text{Total }}\;{\text{number }}\;{\text{of }}\;{\text{dead}}\;{\text{plants }}\;{\text{per }}\;{\text{plot}}}}{{{\text{ Total }}\;{\text{number }}\;{\text{of }}\;{\text{plants}}\;{\text{per}}\;{\text{plot }}\;{\text{at}}\;{\text{emergence}}}}$$

The number of oviposition/feeding punctures on leaves were recorded once at 14 days after emergence. Bean fly oviposition/feeding punctures on leaves indicate preference or non-preference for specific common bean genotypes which is attributed to antixenosis (avoidance) traits (Cardona and Kornegay 1999). For further assessment, visual scoring was performed to classify bean accessions for resistance reaction based on the stem damage inflicted by bean fly larvae on the vascular tissues on the stem near the soils surface. The damage of plant stem to bean fly for each genotype was evaluated at 28 days after emergence based on a scale of 1 to 9, where 1 = resistant and 9 = highly susceptible. The scores were modified from Kornegay and Cardona (1991) as shown in Table 2. The scores were grouped into five categories: 1–2 = resistant, 3–4 = moderately resistant, 5–6 = moderately susceptible, 7–8 = susceptible and 9 = highly susceptible (Table 2).

Table 2 Rating scale for scoring stem damage at the junction between the root and the stem and plant reaction to bean fly attack.

Phenotypic data analysis

Oviposition/feeding punctures data were square root transformed while plant mortality data were arcsine square root (angular) transformed before analysis. However, stem damage score data were not transformed. We subjected all the data to residual (or restricted) maximum likelihood (REML) analyses (Patterson and Thompson 1971) to obtain the variance-components using a computer software programme, GenStat 15th edition (VSN International, Hemel Hempstead). Replicate and year were considered as fixed effects while genotype and block nested within replicate and year were random terms in the statistical model. The statistical model is given below:

$$Y_{ijkl} = \mu + \rho_{l} + \beta_{k} + G_{i} + E_{j} + GE_{ij} + \varepsilon_{ijkl}$$

where \(Y_{ijkl}\) is the mean performance of genotype i in environment j, \(\mu\) is the grand mean, \(\rho_{l}\) is the fixed replicate l main effect, \(\beta_{k}\) is the random block k effect, \(E_{j}\) is the fixed environment (year) j main effect, \(G_{i}\) is the random genotype i main effect, \(GE_{ij}\) the random genotype-by-year interaction effect, whereas \(\varepsilon_{ijkl}\) is the error term. The residuals from analyzed data were subjected to Shapiro–Wilk test and Levene’s test to confirm normality of the distribution and equality of variance, respectively. The resultant BLUPs from REML analyses of transformed data were back-transformed to their original scale for presentation in figures and tables.

Heritability estimates were obtained for all the resistance traits measured as:

$$H^{2} = \frac{{\sigma_{g}^{2} }}{{\sigma_{g}^{2} + \left( {{\raise0.7ex\hbox{${\sigma_{gy}^{2} }$} \!\mathord{\left/ {\vphantom {{\sigma_{gy}^{2} } y}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$y$}}} \right) + \left( {{\raise0.7ex\hbox{${\sigma_{e}^{2} }$} \!\mathord{\left/ {\vphantom {{\sigma_{e}^{2} } r}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$r$}}} \right)}}$$

where \(\sigma_{g}^{2}\) is the variance component for genotype, \(\sigma_{gy}^{2}\) is the variance component for genotype-by-year interaction, \(\sigma_{e}^{2}\) variance component for the residual, y is the harmonic number of years per genotype and r is the harmonic mean number of replications per genotype (Holland et al. 2003).

Genomic DNA extraction and genotyping using silicoDArT and SNP markers

We collected leaf samples from six individual plants within each accession of common bean. The leaf samples were collected from 14 days old seedlings which were immediately sealed in eppendorf tubes filled with silica gel. The silica gel was used as a desiccant to preserve the leaves before transporting to Biosciences eastern and central Africa (BecA) molecular laboratory at the International Livestock Research Institute (ILRI) hub in Nairobi Kenya for DNA extraction. The DNA was extracted following the procedure reported by Mace et al. (2003). Total genomic DNA was extracted using the ZYMO research Quick-DNA™ Plant/Seed 96 Kit, for each entry. The DNA concentrations were quantified by determining the optical density at A260 and A280 wavelengths with a Nanodrop Spectrophotometer (ND-1000; NanoDrop Products, Wilmington, DE). Thereafter, 30 μl of DNA of each sample were sent to Diversity Arrays Technology (DArT) Pty Ltd, Australia (http://www.diversityarrays.com) for genotyping.

The high-throughput DArTseq technology has been widely used to genotype many crops. We applied, the PstI-based complexity reduction method to enrich genomic representation with single copy sequences as described by Wenzl et al. (2004). The digestion of DNA sample was carried out with a rare cutting enzyme PstI, paired with a set of secondary frequently cutting restriction endonucleases (RE), followed by ligation with site-specific adapters, and amplification of adapter-ligated fragments. Post digestion was done with a PstI-RE pair and ligation done with a PstI overhang compatible oligonucleotide adapter (5′-CAC GAT GGA TCC AGT GCA-3′ annealing with 5′-CTG GAT CCA TCG TGC A-3′) (Wenzl et al. 2004). The adapter-ligated fragments were amplified following the prescribed standard procedures according to Wenzl et al. (2004). In order to develop SNP and silicoDArT markers, the optimization of DArTseq technology was carried out using two PstI-compatible adapters corresponding to two different RE overhangs. Procedures described by Kilian et al. (2012) were used to generate the genomic representations and the most appropriate complexity reduction method selected was PstI + HhaI. Next-generation sequencing (NGS) technology was employed using HiSeq2000 (Illumina, USA) to detect SNPs and silicoDArT markers. DarTsoft14 was used for analysis of sequence data, an automated genotypic data analysis programme and DArTdb, a laboratory management system (Alam et al. 2018). Markers were scored as ‘1’ for presence, and ‘0’ for absence and ‘-’ for failure to score.

Marker data quality analysis

The SNP markers were tested for reproducibility with a threshold of 95%, call rate of 80%, polymorphism information content (PIC), one ratio and 50% missing values over samples (Allan et al. 2020). Scoring of reproducibility was based on the proportion of technical replicate assay pairs for which the marker score showed consistency. Call rate established the success of reading the marker sequence across the samples and was determined by the percentage of samples for which the score was ‘0’ or ‘1’. PIC is the level of diversity of the marker in the population and revealed the importance of the marker for association analysis. One ratio constituted the proportion of the samples for which genotype scores corresponded to ‘1’.

Data filtering and quality control

Filtering was done on SNP data (supplemental Table S2) to eliminate bad SNPs and genotypes using PLINK 1.9 software. Genotypes with > 30% missing data, SNP loci with > 10% missing data and rare SNPs with < 1% minor allele frequencies (MAF) were excluded. Data filtering was further done using statgenGWAS package in R software to remove duplicate SNPs. After filtering, we retained 276 genotypes and 4329 informative DArTseq SNPs for population structure and GWAS analyses.

Analysis of population structure and genetic diversity

The programme STRUCTURE 2.3.4 (Pritchard et al. 2000) was deployed to investigate the population structure and relationship among 276 common bean accessions belonging to diverse seed types in both the Andean and Mesoamerican genepools. According to Pritchard et al. (2000), the programme uses Bayesian clustering based on molecular marker data to infer the population structure. As an initial step, STRUCTURE was run a single time for each K value ranging from 1 to 10 (Kwak and Gepts 2009). Every run was done using 1000 replicates for burn-in and 3000 during the analysis. Based on a priori knowledge of common bean genepools, we considered the K = 2 analysis to differentiate between Andean and Mesoamerican accessions. Subsequently, five independent runs were conducted using the admixture model and 5,000 replicates for burn-in period length and 50,000 for Markov Chain Monte Carlo (MCMC) repetitions during the analysis. Among the five independent runs, the run with the lowest likelihood value was selected and the accessions with a posterior assignment probability \(\ge\) 0.90 for the Mesoamerican cluster were allocated to the Mesoamerican genepool (Cichy et al. 2015) (supplemental Table S1). Similarly, the accessions with a posterior assignment probability \(\ge\) 0.90 for the Andean cluster were allocated to the Andean genepool. Those with a posterior assignment probability \(<\) 0.90 were considered admixed. Admixed genotypes are equally significant and could aid in uncovering the structure of population in common bean genepools for breeding purposes.

Subsequently, 10 simulations per K value were carried out using K = 1 to K = 10 based on 5000 replicates for burn-in and 50,000 MCMC repetitions during the analysis (Porras-Hurtado et al. 2013). Inference for the number of clusters among the accessions was made based on ΔK (change in the log-likelihood between K values) statistical test (Evanno et al. 2005) using the programme Structure Harvester (Earl and vonHoldt 2012). The graphical presentation of population structure by seed type was developed using the Structure Plot v2.0 software package (Ramasamy et al. 2014). Population structure was further assessed using a Neighbour-joining (NJ) tree method (Saitou and Nei 1987) in TASSEL version 5.3.1 (Bradbury et al. 2007). The NJ tree graphic was modified using FIG TREE version 1.4.0 (Rambaut 2012). Principal Component Analysis (PCA) by means of a correlation matrix was implemented in R environment (R Core Team 2021). Estimates of genetic diversity based on subgroups of common bean comprising genepool, germplasm source, seed type and cultivation status were generated using the Power Marker programme (Liu and Muse 2005).

Genome-wide association study

Best linear unbiased predictions (BLUPs) for each line combined over two environments were calculated using the REML procedure in GenStat software 15th edition (VSN International, Hemel Hempstead, UK) and used for marker-trait associations in the panel. To identify QTLs associated with bean fly resistance, we performed association mapping using genotypic data (supplemental Table S2) and phenotypic data for stem damage score and plant mortality across the years (supplemental Table S3). We used Fast single trait Genome Wide Association Studies (GWAS) following the method described by Kang et al. (2010) and implemented in statgenGWAS R package (van Rossum et al. 2020). A runSingleTraitGWAS was performed in statgenGWAS to accomplish a single-trait GWAS on phenotypic and genotypic data (supplemental Tables S2 and S3). A covariance matrix was computed based on the EMMA algorithm (Kang et al. 2008. A Generalized Least Squares (GLS) approach was used to estimate the markers effect and corresponding P-values. We estimated kinship coefficients based on identity by state (IBS) method option within statgenGWAS package. We used a fixed threshold of −log10 (P value) > 3 at α = 0.05. The equation used in Fast single trait GWAS analysis was as follows:

$$Y = X\beta + Z\mu + \varepsilon$$

where Y is the vector of observed phenotypes, X is the design matrix for fixed effect of the SNP; β is the vector of coefficient of fixed effect, Z is the incidence matrix assigning individuals to the genotypes, μ is the vector of genetic random effects, with variance (μ) = \(\sigma_{g}^{2} K\), where K is the marker based relatedness matrix and ε is the vector of residual (non-genetic) effects.

We used Jbrowse on Phytozome v13 (Goodstein et al. 2012) on P. vulgaris genome sequence v2.1 annotation database from Joint Genomic Institute (https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Pvulgaris) to investigate the regions surrounding significant SNPs for associated potential positional gene candidates. We identified possible candidate genes based on window size of 200 kb (maximum ± 100 kb) (Blair et al. 2018; Patishtan et al. 2018; Raggi et al. 2019). The 200 kb window size was based on the average linkage disequilibrium (Blair et al. 2018; Patishtan et al. 2018; Raggi et al. 2019). The functional annotation for the gene was then subsequently identified to postulate the role of the gene in controlling bean fly resistance.

Results

Phenotypic diversity and heritability for bean fly resistance traits

Residual maximum likelihood analysis (REML) revealed significant (P < 0.001) genotype main effect variance-component for stem damage, oviposition/feeding punctures and plant mortality across the years (see supplemental Table S4 for details). Low heritability estimates were recorded for all traits with stem damage having 0.18, Oviposition/feeding punctures 0.22 and plant mortality 0.31 (Table 3).

Table 3 Means and ranges for three agronomic traits measured on common bean genotypes grown at Agronomy Research and Teaching Field for two years at Egerton University in Njoro, Kenya

Phenotypic variation among the test genotypes existed for all the traits evaluated with oviposition/feeding punctures ranging from 10.8 to 21.8 with a mean of 15.9, stem damage from 2.9 to 7.4 with a mean of 5.0 while plant mortality measurements were between 15 and 86% with a mean of 49% (Table 3 and Fig. 1). The distribution of data was symmetrical for all the three traits studied (Fig. 1). Combined data over two years revealed significant bean fly attack where the most susceptible genotypes GBK 035109 and GBK 035352 recorded damage scores of 7.1 and 7.4, respectively and plant mortality of 86% for both traits (supplemental Table S3). Twenty two out of 276 genotypes which represented 8% of the germplasm used in the study, showed resistance to O. spencerella bean fly species with mean stem damage scores of < 4 and mortality of < 30% across the two years (Table 4). The resistant genotypes were similar to the resistant checks G21212, Jesca and AFR 701 which had stem damage scores of 3.3, 3.6 and 3.8, respectively, and plant mortalities of 24%, 26% and 31% across the years. Susceptible check varieties Chelalang, Tasha and Ciankui showed higher number of oviposition/feeding punctures of 18.7, 19.4 and 19.4, respectively, compared to the resistant genotypes indicating that they were visited more by the bean flies. (Table 4 and supplemental Table S3). Ten out of the 22 identified resistant genotypes were members of the Andean genepool, nine were Mesoamerican while three were admixed (Table 4). The resistant genotypes represented varied seed types of major importance in eastern Africa such as red mottled, sugar, pink mottled, red kidney and small red alongside non-commercial seed types. A pattern of reaction was visible among the commercial varieties where the KALRO varieties used as local checks to benchmark performance, showed moderate resistance to moderate susceptibility with stem damage scores ranging from 4.2 to 4.6 compared to EUK varieties which were moderately susceptible with scores ranging from 5.2 to 5.6 (Table 4).

Fig. 1
figure 1

Histograms of phenotypic distribution of evaluated bean fly resistance traits over two years in 2016 and 2017 at Egerton University, Kenya: number of oviposition/feeding punctures per plant, stem damage rated on 1–9 scale and plant mortality on plot mean basis expressed as number of dead plants divided by total number of plants per plot

Table 4 Stem damage score and plant mortality for elite genotypes showing resistance to bean fly with mean stem damage score of < 4 and plant mortality of < 30% alongside checks evaluated under field conditions for two years

Population structure and genetic diversity

The partitioning of the population based on STRUCTURE results (Figs. 2, 3), Neighbour-joining (NJ) tree constructed from genetic distance (Fig. 4) and principal component analysis (PCA) (Fig. 5) revealed two major clusters consistent with Andean and Mesoamerican genepool separation. As implemented in STRUCTURE, for K ranging from 1 to 10, and by making inference based on ΔK of Evanno et al. (2005) as shown in Fig. 2, the 276 genotypes were effectively allocated to Andean or Mesoamerican genepool of origin for K = 2. Based on posterior assignment probabilities of P \(\ge\) 0.90 for pure genotypes and P values \(<\) 0.90 for admixed genotypes (supplemental Table S1), at K = 2, 150, 74 and 52 genotypes fell into Andean, Mesoamerican and admixed clusters, respectively. Over 90% of the local varieties developed by KALRO and EUK were largely assigned to the Andean genepool. On the other hand, the introduced germplasm from ECABREN were assigned equally to both Andean and Mesoamerican genepools with one of introductions showing admixture. Although more than 60% of the landraces originating from GeRRI gene bank based at KALRO, were assigned to the Andean genepool, some landraces tended to exhibit a substantial amount of admixture while below 30% belonged to the Mesoamerican genepool (Fig. 3).

Fig. 2
figure 2

STRUCTURE results with 4329 polymorphic SNP marker loci for 276 common bean genotypes with ΔK specified as a function of K (the number of clusters)

Fig. 3
figure 3

STRUCTURE programme analysis for genetic relatedness of 276 common bean genotypes. K = 2 gives the best separation with green representing the Andean genepool while blue represents Mesoamerican genepool

Fig. 4
figure 4

Principal Component Analysis for common bean genotypes based on 4329 SNP markers

Fig. 5
figure 5

Source of seed presented by colour of the node; black for GeRRI, blue for KALRO/EUK and green for CIAT/ECABREN. Information regarding individual genotypes presented in the NJ tree is available in supplemental Table S1

Neighbour-joining (NJ) tree showing genetic relationship between common bean accessions obtained from the Genetic Resources Research Institute (GeRRI) and breeding programmes within Kenya [Kenya Agricultural and Livestock Research Organization (KALRO) and Egerton University, Kenya (EUK)] and regional CIAT affiliate programme [East and Central Bean Research Network (ECABREN)]. Group of germplasm highlighted in green was composed of bean genotypes belonging to the Andean genepool and wide-ranging seed types, the cluster of genotypes highlighted in blue consisted of the Mesoamerican genepool with diverse seed types while the ones highlighted in red were admixed and also had varied seed types. Tip labels by seed type; Kidney (K), Red (R), Red mottled (RM), Pink (PM) mottled, Sugar (S), Other large (O), Small red (SR), White (W) and Other small (OS).

Evaluation of the population structure based on the principal component analysis (PCA) is illustrated by a 2D scatter plot (Fig. 4). A large proportion of variance present was accounted for by PC1 (46.9%), while only 4.8% of the overall variance was attributed to PC2 (Fig. 4). From PCA, the genotypes were divided into Andean and Mesoamerican subgroups alongside the admixed subgroup similar to STRUCRUE results. Further, the PCA equally isolated the admixed lines from the Andean and Mesoamerican genepools.

Assessment of the population structure within the germplasm using a Neighbour-joining (NJ) tree method similarly uncovered the Andean and Mesoamerican subgroups (Fig. 5). Depending on the origin, genotypes from local breeding programmes alongside those from ECABREN tended to cluster in their distinctive groups. The GeRRI germplasm were distributed among the two clusters, this was probably due to the large diversity presented among the landraces. However, within the genepools, the group membership tended to depend on seed type too. For instance in the Mesoamerican genepool, the white and the small red seeded types were separately pooled into their sub clusters. Similar observations occurred for the Andean genepool in which the pink and the red mottled also fell into separate nodes. The admixed cluster was largely composed of the sugar seed type. A few exceptions to the pattern of population membership of germplasm from similar origin or of same seed type were visible. However, this was possibly attributed to the variance existing in plant materials.

Measures of genetic diversity were performed by grouping the germplasm on basis of the source, genepool, seed type and cultivation status as shown in Table 5. Bean varieties sourced from KALRO and EUK had the highest major allele frequency (0.79) but displayed the lowest gene diversity (0.30), heterozygosity (0.04) as well as low polymorphic information content (PIC) (0.24). Differences were detected for genetic diversity between the genepools where the Mesoamerican genepool (0.21) was more diverse compared to the Andean genepool (0.11). However, the Andean-Mesoamerican admixed group was the most diverse (0.36). A similar trend was observed among the genepools for heterozygosity with Andean-Mesoamerican admixed group showing the highest heterozygosity of 0.14.

Table 5 Estimates of diversity of subgroups of common bean

The most user preferred seed types in Kenya and largely in eastern African were captured within the study germplasm. Generally, the Andean seed types are more preferred over the Mesoamerican seed types (Gepts et al. 1988; Blair et al. 2010). We included the key seed types and from the Andean genepool we had kidney, pink mottled, red, red mottled and sugar while from the Mesoamerican genepool we had the small red and the small white (Table 5). Comparisons within the seed type subgroup revealed the highest genetic diversity within the white seeded type (0.28) belonging to the Mesoamerican genepool while the pink mottled (0.08) belonging to the Andean genepool were the least diverse. The pattern of heterozygosity was similar across the seed types except for the small red (0.13) seed type which was the most heterozygous (Table 5). The land races were more diverse compared to the commercial varieties.

Genome-wide association analysis

Association mapping for stem damage score and plant mortality were carried out to uncover genomic regions significantly associated with bean fly resistance in diverse common bean germplasm. Figures 6 and 7 illustrate the quantile–quantile (QQ) and Manhattan plots, respectively and they explain the results of the performed GWAS (supplemental Table S5). Significant regions associated with O. spencerella resistance within the common bean germplasm were found on chromosomes 1 (Pv01) and chromosome 9 (Pv09) for both stem damage and plant mortality (Figs. 6, 7 and supplemental Table S5). The proportion of variance explained by the significant SNPs ranged from 8.9% for SNP358_3384567 on Pv01 to 18.6% for SNP11732_3378498 on Pv09 for both stem damage and plant mortality, respectively (Table 6). The SNPs mapped to the same regions on the bean genome for both two traits studied due to significant correlation (r = 0.97; P-value < 0.001). Oviposition punctures trait was neither correlated with stem damage (r = 0.025; P-value = 0.677) nor plant mortality (r = 0.039; P-value = 0.514) and did not show meaningful associations. The SNP373_3376280 had the highest logarithm of odds (LOD) scores of 4.25 and 4.29 and P-values = 5.58 × 10–5 and 5.15 × 10–5 for stem damage score and plant mortality, respectively. The SNP373_3376280 was on Pv01, located at 10,560,778 base pairs (bp) physical position and explained 15.6% and 15.2% of the variation for both traits, respectively (Table 6). High LOD scores obtained suggest that the SNP is in linkage disequilibrium with the postulated candidate gene for bean fly resistance and are likely to be inherited together. All the significant SNP marker loci had positive effects except SNP359_3384439 on Pv01 which had a negative effect for both stem damage and plant mortality. Minor allele frequency (MAF) > 1% ranged from 10.7% for SNP373_3376280 on Pv01 to 30.8% for SNP11732_3378498 on Pv09.

Fig. 6
figure 6

QQ and Manhattan plots showing candidate SNP marker loci and P-values from GWAS for bean fly resistance based on stem damage. The dotted line on Manhattan plot is the fixed significance threshold of −log10 (P value) > 3 at α = 0.05

Fig. 7
figure 7

QQ and Manhattan plots showing candidate SNPs and P-values from GWAS for bean fly resistance based on plant mortality. The dotted line on Manhattan plot is the fixed significance threshold of −log10 (P value) > 3 at α = 0.05

Table 6 Genome wide significant associations of single nucleotide polymorphisms (SNPs) with stem damage and plant mortality

Candidate gene discovery

The search of possible candidate genes for the six significant SNPs on both Pv01 and Pv09 using Phytozome (v13) on a window size of 200 kilobase (kb) pairs (maximum ± 100 kb) for plant mortality and stem damage traits revealed a total of 49 different hits (supplemental Table S6). The SNP markers, SNP358_3384567 and SNP359_3384439 on Pv01 on the sequenced fragment of the Phaseolus vulgaris reference genome were in spatial proximity and separated by 92 kb. The proximity of these two SNPs revealed five co-localized genes Phvul.001G071100, Phvul.001G071200, Phvul.001G071300, Phvul.001G071401 and Phvul.001G071500 (supplemental Table S6). Among these genes, Phvul.001G071100 was the most significant hit which was in linkage disequilibrium with SNP358_3384567 and was embedded within the gene. However, SNP359_3384439 was 26 kb upstream of Phvul.001G071500 gene and 11 kb downstream of Phvul.001G071600.

The alignment of SNP373_3376280 on the P. vulgaris reference genome identified Phvul.001G074900 as a potential candidate gene for the marker. The gene is located 12 kb upstream of the SNP encoding a PTHR31717:SF2 - CCT motif family protein. Significant SNP379_3377589 detected on Pv01 is located 12 kb downstream of the candidate gene Phvul.001G076400 which modulates NADH-glutamate synthase.

Among the 14 hits revealed on Pv09, our scrutiny pointed to two meaningful associated flanking genes, Phvul.009G233900 situated 16 kb downstream and Phvul.009G234000 positioned 8 kb upstream of SNP11732_3378498. Phvul.009G233900 encodes PF10533—plant zinc cluster domain while Phvul.009G234000 modulates PTHR31752:SF4—auxin efflux carrier component 2.

Discussion

Enhancing common bean germplasm for resistance to bean fly is a key strategy towards managing this significant insect pest in sub-Saharan Africa region. Bean fly damage causes plant mortality which leads to yield loss of up to 100% under marginal conditions, most of which occur at seedling plant growth stage (Greathead 1968; Abate and Ampofo 1996). Phenotypic evaluation revealed resistant genotypes among the landraces and introductions from East and Central Africa Bean Research Network (ECABREN) (Table 4). The resistance was uncovered from both Andean and Mesoamerican genepools as well as admixed accessions and covered divergent seed types of important bean market classes. Sources of resistance were found in common bean accessions from eastern and central Africa (Kornegay and Cardona 1991; Ojwang et al. 2010; Nkhata et al. 2021b). Introgression of resistance genes into locally adapted materials via marker-enabled selection promises to accelerate genetic gain and reduce the length of breeding time associated with breeding for pest resistance (Kliebenstein 2017; Brzozowski and Mazourek 2020; Nkhata et al. 2020). Further, the existence of bean fly resistance across genepools and seed types provides an opportunity for breeding varieties with potential market demand. Deployment of insect resistance into the production system remains the basis for integrated pest management (Mitchell et al. 2016).

Insects such as bean fly that destroy seedlings are of economic significance even if they occur at low densities (Edwards and Singh 2006). The number of bean flies visiting the beans varied for different genotypes based on the number of oviposition/feeding punctures on the bean leaves (Table 4). The susceptible checks showed higher number of feeding/oviposition pictures compared to the resistant genotypes which indicated they were preferred by the bean flies. Variations in oviposition and feeding punctures indicate preference or non-preference to specific common bean genotypes attributed to antixenosis (avoidance) traits (Cardona and Kornegay 1999). Plant structural and chemical defences can act directly on the herbivorous pests by discouraging the herbivore feeding (antixenosis) (Clement et al. 1994).

Trait heritability alongside selection intensity and phenotypic variance determine the magnitude of genetic gain that can be realized in a selection programme (Moose and Mumm 2008). In selection for herbivorous insect pest resistance such as bean fly, plant breeders are challenged mainly by long breeding cycles, low heritability and genetic architecture of inheritance which is largely polygenic and inadequate knowledge on mechanisms for resistance (Kliebenstein 2017). However, successful breeding for resistance to insect pests has been in most cases confined to instances where heritability is high and the genetic architecture is oligogenic, as reported in the case of gene-for-gene interactions in some insect pest belonging to diptera and hemiptera orders of the class insecta (Stuart 2015). Improving heritability is rarely tenable when relying upon natural pest infestation in evaluation for resistance due to stochastic fluctuations in pest density or confounding effect arising from pest behaviour. Low heritability values ranging from 0.18 to 0.31 (Table 3) obtained in the current study possibly arose from high genotype-by-year interaction variance component due to natural fluctuation of bean fly populations across the years. The bean fly infestation was higher in 2016 compared to 2017 (supplemental Table S3). Low heritability estimates were also reported across environments by Badji et al. (2020) in their study to detect genomic regions controlling resistance to multiple pests in maize (Zea mays L.). Low heritability estimates were also obtained for resistance to the striped cucumber beetle (Acalymma vittatum) of squash (Cucurbita pepo) (Brzozowski and Mazourek 2020). Uniform spread in insect population density would improve if phenotypic evaluation is conducted using artificial infestation method under cages covered with insect proof nets. However, it would be necessary to have an efficient mass raring technique in place to enable a high throughput system for generation of an adequate number of insects to be used for evaluation of a large number of genotypes. However, an effective method for rearing bean fly is currently lacking.

Knowledge of genetic diversity within the germplasm alongside the underlying genetic basis of bean fly resistance is key to the exploitation of germplasm to develop suitable varieties. The pattern of genetic diversity identified in this study concur with previous reports from eastern and central Africa which grouped bean germplasm into Andean and Mesoamerican subgroups with most of the genotypes falling into the Andean genepool (Figs. 2, 3, 4, 5) (Blair et al. 2010; Wilkus et al. 2018; Nkhata et al. 2020) except for studies by Asfaw et al. (2009) which reported germplasm originating from Ethiopia as predominantly of Mesoamerican genepool. The significance of the Andean market classes of beans in sub-Saharan Africa is attributed to the high value attached to household consumption beside commercialization (Cichy et al. 2015).

The Mesoamerican genepool displayed greater diversity compared to the Andean genepool, however the admixed group was the most diverse (Table 5). Previous studies specified similar patterns of genetic diversity with Mesoamerican genotypes exhibiting superior diversity compared to Andean set of germplasm (Beebe et al. 2001; Blair et al. 2010; Cichy et al. 2015). Mesoamerican seed types had higher gene diversity relative to Andean market classes. However, sugar seed type was the Andean market class with greater diversity. This is possibly because, most of the genotypes with sugar seed type were admixed as revealed by the NJ tree in Fig. 5. Landraces were more diverse compared to commercial varieties. Adoption of improved varieties has led to the loss of traditional varieties. However, despite a number of farmers opting to grow new varieties, some farmers still grow landraces and this is owing to their adaptive advantage (Ojwang et al. 2009), hence considerable genetic diversity still exists. Studies by Wilkus et al. (2018) suggested that rural households have maintained the use of some of the traditional varieties in their production systems because of preferred attributes leading to conservation of useful genetic diversity. Landraces are major repositories for important sources of novel QTLs for resistance against biotic stresses including insect pest and disease (Ojwang et al. 2010; Ddamulira et al. 2014).

Novel sources of resistance to O. spencerella were equally distributed between Andean and Mesoamerican genepools. Ten resistant genotypes belonged to the Andean genepool, nine belonged to the Mesoamerican genepool while only three admixed genotypes out of a total of 22 were resistant (Table 4). Notably, an introduced genotype G21212, a Mesoamerican landrace from CIAT was resistant to O spencerella. Previous work identified G21212 as having resistance to another significant bean fly species O. phaesoli (Mushi and Slumpa 1996; Ojwang et al. 2010). Such multiple resistance is important to common bean improvement for resistance to the two predominant bean fly species in the eastern and southern Africa. Local varieties expressed mixed reaction to O. spencerella (Table 4) with KALRO varieties showing moderate resistance and EUK varieties largely expressing moderate susceptibility. In contrast, studies by Kiptoo et al. (2016) classified a number of these varieties including Tasha, Chelalang and Ciankui as resistant, however, the species of bean fly involved was not specified in their study. Differential reaction of a given genotype to bean fly could result from environmental changes due to year or location effect (Songa and Ampofo 1999). The varieties Tasha, Chelalang and Cinakui were developed by EUK bean breeding programme but had not been selected for bean fly resistance. Sources of resistance identified should provide useful genetic material for future breeding to improve commercial varieties. The resistant Mesoamerican and admixed seed types may be used as a channel to move useful resistance genes into the Andean seed types. Good-by-good crosses using identified resistant Andean genotypes of acceptable seed types can quicken the development and selection of recombinant inbred lines for future deployment into the bean production system. This is because Andean bean seed types are already in high demand in eastern Africa region and therefore introducing resistant varieties would reduce yield losses arising from bean fly.

From GWAS analysis, we explored the possibility of existence of candidate genes involved in conferring bean fly resistance based on the physical position. Candidate gene Phvul.001G071100 which encodes non-specific serine, threonine protein kinase and threonine-specific protein kinase was aligned to SNP358_3384567 marker. This gene regulates protein phosphorylation that plays a role in cell growth and development. SNP373_3376280 marker was located 12 kilobases (kb) downstream of Phvul.001G074900 Arabidopsis thaliana gene orthologue. Tagged candidate gene Phvul.001G074900 is CCT domain-containing gene that encodes transcription co-factors which are the major genetic bases that modulate flowering time (Li and Xu 2017; Diaz et al. 2020) but it is postulated to have pleiotropic effects on plant responses after repeated exposures to biotic and abiotic stress conditions (Li and Xu 2017). Significant SNP379_3377589 detected on Pv01 was located 12 kb downstream of the candidate gene Phvul.001G076400 which modulates NADH-glutamate synthase. It is a gene involved in controlling defense responses during symbiotic interactions and maintaining cell integrity of the uninfected root nodule cells during nodulation in common bean (O’Rourke et al. 2014). Phvul.009G234000 positioned 8 kb upstream of SNP11732_3378498 was the most meaningful candidate gene associated with the marker. It modulates auxin efflux carrier component 2 and belongs to a major class of transport proteins which are involved in root-specific auxin transport whose particular localization suggest a role in the translocation of auxin towards the elongation zone (Mi et al. 2019). Tagging of conceivable additional resistance loci on other chromosomes would be vital for future gene pyramiding. For improvement, higher mapping resolutions can be attained by increasing the number of genotypes for the study combined with an increased number of molecular markers. The results obtained in this study suggest the possible role of the six discovered candidate genes in bean fly resistance. The marker loci discovered and the associated candidate genes suggest the possibility of deploying marker-enabled breeding approach to bean improvement for possible pyramiding of genes for resistance to bean fly along with yield and yield related traits.

Conclusions

The study uncovered the existing sources of resistance to bean fly from a diverse set of bean genetic resources in both Andean and Mesoamerican genepools and associated seed types of major importance in eastern Africa. These sources of resistance will be essential for future breeding efforts aimed at developing bean fly resistant varieties depending on prevalent bean fly species and user preferences for given seed types in different regions within target countries. Besides maintenance of bean diversity, it will be possible for introgression of resistance genes into adapted cultivars. The GWAS results and the presumed location of bean fly resistance genes discovered on Pv01 and Pv09 will potentially have implication in future breeding efforts and will provide opportunity for application of marker-enabled common bean improvement. The physical locations and the candidate genes suggested in the current study will serve as a basis for developing functional markers for potential deployment in breeding. Due to problem of low heritability estimates obtained in this study, we suggest that apart from having a sound experimental design, plant breeders in collaboration with entomologists to implement a robust artificial insect raring and screening techniques for bean fly resistance phenotyping. Evaluation and selection based on correlated traits such as chemical or physical plant defenses may results in comparatively higher heritability estimates which could translate to improved genetic gains through indirect selection.