Introduction

Cowpea [Vigna unguiculata (L.) Walp.] is a key legume in the semi-arid regions of Sub-Saharan Africa (SSA) because of its significant contribution to food and nutritional security in the region. The crop provides a cheap source of quality protein and minerals to both rural and urban communities in Africa (Ajeigbe et al. 2012; Dube and Fanadzo 2013). The grains and leaves are both good sources of protein ranging from 21 to 33% and from 27 to 43%, respectively (Ahenkora et al. 1998; Boukar et al. 2011; Ddamulira et al. 2015). Cowpea predominance in the dry zones of Africa is attributable to its inherent drought tolerance and capability to grow in marginalized soils where other crops fail (Ehlers and Hall 1997; Ewansiha and Singh 2006; Agbicodo et al. 2009; Hall et al. 2010; Fatokun et al. 2012). In the dry savannas of West Africa, cowpea is regarded as a dual purpose crop providing both human food and animal fodder (Singh et al. 2003; Kamara et al. 2012). Additional attractiveness of cowpea is seen in its ability to fix nitrogen in the soil, making it a key component of the traditional intercropping systems (Kyei-Boahen et al. 2017). A recent report also revealed cowpea’s medicinal properties, particularly anti-cancer, anti-hyperlipidemic, anti-inflammatory and anti-hypertensive properties (Jayathilake et al. 2018). These unique properties make cowpea a focus crop with potential to curb both the dynamic climate and malnutrition challenges in SSA.

Cowpea is largely produced and consumed in west and central Africa, with Nigeria leading the production at a rate of 2.14 million metric tonnes annually (FAOSTAT 2017; Boukar et al. 2018). However, farmers in west Africa have not been able to exploit the crops’ yield potential, given that the average grain yield is about 492 kg/ha compared to a possible yield of between 2,000 and 3,000 kg/ha demonstrated on experimental station (Carsky et al. 2001; Agbicodo et al. 2009; Ahmad et al. 2010; Boukar et al. 2013, 2018). The production and consumption of cowpea is challenged by numerous biotic and abiotic factors including insects, diseases, parasitic weeds, extreme and intermittent water and heat stresses (Agbicodo et al. 2009; Boukar et al. 2013, 2018; Togola et al. 2017).

Concerted efforts are being placed on cowpea to boost its productivity including deployment of modern quantitative genetics and genomic tools (Ehlers et al. 2012; Boukar et al. 2016, 2018). These are expected to accelerate the rate of genetic gain, allowing farmers to benefit from the full genetic potentials of the crop. Additionally, the need to meet consumers’ demand has revolutionized breeding, now requiring breeding for clearly defined product targets and profiles (Ragot et al. 2018). Grain yield, fodder potential and maturity duration are key components of each product target among other traits. Consequently, breeders may have to create and parallelly manage multiple populations of genetic materials in the breeding programs to suit specific product targets. Breeding lines emerging from several crosses may be fragmented based on maturity groups or other traits. In cases where multiple breeding sets are created, it is important to understand the genetic potentials of each set of materials or populations in terms of genetic variability and expected genetic advance for key product traits like grain yield to warrant continued investment in advanced testing across the target environments (Allier et al. 2019). The approach to define the usefulness or the genetic worth of a set of genetic materials or a cross has been described (Bernado 2010; Allier et al. 2019) and the concept has been largely applied in maize breeding to identify the best populations for extraction of superior inbred lines (Tabanao and Bernardo 2005). In this approach, the genetic usefulness (U) of a population for a given quantitative trait is determined by its mean (μ) and expected genetic gain (iHσp) as follows: U = μ + i*H*σp where i is the selection intensity which depends on the selection pressure, σp is the phenotypic standard deviation, and H is the broad sense heritability (Tabanao and Bernardo 2005; Bernado 2010). For instance, mean and genetic variance components of grain yield and other traits were deployed to dissect the usefulness of nine (6 synthetic and 3 F2) maize populations (Fountain and Hallauer 1996). In cowpea and soybean, Meenatchi et al. (2019) and Johnson (1955) exploited the genetic variability parameters: phenotypic coefficient of variation (PCV), genotypic coefficient of variation (GCV), broad sense heritability (H) and genetic advance as a percentage of mean (GAPM) for grain yield and component traits to understand the extent of genetic variability using F2 populations, although the usefulness criterion was not used. Two early generation populations of cowpea were examined based on genetic variance, heritability and genetic advance expressed as a percentage of mean to gauge the degree of genetic variability for grain yield and fodder traits (Kumar et al. 2017; Dinakar et al. 2018). However, when dealing with multiple populations, a combination of the means and genetic advance becomes handy to ease decisions in choosing the best sets of materials to advance in the breeding program (Schnell and Utz 1975; Tabanao and Bernardo 2005; Bernado 2010). The use of these genetic parameters is key in predicting the genetic worth of different sets of breeding populations and therefore reinforcing the decisions to focus resources for advanced testing on lines from populations with high genetic value. The objective of the present study was to decode the genetic potential of eight sets of cowpea breeding materials evaluated in preliminary yield trials to ascertain effective extraction of the best lines for further testing in advanced yield trials and/or for recycling as parents in the hybridization nursery. The study exemplified an effective use of quantitative genetic concepts to make selection decisions in a breeding program.

Materials and methods

Site description

Field experiments were conducted during the 2019 cropping season in 2 locations at IITA experimental farms in Minjibir, Kano State, Nigeria, and at the National Animal Production Research Institute (NAPRI), Shika, Kaduna State, Nigeria (Table 1). Minjibir (12° 08.997 ′N, 8° 39.733′ E) is in the Sudan savanna agroecology. The area has one wet season which commences in May/June, ending in October, with mean annual rainfall of about 674 mm and annual temperature range of 26–32 °C. Shika (11° 15′N, 7° 32′E) is in the Northern Guinea Savanna agroecology, in the sub-humid zone of Nigeria. The zone has a unimodal wet season which begins in April/May and finishes by mid-October, with average annual rainfall of 1050 mm. Maximum temperature in Shika during the cropping season varied between 27 and 35 °C. Fertilizer was applied in both fields at a rate of 100 kg of NPK (15–15–15) per ha.

Table 1 Descriptions of experiements and sites used for evaluation of 614 cowpea lines, grouped into 8 sets, in year 2019 across two location in Northern Nigeria

Plant genetic materials

Sets of lines belonging to eight cowpea populations intended for preliminary yield tests (PYT), derived from multiple crosses in the breeding program and targeting different product profiles were used in this study. The crossing structure, pedigrees and agronomic features of parental lines are presented in Supplemental File 1. The creation of the multiple sets of test lines was based on maturity duration meant to suit different agro-ecolozies in cowpea growing corridors of Northern Nigeria. Consequently, the sets were categorized as: extra early and early maturity targeting the short duration production in the Sahelian and Sudan Savanna zones of West Africa, Medium and late maturity groups meant for the Medium and late duration product profiles suitable for the Guinea Savanna zone of West Africa. These maturity groups in addition to striga resistance status of the lines gave rise to the eight sets used in the present study. Smarmily, the sets were created by making several bi-parental crosses using specific elite parents per maturity group; that is, two sets for short duration group: Prelim7 and 10, two sets of medium duration group: Prelim2 and 5, and three sets of late duration group: Prelim2, 3 and 8. The crosses generated F1s that were self-pollinated and between 200 -300 F2 derived lines per set were advanced by single seed descent (SSD) until F5 generation. At this stage lines were planted in a striga infested observation plot and susceptible lines within each set were dropped and resultant sets of F6 genotypes belonging to the different maturity groups were then used in the present study (Supplemental File 1). Included in the study is an extra early duration set of F6 lines referred to as Prelim11 that came from the inter-mating of eight parents. The sets had variable population sizes ranging from 60 to 90 and totaling to 614 genotypes (Table 1). Additionally, the crosses producing the eight sets of genetic materials involved parental lines capturing key traits of focus in the breeding program: High grain yield potential, large seed size, varying maturity (extra-early, early, medium and late), striga resistance, bacterial blight resistance and aphid resistance. The populations were developed by the cowpea breeding program over a period at the International Institute of Tropical Agriculture (IITA), Kano Station, Nigeria.

Experimental layout

At both Minjibir and Shika experimental sites, the eight populations were laid out as separate experiments in one mega experimental field per location. Materials were planted on ridges spaced at 0.75 m apart, with 0.2 m hill spacing within row. All experiments consisted of four rows per plot, each measuring 4 m long, arranged as an alpha lattice design, with two replications per experiment and the number of incomplete blocks within a replication varied depending on the number of lines within each of the eight populations (Table 1). The experiments at both locations were planted at varying dates in between June and August 2019 depending on suitable cropping period of the location (Table 1).

Data collection

Plant stand was determined two weeks after seedling emergence and at harvest. Date to 50% flowering (DT50FL) was recorded when 50% of plants in the middle two rows in a plot had flowered and the number of days were computed with reference to the planting date. At maturity, the middle two rows in a plot were harvested, threshed and weighed to obtain grain yield (GY) in grams per plot. The grain yield per plot was then converted to kilograms per hectare (kg/ha), considering the spacing and the plot length. Seed samples were taken from each plot and used to generate the one hundred seed weight (HSDWT) data, measured in grams.

Data analysis

Traits distribution

The R statistical software, version 3.5.2 (R Core Team 2018) was used to generate and summarize a graphical visualization using box plots and histograms of traits distribution within and between populations. The means from two locations were used to generate the box plots for the sets while the histograms were generated using individual plot data for the two locations. Scripts used have been provided in Supplemental File 2.

Mean squares

Analyses of variances (ANOVA) were performed in two steps; first with merged data of all sets, across two locations to assess differences between sets, and second for each population independently to assess variances within the sets. The following models were implemented in R using agricolae and lme4 packages (Bates et al. 2015; Mendiburu 2020) to obtain mean squares (MS), coefficient of variations (CV) and standard errors of means for the traits:

  • (a) Between set Model \(P_{ijkh} = \mu + set_{i} + l_{j} + \left( {set*l} \right)_{{ij{ }}} \, + \,set\left( g \right)_{{ik{ }}} \,+ \,(set\left( g \right)*l )_{{ijk{ }}} + pooled error\) Where \({P}_{ijkh}\) is the observed value of the ith genotype in the jth location, \(\mu\) is the general mean,\({set}_{i}\),\({g}_{i}\),\({l}_{j}\),\({(g*l)}_{ij}\), \({set(g)}_{ik}\) and \(({set\left(g\right)*l )}_{ijk}\) represent the effects of the genotype, location, the interaction between genotype and location, the effect of genotypes nested within sets and the interaction between genotypes within set by location effect respectively. The between sets ANOVA was performed on a cell mean basis and later converted on a plot basis by multiplying the MS by a common factor\(n=p/\sum (1/{r}_{i})\), and the pooled error inserted in the ANOVA was estimated from the experimental error mean squares (EMS) of the individual trials as: \(\sum (r*EMS)/\sum r\) (Cochran and Cox 1957). In both expressions mentioned above, \({r}_{i}\) is the number of replications in each trial and \(p\) is the number of trials. The approximate degree of freedom for the poled error term was obtained following the Welch–Satterthwaite equation: \({(df \approx (\sum {k}_{i}{EMS}_{i})}^{2}/\sum {(({k}_{i}{EMS}_{i})}^{2}/{v}_{i})\) where\({k}_{i}=1/{(v}_{i}+1)\), \({v}_{i}\) is the error degree of freedom of individual trials and \({EMS}_{i}\) is the error mean square of individual trials (Satterthwaite 1946). When conducting F-tests, the denominator term for Set was Set*Loc, while \(set\left(g\right)*l )\) was used as a denominator term for the following factors: Loc, Set*Loc and Set(Geno). The pooled error MS was used as a denominator F-test for the Set(Geno)*Loc term.

  • (b) Within set Model \(Ps_{ijkh} = \mu + g_{i} + l_{j} + l\left( r \right)_{jk} + (l\left( {r\left( b \right)} \right)_{jkh} + \left( {g*l} \right)_{{ij{ }}} + e_{ijkh}\) Where \({P}_{ijkh}\) is the observed value of the ith genotype in the jth location, \(\mu\) is the general mean, \({g}_{i}\), \({l}_{j}\), \({l(r)}_{jk}, {(l(r\left(b\right))}_{jkh}\) and \({(g*l)}_{ij}\) represent the effects of the genotype, location, replication nested within location, block and replication nested within location, and the interaction between genotype and location respectively;; and \({e}_{ijkh}\) is the residual effect. The denominator F-test for Loc, Loc(Rep) and Loc(Rep (Block) were \(l(r)\), \(l(r\left(b\right)\) and EMS respectively while lattice effective error (LEE) was used as a denominator test for Geno and Geno*Loc. The LEE was obtained from the standard error of the mean (SEM) estimates of the Geno*Loc term as: \(LEE={n*SEM}_{g*l}^{2}\) where \(n\) is the number of values used to estimate the Geno*Loc means which is equal to the number of replications in this case. The R scripts used for these analyses are provided in Supplemental File 2.

Variance components

To obtain variance components within each set, a linear mixed model (lmer) function in R was implemented using lme4 package (Bates et al. 2015). Variance components for the major sources of variation were estimated as;

  • Error variance\(\left( {\sigma _{e}^{2} } \right){\mkern 1mu}={\mkern 1mu} MS_{e}.\)

  • Genotype × location variance component\(\left( {\sigma^{2}_{G \times L} } \right)\, = \,(MS_{G \times L} - MS_{e} )/r.\)

  • Genotypic variance component \((\sigma^{2}_{G} )\, = \,[(MS_{G} {-}r \, \sigma^{2}_{G \times L} )]/(r\, \times \,l).\)

  • Phenotypic variance \(\left( {\sigma^{2}_{p} } \right)\, = \,\sigma^{2}_{G} \, + \,\left( {\sigma^{2}_{G \times L} } \right)/l\, + \,\left( {\sigma^{2}_{e} } \right)/r*l.\)

Where, MSG, MSG × L and MSe are the respective mean squares for genotypes, genotype × location interaction and the error, while r is the number of replications and l is the number of locations.

Genotypic and phenotypic variability

The extent of dispersion or the degree of variability within each breeding set was estimated using the formula proposed by (Johnson 1955) as;

Genetic coefficient of variation \(\left( {GCV} \right)\, = \,[\left( {\sigma^{2}_{G} } \right)/\mu ]\, \times \,100.\)

Phenotypic coefficient of variation\(\left( {PCV} \right)\, = \,[\left( {\sigma^{2}_{P} } \right)/\mu )]\, \times \,100\); Where; \(\mu\) is the grand mean.

Broad sense heritability (H2), was computed from the variance components, expressed on an entry mean basis as:

$$H^{2} = \sigma^{2}_{G} /[(\sigma^{2}_{G} + \, (\sigma^{2}_{G \, \times \, L} )/l \, + \, (\sigma^{2}_{e} )/r \, \times \, l]$$

where σ2G, σ2G × L and σ2e are variance components for genotype, genotype x location interaction and the error respectively while r and l are number of replications and locations respectively.

Genetic advance and usefulness

Expected genetic advance (GA) and genetic advance expressed as a percentage of the mean (GAPM) for each trait was computed according to (Allard 1960) as;

$$G_{A} = \, k_{i} x \, H^{2} x\sigma_{P} ; G_{APM} = \, (G_{A} /) \, \times 100$$

where, ki is a standardized selection differential (assuming 10% selection intensity for prediction of genetic advance, H2 is the broad sense heritability, σP is the phenotypic standard deviation, and \(\mu\) is the grand mean.

The genetic worth or usefulness (U) of each population was then estimated based on the mean and genetic advances according to (Schnell and Utz 1975; Tabanao and Bernardo 2005; Bernado 2010) as: U = \(\mu\)  + GA.

Where, U is the genetic usefulness of a population, \(\mu\) is the mean of the population and GA is the expected genetic advance.

Principle component analysis (PCA)

Three parameters namely, yield, seed weight and days to 50% flowering were used to conduct PCA on the sets of breeding lines in R using vqv/ggbiplot package developed by Vincent (2011). PCA scores of the three variables namely GY, HSDWT and DT50F were generated and used to determine the contribution of each variable to the total variations within and among the sets. PCA plots were then generated to visualize the scatter pattern of sets and genotypes within sets along the X and Y axes.

Results

Traits distribution

The frequency distributions of lines in each population according to traits are presented in Fig. 1 and Supplemental Fig. 1. The box plots revealed different levels of dispersion within each breeding set with Prelim5 being the most variable set with high median GY, followed by Prelim10 and Prelim1, while Prelim11 had the least dispersion and the lowest median GY (Fig. 1a). The median seed weights (HSDWT) were within close ranges for most of the breeding sets, although Prelim8 stood out with the highest values while Prelim11 had the lowest (Fig. 1b). Prelim11 was earlier than other sets with median DT50FL of about 45 days while Prelim8 took more than 50 days on average to flower (Fig. 1c). The depictions from the histograms showed variations for grain yield, 100 seed weight and days to 50% flowering within the eight sets of breeding materials thus portraying continuous distributions typical of quantitative traits (Supplemental Fig. 1). The eight breeding sets responded uniquely to the environments based on their performances for GY, HSDWT and DT50FL, with each breeding set showing differential performances (high or low) between the two locations as depicted in the individual location boxplots presented in Supplemental Fig. 1.

Fig. 1
figure 1

Phenotypic distributions: Box plots showing the dispersion quartiles within each of the eight sets of advanced breeding materials. a Grain yield (GY). b 100 seed weight (HSDWT). c Days to 50% flowering (DT50FL), generated using means from two locations. Histograms reflecting the distributions within each breeding set and boxplots for individual location dispersions are presented in Supplemental Fig. 1

Classification of breeding sets

Results of PCA conducted among and within breeding sets are presented in Fig. 2 and Supplemental Fig. 2. In general, PCA has showed diversity both within and between breeding sets based on GY, HSDWT and DT50FL, with PC1 and PC2 between sets accounting for 91.2% of total variation in the data. PCA showed the three traits (GY, HSDWT and DT50FL) to be distinct enough and provided good discrimination among and within the breeding sets. For variation among sets, PC1 was strongly associated with HSDWT (PC1 = 0.65) and DT50FL (PC1 = 0.73) and therefore, Prelim sets with high positive scores for PC1 were promising for these two traits, while PC2 was correlated with GY (PC2 = -0.90) hence, sets with high negative scores for PC2 were good for GY. When the data was grouped sequentially by each trait, clusters of breeding sets with potential for GY, HSDWT and DT50FL became apparent. Prelim1, 5 and 10 clustered in a group with GY above 1,200 kg/ha while Prelim11 was alone in the low yielding category (< 1000 kg/ha), the rest being intermediate (Fig. 2a). For HSDWT, Prelim7, 2, and 8 were categorized as having seed weights above 15.9 g, other sets being between 15.0 and 15.9 g, while Prelim10 had a mean HSDWT of less than 15 g (Fig. 2b). PCA for DT50FL revealed two groups with Prelim7,10 and 11 falling in the early flowering category with less than 48 days while the rest of the sets were categorized as flowering later than 48 days after planting (Fig. 2c).

Fig. 2
figure 2

Principal component anaysis showing the diversity between and within eight sets of cowpea lines clustered based on grain yield (GY), 100 seed weight (HSDWT) and days to 50% flowering (DT50FL). a PCA cluster grouped by grain yield and highlighting sets with the highest (GY > 1,200 kg/ha), intermediate (1,000 < GY < 1,200 kg/ha) and the lowest (GY < 1,000 kg/ha). b PCA cluster grouped by seed weight and highligting sets with the highest (HSDWT > 15 g), intermediate (HSDWT = 15 g) and the lowest (HSDWT < 15 g). c PCA cluster grouped by days to 50% flowering and highlighting sets with early (DT50FL < 48 days) and late DT50FL > 48 days) maturing durations. d PCA cluster for one of the breeding sets (Prelim1) grouped by grain yield showing diversity within the set and highligting genotypes that are high, intermediate and low yielding within the population. The arrows pointing to the variables (GY, HDSWT and DT50FL) indicate the direction of traits contribution to variation explained by PC1 and PC2. At the top right corner of each PCA plot, are PC scores for the three variables, reflecting the magnitude of contribution of each trait to the variations explained by PC axes. In plots a, b and c the DT50FL accounted for most of the variation explained by PC1 while GY explained a greater proportion of variance on the PC2 axis. In plot d, GY and HSDWT accounted for most of the variance on the PC1 axis while DT50FL was associated with variation on the PC2 axis

When we performed PCA within each of the eight sets, it was clear that potentially high yielding genotypes (> 1500 k/ha) that overlap with high seed weight and earliness could be extracted from these populations (Fig. 2d and Supplemental Fig. 2). Except for Prelim7 and 10, GY was strongly correlated with PC1 and accounted for most of the variation among genotypes within sets, therefore, genotypes with high positive scores on PC1 axis were good for GY. For Prelim7 and 10, HSDWT and DT50FL contributed most to the variations explained by PC1. The PCA results further revealed that despite the genotypes being clustered as the best performers for a particular trait, there were still enough diversity among genotypes within each cluster (Fig. 2d and Supplemental Fig. 2). A summary chart for the proportion of best genotypes that could be extracted from each set for GY, HSDWT and DT50F is presented in Fig. 3. It was evident that no genotype with GY above 1500 kg/ha could be obtained from Prelim11 while Prelim5 had the highest number of high yielding genotypes (Fig. 3a). For HSDWT, Prelim7 had a higher number of genotypes with seed weight above 20 g compared to other breeding sets (Fig. 3b). Genotypes with less than 45 DT50F were frequent in Prelim11 while all the genotypes in Prelim3 flowered later than 45 days (Fig. 3c). When the three traits were considered together, more genotypes combining GY > 1,200 kg/ha, HSDWT > 15 g and DT50FL < 45 days could be extracted from Prelim 5 than in other sets (Fig. 3d).

Fig. 3
figure 3

Proportion of best performing genotypes within breeding sets. Bar chats show the number of genotypes out of the population size of each set with; a Grain yield (GY) above 1,500 kg/ha. b Hundred seed weight (HSDWT) above 20 g. c Days to 50% flowering (DT50F) less than 45 days. d combination of the three traits (GY ≥ 1,200 kg/ha, HSDWT ≥ 15 g and DT50FL ≤ 45 days)

Variance between and within breeding sets

Analysis of variance depicted the eight breeding sets not to be statistically different for GY, HSDWT and DT50FL, indicated by non-significant mean square values for sets (Table 2). However, numerically, Prelim5 and 8 had higher GY and HSDWT respectively than others. The mean values showing numerical performance of the breeding sets in each of the two locations have been presented in Supplemental Table 1. When we considered genotypes nested within sets, the genotypic differences were highly significant (P < 0.001) for all the three traits (Table 2). In addition, the effect of location was highly significant for all the three traits (P < 0.001) and consequently, the overall responses of the breeding sets were highly influenced by environment as portrayed by significant Set-by-Location interaction effects for all the three traits (Table 2). The effects of genotypes nested within breeding sets were also statistically significant at P < 0.001) for all the three traits, signaling the apparent difference among genotypes within the sets (Table 2). However, the interaction between nested effect of genotype within set and location was highly significant suggesting the presence of genotype-by-environment interaction.

Table 2 Mean squares among eight sets of advanced breeding materials for grain yield (GY), hundred seed weight (HSDWT) and days to 50% flowering (DT50FL), evaluated across two locations in Northern Nigeria during the rainy season of year 2019

When mean squares for variation within breeding sets were compared, significant genotype effects were observed for GY, HSDWT and DT50F with at least a probability value of P ≤ 0.05 for all the eight breeding sets (Table 3 and Supplemental Table 2). Location effects were significant, with at least P ≤ 0.05 for most of the traits with exceptions in some breeding sets which shown no statistical significance (Supplemental Table 2). The interactions between genotypes and location were also highly significant in all the eight breeding sets for all the three traits (Table 3 and Supplemental Table 2).

Table 3 Mean squares within eight sets of advanced breeding materials for grain yield (GY), hundred seed weight (HSDWT) and days to 50% flowering (DT50FL), evaluated across two locations in Northern Nigeria during the rainy season of year 2019

Partitioning of variance within breeding sets

When variance components were computed for GY, Prelim2 (δ2G*L = 25,658; δ2G = 45,897) and Prelim10 (δ2G*L = 6,231; δ2G = 15,848) displayed low magnitude of variance due to genotype-by-Location interaction relative to the genetic variance component (Table 4). Consequently, Prelim2 recorded the highest genotypic coefficient of variations (GCV = 19.58%) for GY, followed by Prelim7 (GCV = 17.94%), Prelim5 (GCV = 17.38%), Prelim8 (GCV = 17.36%) and Prelim1 (GCV = 14.37%). However, Prelim; 3, 10 and 11 had low genotypic and phenotypic coefficient of variations (Table 4).

Table 4 Variance component estimates and proportion of genetic and phenotypic variability for grain yield (GY), 100 seed weight (HSDWT) and days to 50% flowering (D50FL) within eight sets of advanced breeding materials evaluated in 2019 at two locations in Northern Nigeria

For 100 seed weight, all the breeding sets showed a generally lower magnitudes of variance attributed to Genotype-by-Location interaction (δ2G*L) relative to the respective genotypic variances (δ2G), with Prelim 11 (δ2G = 7.137 vs δ2G*L = 0.39) having the highest genetic variance component (Table 4). Breeding sets with high genotypic variability for seed weight included Prelim11 (GCV = 16.33%; PCV = 17.97%), Prelim10 (CGV = 15.32%; PCV = 16.37%), and Prelim2 (GCV = 14.72%; PCV = 14.72%). The rest of the breeding sets were intermediate while Prelim8 had the lowest variability for seed weight with GCV and PCV of 10.38% and 11.87% respectively (Table 4).

The partitioning of variance within sets for DT50FL revealed variable magnitudes of variances attributed to genetic components, with Prelim1 (δ2G = 4.52 vs δ2G*L = 3.79) and Prelim5 (δ2G = 4.54 vs δ2G*L = 6.56) having high values of genetic variance components relative to the Genotype-by-Location interaction terms (Table 4). Breeding sets that showed high genotypic and phenotypic variability for days to 50% flowering included Prelim1 (GCV = 4.43%; PCV = 5.75%), Prelim5 (GCV = 4.39%; PCV = 6.27%), Prelim3 (GCV = 4.20%; PCV = 5.04%) and Prelim10 (GCV = 4.08%; PCV = 4.53%). Breeding set with the lowest genotypic variability for HSDWT was Prelim7 which had a GCV of 2.16% (Table 4). For all the traits and breeding sets, the differences between the two parameters (PCV and GCV) were minimal, yet by judging from the standard deviation of genetic variance components, these differences are significant.

Genetic advance and usefulness of breeding sets

Results for expected genetic advance within the eight breeding sets computed based on broad sense heritability and assuming 10% selection intensity are present in Table 5. Heritability for grain yield computed on an entry mean basis ranged from 0.21 for Prelim3 to 0.57 for Prelim2. Overall, the expected genetic advance (GA) for GY was more dependent on heritability than on genetic variance. When GA for grain yield was expressed as a percentage of population means (GAPM), Prelim2 emerged with the highest percentage of expected genetic advance (GAPM = 24.59%; GA = 269.05Kg/ha). This was followed by Prelim7 (GAPM = 21.84%; GA = 239.91Kg/ha) and Prelim5 (GAPM = 19.77%; GA = 251.27Kg/ha). Other intermediate breeding sets were Prelim8 (GAPM = 17.91%; GA = 211.12Kg/ha) and Prelim1 with GAPM and GA of 14.21% and 175.83Kg/ha respectively while Prelim3 had the lowest GAPM (Table 5).

Table 5 Predicted genetic advance and genetic usefulness of eight sets of advanced breeding materials evaluated in 2019 at two locations based on grain yield (GY), hundred seed weight (HSDWT) and days to 50% flowering (DT50F)

Consequently, when usefulness criterion was used to compare breeding sets based on grain yield, most of the sets that had high percentage of genetic advance also recorded high usefulness (Up) values (Table 5). For instance, Prelim5 had the highest Up of 1522.33 kg/ha, flowed by Prelim1 (Up = 1413.54 kg/ha), while Prelim11 (Up = 874.13Kg/ha) and Prelim3 (Up = 1110.93Kg/ha) were the least useful sets for grain yield (Table 5). For HSDWT, the heritability values were relatively high across all breeding sets in the range of 0.76 for Prelim8 to 0.93 for Prelim2 (Table 5). It was observed that breeding sets with high heritability values also showed relatively high prediction values of genetic advance, the best sets being Prelim11 (GAPM = 28.54%; GA = 4.47 g) and Prelim2 (GAPM = 24.94%; GA = 4.00 g), while Prelim8 (GAPM = 15.97%; 2.68 g) registered the lowest percentage value of expected genetic advance (Table 5). Usefulness criterion revealed Prelim11 and Prelim2 as the most useful breeding sets for HSDWT with Up values of 20.12 g and 20.04 g respectively, while prelim10 had the lowest usefulness value even though it had moderate percentage value of expected genetic advance. When it came to DT50F, the heritability values were variable between the breeding sets ranging from low (0.20 for Prelim8) to intermediate (0.49 for Prelim5) and high (0.81 for Prelim10). Consequently, breeding sets that showed high predicted genetic advance were Prelim10 (GAPM = 6.49%; GA = 3.01 days), Prelim1 (GAPM = 6.01%; GA = 2.88 days) and Prelim5 (GAP = 5.41%; GA = 2.63 days). Prelim2 and Prelim11 had intermediate proportion of expected genetic advance while Prelim8 and 3 had the lowest prediction of genetic advance for DT50F (Table 5). To make sense of usefulness criterion for DT50F, the expected genetic advance was deducted from the mean DT50F, thereby, revealing Prelim10 (Up = 43.44 days), Prelim11 (Up = 43.59 days) and Prelim1 (Up = 45.08 days) with high genetic potential for early flowering (Table 5).

Discussion

Decisions in plant breeding are continuously becoming more complex given the dynamic consumer demands and preferences, and the current issues of climate change. As the human population continues to surge, breeders are constantly under pressure to release improved varieties with high yields and other preferred traits. Consequently, a typical active breeding program often handles multiple populations intended for varied purposes or product targets (Witcombe and Virk 2001). This introduces complex deliberations and challenges relating to handling large sizes of genetic materials, resource allocations and selection decisions at every breeding stage (Witcombe and Virk 2001; Sun et al. 2011). Therefore, Sun et al. (2011) noted that, careful choice of genotypes at each step in a breeding program is key in determining the ultimate success in the next selection stages for genetic advancement. The present study elucidated the genetic worth of eight sets of cowpea breeding materials evaluated in preliminary yield trials across two locations in Northern Nigeria, deploying the concepts of genetic variance, heritability, genetic advance and usefulness criterion to aid in making selection decisions for advancement of materials.

We began by examining the distributions of the three traits; GY, HSDWT and DT50FL, within each of the eight breeding sets. The traits variation approximated continuous distributions within the sets, suggestive of quantitative inheritance. Sinnott (1937) argued that, when phenotypic variation is presumably environmental and or conditioned by multiple genes with minor effects, the distribution is essentially symmetrical. In cowpea, grain yield, seed weight and flowering time are complex traits that exhibits quantitative variations in nature (Lopes et al. 2003; Ishiyaku et al. 2005; Boukar et al. 2016). The present study depicted different levels of total dispersions within the breeding sets, with some sets such as Prelim5, Prelim8 and Prelim11 showing slight shifts towards high GY, HSDWT and less DT50FL respectively. The observed dispersions suggested involvement of genetic factors governing the traits tested and that recovering promising lines from these sets is highly probable.

When the breeding sets were analyzed using PCA, it became apparent that the eight breeding sets were distinct from each other although some of them overlap for the three traits. Grouping the breeding sets by their means in respect to the traits allowed the PCA to highlight the potential sets for GY, HSDWT and DT50FL (Fig. 3). When PCA was examined within each breeding set, the structure reflected diversity among genotypes, but some genotypes were highly associated with GY reflecting their yield potentials while others were more correlated with HSDWT and DT50FL, implying those genotypes performed well for the traits in question (Supplementary Fig. 2). PCA was able to identify the top performing genotypes within each breeding set for the three traits, with clear categorizations of those having GY above 1500 kg/ha, seed HSDWT above 20 g and DT50FL less than 45 days.

A summary of the proportion of high performing genotypes that could be extracted from each breeding set was derived from the PCA and presented in Fig. 3. This chart portrayed Prelim5, Prelim7 and Prelim11 as sets having high frequencies of genotypes with GY above1500kg/ha, seed HSDWT above 20 g and DT50F less than 45 days, while Prelim5 had the highest number of genotypes with good combination of desired values of the three traits. PCA is a powerful data reduction tool that has been used in cowpea conventional breeding for morphological characterization and defining key determinants of grain yield (Oladejo et al. 2016). A study by (Vural and Karasu 2007) deployed PCA using multiple yield component traits to understand which of the factors explained most of the total variance in the data, and found seed weight and pod size to contribute most of the variations. In the present study, the traits distributions and PCA provided an overall picture of total variability and structure in the data among and within the breeding sets. Differences among sets were mostly explained by DT50FL as indicated by higher PC1 score for this trait than others. This observation is consistent with the fact that the sets were created based on maturity and therefore, it is expected that the groups would be distinct in terms of DT50FL. On the other hand, variation among genotypes within sets were mostly explained by GY and HSDWT as reflected by high PC1 scores for these traits. Given the information on the contributions of the traits to variation on the PC1 and PC2 axes it was possible to identify promising sets and genotypes within sets for higher GY. The fact that variability among genotypes within each set was mostly explained by GY and HSDWT implies that selection within the sets for these two traits would be more beneficial than for DT50FL. However, since the phenotypic variability generally was only slightly greater than the genetic variability in these traits, the total dispersion does not reflect wholly the magnitude of genetic variance since it is a combination of genetic and environmental variations and hence, an accurate assessment would require partitioning of total variance into its different components (Bernado 2010).

To unravel the variability between and within the sets, we conducted a two-step classical ANOVA, first between the sets and then for individual breeding sets. Sets did not show significant mean differences for all the three traits considered although numerically some sets had higher mean values than others. However, the effect of genotypes nested within location was significant, an indication that sets are likely different, but its significance could have been masked by environment. Indeed, the analysis revealed significant interactions between sets and location and that of genotypes nested in set by location. This outcome suggested that meaningful selections among sets and genotypes within sets would require testing the materials in multiple locations to eliminate the confounding effect of the environment. In addition, it’s important to understand the amount of variation within the population in addition to the mean in order to make a more informed selection decision (Tabanao and Bernardo 2005; Bernado 2010). The present study tested genotypic variation in the eight sets and found the genotype effects within each set to be significantly different for all the traits except for GY in Prelim11. This suggested that there was enough genetic variability within the sets to warrant selection and recovery of good performing lines. However, the observed significant effects of genotype-by-location interaction for traits in most of the sets suggested presence of variation in relative performance of genotypes between the locations, creating an alert to proceed with caution when merging means from the two locations to make selection within the sets (Mohammadi et al. 2015). Genotypic variation for grain yield, seed weight and flowering time in cowpea are known to be influenced by environments (Adewale et al. 2010; Odeseye et al. 2018). This complicates the selection of superior genotypes, thereby reducing genetic progress (Allard and Bradshaw 1964; Mohammadi et al. 2015). In the present case, decision would be made based on two locations data, and considering that further testing in more locations is expected, selection based on means and with a relatively relaxed selection intensity would be suggested to avoid elimination of potentially stable genotypes for GY at this stage.

To further decode the genetic potential of the eight breeding sets, total variance within each set was partitioned to reflect variances attributed to genotype, location and the interaction thereof (Table 4). This allowed further dissection of the breeding sets in terms of genetic coefficient variability, heritability, genetic advance and overall genetic usefulness of the sets. Breeding sets that had high relative magnitude of genetic variance had moderate to high heritability and further depicted relatively high expected genetic advance and genetic usefulness This observation suggested that the sets with high values of genetic variance, genotypic coefficient of variation, heritability, expected genetic advance and genetic usefulness for the traits in question, would respond well to future selection, and superior lines for the traits are extractable from these sets. This finding was consistent with the past studies in cowpea which used similar genetic parameters to evaluate the effectiveness of population response to selection (Damarany 1994; Omoigui et al. 2006; Manggoel 2012; Nwosu et al. 2013). The observed minimal differences between GCV and PCV for all the traits studied implied that the traits are mostly governed by genetic factors with little role of environment in the phenotypic expression of these characters (Manggoel 2012). Therefore, selection for these traits based on phenotypic value may be effective. Manggoel (2012) alluded to the fact that heritability estimates coupled with genetic advance are useful in predicting the resultant effect for the selection of the best individuals from a population. Moderate to high broad sense heritability values observed in the present study suggested that selection within each Prelim set for GY, HSDWT and DT50FL would be beneficial, given the moderate magnitude of environmental influence. The results of usefulness criteria were consistent with that from variance components and genetic advance. This suggested that the concept of genetic usefulness may be used to evaluate the genetic merit of specifically defined groups of breeding materials that are not necessarily derived from a two-parent cross. Usefulness criteria have historically been applied to bi-parental populations with full sib progeny to predict population performance in early generations (Tabanao and Bernardo 2005; Bernado 2010; Allier et al. 2019). The advantage of genetic usefulness is that is captures the overall value of a population in terms of its mean performance and total variance (Tabanao and Bernardo 2005; Bernado 2010; Allier et al. 2019). With homozygous lines and the opportunity for replicated testing at later generations as it is the case in the present study, there is improved prediction accuracy of genetic usefulness. The information may still be helpful at early performance testing phase, especially when there is need to prioritize among several groups of breeding materials. Indeed, our study has demonstrated that there are some sets like Prelim11 (UP = 874 kg/ha) and Prelim3 (UP = 1110 kg/ha) with relatively low GY scores that would be dropped at this stage and lines taken back in the crossing nursery for yield improvement.

The present study elucidated the structure and properties of eight sets of cowpea breeding materials that are destined for further testing, revealing the uniqueness of each set and the magnitude of expected gain from selection within each set and the genetic usefulness of each set. The variance component analysis allowed estimation of genetic and phenotypic coefficient of variation, heritability and expected genetic advance. These parameters exposed the genetic potential of eight sets of cowpea breeding lines for GY, HSDWT and DT50FL, revealing sets with high genetic variance and from which superior lines could be extracted to recommend for advanced testing. Estimates of genetic usefulness were generally consistent with results from variance components which provided additional layer of information on the score for genetic merits of the sets. The current study highlights a novel application of usefulness criteria in non-biparental populations with populations defined based on maturity groups. However, comparisons of performance among populations may be limited by the nature of traits used for grouping as in the present case where maturity may be correlated with other traits used for assessing performance. Principal component analysis depicted the relative contributions of the three traits to the variability between and within sets, revealing that more benefit would be obtained by selecting among genotypes within sets based on GY and HSDWT than on DT50FL. These approaches generated relevant information required in making decision for advancement in a conventional breeding program.