Introduction

Many thousands of plant species can be used by humanity, and around a hundred have been developed into crops. However, as only a few crops are widely grown today research interest into the so-called underutilized crops is rapidly growing—among them the yam beans (Pachyrhizus spp.). The nearest relative of economic importance is the soybean (Glycine max (L.) Merr.) and the levels of oil and protein of yam bean seeds resemble those typical of soybean (Grüneberg et al. 1999). Formerly, the genus Pachyrhizus was placed in the subtribe Diocleinae in close relationship to the subtribe Glycininae and Phaseolinae (Lackey 1977; Ingham 1990), but based on chloroplast DNA restriction site mapping, it was transferred to the subtribe Glycininae (Bruneau et al. 1994; Polhill 1994). Within the Glycininae, the yam bean shows a close relationship to tropical kudzu (Pueraria phaseoloides (Roxb.) Benth.) and other genera with a chromosome base number of x = 11 (Lee and Hymowitz 2001; Kumar and Hymowitz 1987). The yam bean species are diploid (2n = 22), self-pollinating (up to 8% cross pollination) and native to South and Central America (Sørensen 1990). The genus is defined as a homogeneous entity due to the stigma structure having a median to subterminal globular process on the adaxial side, the short hairs on the adaxial side of the ovary extending almost to the stigma, and the formation of storage roots (Sørensen 1988). Unlike its close relative, the soybean, the yam bean is exclusively used for its storage roots (Ramos-de-la-Peña et al. 2013). The use of yam bean seeds as source of biodegradable insecticide is also of potential economic interest due to their high rotenone contents (Lautié et al. 2012). The crop is the most important storage-root-forming legume, as its productivity is high and it has also high protein content in the storage roots (NRC 1979). 
In the cultivated species, because of the roots’ high moisture content and their traditional raw consumption, the storage roots have been considered exclusively as fruity vegetables.

The genus Pachyrhizus encompasses two wild (P. ferrugineus, P. panamensis) and three cultivated species: Amazonian yam bean (P. tuberosus), Mexican yam bean (P. erosus), and Andean yam bean (P. ahipa). The cultivated species are separated taxonomically on morphological and physiological traits using univariate statistics (Sørensen 1988; Sørensen et al. 1997a, b): (1) P. ahipa—in contrast to P. tuberosus and P. erosus‒ is bushy or semi-erect with generally entire leaflets and with short racemes, which are only basally dibotryoid; it is day length insensitive and only found cultivated in cool tropical and subtropical Andean valleys within 1800–2900 m a.s.l. (2) P. tuberosus—in contrast to P. erosus—has wing and keel petals that are ciliolate and rarely glabrous; the legume at maturity is 13‒14 cm long and the seeds are plump and reniform with the exception of the square seeds of the ‘Chuin’ cultivar group (Sørensen et al. 1997a); usually plants are larger than P. erosus and P. ahipa, i.e. the stem can reach up to 10 m in length, but semi-erect types can be found that exhibit growth type similar to those of semi-erect P. ahipa; the habitat is wet tropical lowlands of Central and South America and the slopes of the Andean mountain range within an altitude range from sea level to 2000 m a.s.l. (Sørensen et al. 1997b) (3) P. erosus has wing and keel petals that are glabrous; the pod is glabrous to strigose at maturity and 6‒13 cm long; the seeds are flat and square to round; it is widely distributed throughout many tropical and subtropical regions in South and Central America, South and East Asia and the Pacific from sea level to 2200 m a.s.l. (Sørensen 1988).

Interspecific crosses among all three cultivated yam bean species result in fertile and vigorous hybrids, with one exception: P. ahipa × P. tuberosus ‘Ashipa’ yielded non-functional seeds (Grum 1990; Sørensen 1991; Grüneberg et al. 2003; Agaba et al. 2017). From the breeder’s perspective, the species form one primary genepool. In a validation of the taxonomic separation of Sørensen (1988), Døygaard and Sørensen (1998) found no clear overall separation between the three cultivated species. Their principal component analysis was based on 18 morphological characteristics of herbarium material, which did not include characteristics of the tuberous roots. The authors concluded that flower and inflorescence characters appeared to be the major differences among accessions of the yam bean genepool, followed by leaf, legume and seed characteristics. Using molecular characterization [random amplified polymorphic DNA (RAPD) markers], Estrella et al. (1998) observed a clear genepool separation between P. tuberosus and P. erosus. However, that study did not consider P. ahipa.

Pachyrhizus erosus and P. ahipa are not subdivided into cultivar groups, but for P. tuberosus four cultivar groups are distinguished: ‘Chuin’, ‘Ashipa’, ‘Yushpe’ and ‘Jíquima’ (Sørensen et al. 1997a, b; Tapia and Sørensen 2003; Oré-Balbin et al. 2007). Both ‘Jíquima’ and ‘Ashipa’ have low storage-root dry matter content similar to that of P. erosus and P. ahipa, whereas ‘Chuin’ and ‘Yushpe’ cultivars exhibit a high storage-root dry matter content (Grüneberg et al. 1998; Oré-Balbin et al. 2007). The Peruvian ‘Chuin’ type of P. tuberosus was first reported by Tessmann (Tessmann 1930; Sørensen et al. 1997a, b) and is cooked and consumed like cassava (manioc) roots (Sørensen et al. 1997a, b; Grüneberg et al. 2003). Its existence has led researchers to conclude that the yam bean could be used and developed as a protein-rich starchy staple outside its current area of cultivation along the Río Ucayali, Peru. Because it was discovered later, the ‘Yushpe’ cultivar group was not included in the study presented here. Studies of the genetic diversity within the cultivated species have been performed in P. erosus (Heredia-Zepada and Heredia-Garcia 1994; Estrella et al. 1998) and in P. tuberosus (Sørensen et al. 1997a, b; Estrella et al. 1998; Tapia and Sørensen 2003). There is no study on the genetic diversity in P. ahipa, except those involving univariate descriptive statistics (Ørting et al. 1996), and no investigation of the genetic diversity comprising all three species (P. erosus, P. tuberosus and P. ahipa), except the study by Santayana et al. (2014). This is of interest because breeding aims to combine the wide adaptation of P. erosus, the storage root quality of the ‘Chuin’ type in P. tuberosus and the bushy-erect growth type and day-length insensitivity of P. ahipa.

A good description of the plant materials is necessary for the effective use of germplasm resources and for crop improvement. Therefore, curators of genebanks characterize their materials, recording selected traits of each accession. Traditionally, these data are limited to highly heritable morphological and agronomic traits (Acquaah 2007). With increases in germplasm collection sizes and in data on molecular, biochemical, morphological, and agronomic traits, multivariate statistical analysis (MVA) methods are receiving increasing interest. Some MVAs (e.g. multivariate analysis of variance, MANOVA; discriminant function analysis, DFA; and partial least squares, PLS) are extensions of uni- and bivariate statistical methods appropriate for significance tests of statistical hypotheses. MVA methods classify and order large numbers of breeding materials and trait combinations. Handling data with multi-collinearity can be unwieldy, and variables are best considered together since they may be interdependent (Becker 2011; Acquaah 2007; Hill et al. 1998; Falconer 1989; Wricke and Weber 1986; Wricke 1962). MVA is therefore advantageous for analyzing genetic diversity and classifying germplasm collections (Acquaah 2007; Mohammadi and Prasanna 2003). Recently, several studies have assessed genetic diversity based on morphological traits using multivariate procedures including principal component analysis (PCA) (Badu-Apraku et al. 2011; Zeba and Isbat 2011; Hailu et al. 2006), discriminant function analysis (DFA) (Francisco et al. 2011; Safari et al. 2008), multivariate analysis of variance (MANOVA) (Arms et al. 2016; Ukalska et al. 2006), and partial least squares (PLS) (Jaradat and Weyers 2011). MANOVA analyses differences among populations for a given trait, and the distinctiveness is studied with a number of vector variables combined (Acquaah 2007; Zhu 1990).
Population or genotype discrimination can be achieved by discriminant function analysis (DFA). DFA as a post-cluster analysis method was able to recognize the accuracy of clustering when used by several researchers (Safari et al. 2008). The magnitude of each trait in the genetic diversity of tall fescue (Festuca arundinacea Schreb.) (Vaylay and Van Santen 2002), hairy vetch (Vicia villosa Roth) (Yeater et al. 2004) and groundnut (Arachis hypogaea L.) (Safari et al. 2008) was identified using discriminant functions.

Understanding the level of genetic diversity is helpful for the selection of parental genotypes and important in widening the genetic base of crops (Molosiwa et al. 2016). Assessment of crop diversity also allows efficient sampling and proper management of the germplasm (Van Hintum et al. 2000). Reports on the application of MVA in yam bean research are still limited. Da Silva et al. (2016) characterized 64 yam bean accessions maintained in a genebank in the Brazilian Amazon at a single location using PCA and cluster analysis. Tapia and Sørensen (2003) studied the morphological variation in a germplasm collection of P. tuberosus grown at CATIE (Costa Rica) by canonical statistics. Recent performance investigations in yam beans in Rwanda and Uganda used descriptive statistics and PCA, respectively (Ndirigwe et al. 2017; Agaba et al. 2016).

This study had three objectives: (1) to investigate the genetic variation comprising all species and cultivar groups currently of interest for breeding, using morphological as well as agronomic traits measured under field conditions by different assessment procedures; (2) to determine and compare the amount of diversity present between and within the three species by diverse multivariate analysis (MVA) methods; and (3) to determine the morpho-agronomic quantitative and qualitative traits that contribute most to the differentiation between and within species.

Materials and methods

Plant material

In total, 34 entries were studied: 14 P. ahipa entries [13 seed sample accessions; one genotype from each seed sample accession, except for sample AC214 from which two genotypes (AC214-109 and AC214-110) that differed in flower colour were selected], 14 P. erosus entries, and 6 P. tuberosus entries, including five ‘Chuin’ and one ‘Ashipa’ seed sample accession (Table 1).

Table 1 List of tested entries and their passport data; AC = Andean yam bean (Pachyrhizus ahipa)

Study sites and experimental design

The field experiments were carried out between June 2001 and January 2002 at two locations in Benin: the Centre Songhai in Porto-Novo (02°37′E, 06°29′N) and at the experimental station of INRAB (Institut National des Recherches Agricoles du Bénin) in Niaouli (02°18′E, 06°66′N). The soils at both locations were well-drained sandy red loams. Each species was grown in one of three separate but adjacent experiments in the same field, and each experiment was laid out as a randomized block design with two plot replications for each entry. The experimental plots consisted of four rows, each containing 24 plants; the distance between plots was 1 m. In the experimental plots, the planting distance was 0.75 m between rows and 0.25 m within rows. Two seeds were sown per planting station at a depth of approx. 2 cm. Five weeks after sowing surplus plants were removed leaving only one per planting station. In the case of P. tuberosus and P. erosus, two stakes were used as trellises. Weeds were manually removed every 2 weeks, and no fertilisers or pesticides were applied. The rainfall during the crop growth period was 460 mm and 393 mm at Songhai and Niaouli, respectively. The trials at Porto-Novo were irrigated during August and September (approx. 800 mm irrigation), whereas at Niaouli no irrigation was applied. The average annual temperature at the experimental sites ranges from 23 °C in August to 28 °C in May. The average temperatures during this study were 28.1 °C (Songhai) and 27.2 °C (Niaouli).

Traits recorded

In total, 75 morpho-agronomic characters were measured (Tables 2, 3). The descriptors developed for Phaseolus spp., Vigna spp. and Ipomoea batatas (L.) Lam. were used with small modifications (IBPGR 1985, 1987, 1991). Data were recorded on a plot basis, on six individual plants selected randomly from the two central rows of each plot, or on six plant parts taken from six individual plants selected randomly from the two central rows of each plot. The measurement unit and the measurement/sampling procedure of each trait are given in Tables 2 and 3. A calliper was used to measure most of the morphological characters. Storage root crude protein content was determined by measuring the N content according to the Dumas method (Sweeney and Rexroad 1987) and multiplying the N content by 6.25; starch content was determined using the polarimetric standard analysis No. 123/1 (ICC 1999), while sucrose, glucose and fructose contents were analysed enzymatically as described by Boehringer (1983).

Table 2 Variable sets and 50 observed yam bean quantitative characters, codes, measurement units and measurement procedures
Table 3 Variable sets and 25 observed yam bean qualitative characters; codes; measurement units and measurement procedures

Quantitative traits

Fifty (50) quantitative traits were surveyed in each accession or line. Traits were grouped into variable sets represented by the following plant organs: storage root, seed, pod, flower, stem, leaf, and composite characters involving more than a single plant organ. The quantitative traits encompassed 26 morphological and 24 agronomic and quality attributes. All the traits evaluated are listed in Table 2.

Qualitative traits

Twenty-five (25) qualitative traits were studied (Table 3). They were treated as quantitative since they showed continuous variation within and among species. All qualitative characters were morphological (Table 3) and scaled diversely according to the visual observation on frequency among genotypes (Table 3).

To sum up, investigated traits encompassed 50 quantitative (Table 2) and 25 qualitative characters (Table 3). Qualitative characters were described as exhibiting continuous or quantitative variation.

Statistical analyses

Phenotypic variation estimates

Statistical analyses were conducted using SAS 6.12 (SAS 1997). Initially, data were classified relative to the experimental factors: species (S), entries (A), and location (L). For each trait \( x_{i} \), the variance components were estimated relative to: species (\( \sigma_{S}^{2} \)), entries within species (\( \sigma_{A(S)}^{2} \)), locations (\( \sigma_{L}^{2} \)), and the error term (\( \sigma_{\varepsilon }^{2} \)) comprising the genotype-location interactions and the plot error. Variance component estimations were performed using the SAS procedure REML (SAS 1997) and the statistical model: \( Y_{ijkl} = \mu_{i} + s_{ij} + a(s)_{ik(j)} + l_{il} + \varepsilon_{ijkl} \), where \( Y_{ijkl} \) is the observed value of the \( i \)th trait of the \( j \)th species, for the \( k \)th entry and \( l \)th location; \( \mu_{i} \) is the trial mean of the \( i \)th trait; \( s_{ij} \), \( a(s)_{ik(j)} \), and \( l_{il} \) are, respectively, the effects of species, entries within species, and locations; and \( \varepsilon_{ijkl} \) is the error, comprising effects of genotype-location interactions and the plot error.

We estimated the phenotypic variation within (PVA(S)) and between species (PVS) as:

$$ {\text{PV}}_{{\text{A}}({\text{S}})} = \frac{\sigma_{A(S)}^{2}}{{\text{V}}_{\text{P}}} \quad {\text{and}} \quad {\text{PV}}_{\text{S}} = \frac{\sigma_{S}^{2}}{{\text{V}}_{\text{P}}} $$

where \( \sigma_{A(S)}^{2} \) is the variance among entries (genotypes) within a given species, \( \sigma_{S}^{2} \) is the variance among species in the genus, and \( \sigma_{\varepsilon }^{2} \) is the experimental error term, which includes G × E interactions to take into account the inter-annual variation in the expression of a trait due to different environmental effects on genotypes.
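The two ratios can be illustrated with a minimal Python sketch. It assumes that V_P is the sum of the four variance components of the model (species, entries within species, locations and error); the text does not define V_P explicitly, so this is an assumption. The function name and the numerical values are hypothetical and serve only as an illustration.

```python
def phenotypic_variation(var_species, var_entries, var_location, var_error):
    """Return (PV_A(S), PV_S) as fractions of the total phenotypic variance.

    Assumption (not stated in the text): V_P is the sum of the four
    variance components passed to this function.
    """
    components = (var_species, var_entries, var_location, var_error)
    if any(c < 0 for c in components):
        raise ValueError("variance components must be non-negative")
    v_p = sum(components)
    if v_p == 0:
        raise ValueError("total phenotypic variance is zero")
    pv_entries = var_entries / v_p  # PV_A(S): share due to entries within species
    pv_species = var_species / v_p  # PV_S: share due to species
    return pv_entries, pv_species


# Hypothetical variance components, chosen only for illustration
pv_a, pv_s = phenotypic_variation(var_species=6.0, var_entries=2.0,
                                  var_location=1.0, var_error=1.0)
print(f"PV_A(S) = {pv_a:.0%}, PV_S = {pv_s:.0%}")
```

With these made-up components, most of the phenotypic variance would be attributed to differences among species rather than among entries within species.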

The phenotypic and average genetic diversity (H′) at species and genus levels for all traits were further assessed using the Shannon‒Weaver (1949) index of diversity as applied by Al Khanjari et al. (2008) and were calculated based on phenotypic frequency of alleles controlling each trait category of descriptors. The differences among the groups for the levels of diversity were tested using the Wilcoxon non-parametric test implemented in JMP 7.0 software (SAS Institute, Inc. 2007). Each character was categorized into specific class states. The 25 qualitative and 50 quantitative characters were assigned to classes ranging from 1 to 22, and analyzed using the Shannon–Weaver diversity index (H′; Shannon and Weaver 1949) as defined by Jain et al. (1975) to calculate phenotypic variation of each accession:

$$ {\text{H}} = - \sum\limits_{{\text{i}} = 1}^{\text{n}} {\text{P}}_{\text{i}} \ln {\text{P}}_{\text{i}} $$

where n is the number of phenotypic classes for a character and Pi is the genotype frequency or the proportion of the total number of entries in the ith class.

H was standardized by converting it to a relative phenotypic diversity index (H′) after dividing it by \( H_{\max} = \log_{e}(n) \):

$$ {\text{H}}^{\prime} = - \frac{\sum\nolimits_{{\text{i}} = 1}^{\text{n}} {\text{P}}_{\text{i}} \ln {\text{P}}_{\text{i}}}{{\text{H}}_{\max}} $$
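The index and its standardization can be sketched in a few lines of Python. The function name, the optional `n_classes` argument and the example class labels are hypothetical. When `n_classes` is not given, the sketch assumes that n equals the number of classes actually observed among the entries, which may differ from the number of classes defined for a descriptor.

```python
import math
from collections import Counter


def shannon_weaver(classes, n_classes=None):
    """Return (H, H') for a list of phenotypic class labels, one per entry.

    n_classes: number of phenotypic classes n for the character; defaults
    to the number of observed classes (an assumption of this sketch).
    """
    counts = Counter(classes)
    total = len(classes)
    # P_i: proportion of the entries that fall in the i-th class
    proportions = [c / total for c in counts.values()]
    h = -sum(p * math.log(p) for p in proportions)
    n = n_classes if n_classes is not None else len(counts)
    h_max = math.log(n) if n > 1 else 0.0
    # A character with a single class is monomorphic: H' = 0
    h_rel = h / h_max if h_max > 0 else 0.0
    return h, h_rel


# Hypothetical character scored in 14 entries, evenly split over two classes
h, h_rel = shannon_weaver(["white"] * 7 + ["violet"] * 7)
print(round(h, 3), round(h_rel, 2))
```

An even split over all classes gives H′ = 1, whereas a character with a single observed class gives H′ = 0, which corresponds to the monomorphism reported in the results.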

Principal component and cluster analyses

Principal component analysis was performed using the SAS procedure PRINCOMP (SAS 1997). The spatial relationships among entries (accessions and genotypes, respectively) were presented by plotting the scores of the first, second and third principal components in a three-dimensional space. Pearson correlation coefficients of all traits with the first five principal components were calculated using the SAS procedure CORR (SAS 1997). A cluster analysis was carried out using the SAS procedure CLUSTER (SAS 1997). All traits were standardized by their mean value and standard deviation [z = (x − \( {\bar{\text{x}}} \))/s] using the STD option of the CLUSTER procedure. Euclidean distances were calculated and a cluster analysis, involving the unweighted group average linkage method (UPGMA), was conducted using the AVE option of the CLUSTER procedure. Cluster summaries were plotted using the SAS Macro DENDRO (Nicholson 1995). All traits with estimated ratios of \( \sigma_{A(S)}^{2} \)/\( \sigma_{\varepsilon }^{2} \) > 2 and a significant correlation with at least one of the first five principal components were analysed by multiple regression to select traits useful for differentiating among entries. The multiple regression analysis was performed with the SAS procedure REG and the selection option STEPWISE (SAS 1997). The dependent variables in the multiple regression model were the first five principal components, whereas the regressor variables were those traits with estimated variance component ratios of \( \sigma_{A(S)}^{2} \)/\( \sigma_{\varepsilon }^{2} \) > 2 (traits with considerable genetic variation between entries and low genotype-location interactions and plot errors). In a final analysis, an environment search was carried out to identify, on the basis of the temperature and rainfall ranges (including irrigation) at the experimental sites, a relevant target set of yam bean production environments across the regions of the world. The environment search was carried out with ArcGIS software using the following criteria: (1) a temperature range of 23‒28 °C, (2) a rainfall range of 400‒1200 mm and (3) at least six consecutive months meeting the temperature and rainfall criteria.
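The three search criteria can be illustrated for a single site with a short Python sketch. This is not the ArcGIS workflow used in the study. It rests on one possible reading of the criteria, namely that every monthly mean temperature within a six-month window must lie between 23 and 28 °C and that the rainfall accumulated over that window must lie between 400 and 1200 mm. It also assumes that windows may wrap around the end of the calendar year. The function name and the climate values are hypothetical.

```python
def has_suitable_window(temps, rains, window=6,
                        t_range=(23.0, 28.0), r_range=(400.0, 1200.0)):
    """Return True if some run of `window` consecutive months is suitable.

    temps: monthly mean temperatures (deg C); rains: monthly rainfall (mm).
    Assumptions of this sketch: the temperature criterion applies to every
    month of the window, the rainfall criterion to the window total, and
    windows may wrap around the end of the year.
    """
    if len(temps) != len(rains):
        raise ValueError("temps and rains must have the same length")
    n = len(temps)
    if n < window:
        return False
    for start in range(n):
        months = [(start + k) % n for k in range(window)]
        temps_ok = all(t_range[0] <= temps[m] <= t_range[1] for m in months)
        total_rain = sum(rains[m] for m in months)
        if temps_ok and r_range[0] <= total_rain <= r_range[1]:
            return True
    return False


# Hypothetical site: suitable temperatures all year, 100 mm of rain per month
print(has_suitable_window([25.0] * 12, [100.0] * 12))
```

Applying such a test cell by cell to gridded climate data would produce a map of candidate environments, although the thresholds and the handling of irrigation would have to follow the original search settings.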

Multivariate analysis of variance and discriminant function analysis

Further multivariate analyses, including multivariate analysis of variance (MANOVA) and discriminant function analysis (DFA), were applied to distinguish within- and between-species variation among the 34 accessions and lines of Pachyrhizus studied. Analyses were based on four data sets: all P. ahipa accessions, all P. erosus accessions, all P. tuberosus accessions, and the 34 genotypes taken together. MANOVA and DFA were performed separately for each of these data sets using the raw data from all observed variables. The outputs were compared to assess the consequences of the unbalanced number of entries for the accuracy of the results. To test the power of discrimination across locations, the within- and among-species diversity were estimated considering G × L and S × L interactions, where G and S represent genotype and species, respectively. MANOVA and DFA were performed by partitioning the raw data into variable sets according to the plant organ to which they refer (Tables 2 and 3).

To determine the number of groups representing the optimal partition in the hierarchical tree, a multivariate analysis of variance (MANOVA) was performed. MANOVAs using Wilks’ lambda, the Hotelling test, Pillai’s trace and Roy’s maximum root were performed on the raw data for all 75 variables with the MANOVA statement in JMP 7.0 (SAS Institute, Inc. 2007). In the MANOVA, sources of variation are as described by Lázaro-Nogal et al. (2015) and Ukalska et al. (2006), and the following model was used:

$$ {\text{Y}} = 1_{N} {\text{m}} + {\text{XG}} + {\text{ZR}} + {\text{E}} $$

where Y is the (N × k)-dimensional observation matrix, with k the number of response traits; \( 1_{N} \) is the (N × 1)-dimensional unit vector; N is the total number of non-empty subclasses in the two-way data set; m is the k-dimensional vector of the general means; X is the (N × a)-dimensional design matrix for genotypes; G is the (a × k)-dimensional matrix of the random genotypic effects; Z is the (N × b)-dimensional design matrix for locations; R is the (b × k)-dimensional matrix of the random location effects; and E is the (N × k)-dimensional matrix of the residuals. These estimates were calculated using the MANOVA option of JMP 7.0 (SAS Institute, Inc. 2007). A significant effect of genotype indicates genetically based phenotypic differences. The analysis was repeated with mixed models using restricted maximum likelihood (REML), testing for the fixed effects of location and the random effects of genotype and interaction.

A discriminant function analysis (DFA) was carried out in JMP 7.0 (SAS Institute Inc. 2007) to identify which variables best differentiate the groups identified in the hierarchical classification. The correlation of each variable with each discriminant function based on the structure matrix was used to create the discriminant function. These Pearson coefficients are structure coefficients or discriminant loadings and function like factor loadings in factor analysis. By determining the largest loadings for each discriminant function, insights were gained into how to name each function.

To avoid biases that might arise from heterogeneity of variance caused by the difference in the number of entries among the three species surveyed (P. tuberosus was represented by 6 accessions, whereas P. ahipa and P. erosus were represented by 14 genotypes each), response variables were log-transformed before the multivariate analyses, as suggested by Hughes et al. (2009), to meet the assumptions of normality and homogeneity of variance. However, linear relationships among the variables investigated were not improved by the logarithmic transformation, and untransformed data were therefore used in the subsequent analyses.

To analyze possible sampling error bias (which may be due to unbalanced data or unequal numbers of genotypes among the species analyzed here), three effect size estimators (η², ε², ω²) were generated from the ANOVA to take into account a possible lack of accuracy in our results arising from heterogeneity of variances. This analysis was performed following the procedure and the software developed by Skidmore and Thompson (2013). The total sample or entry size of 34 in the present study falls between the smaller (24) and the larger (48) sample sizes studied by Skidmore and Thompson (2013), with a number k of groups equal to 3, corresponding here to the three species involved in the study.
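The three estimators can be illustrated with the usual textbook formulas for a one-way fixed-effects ANOVA: η² = SS_between/SS_total, ε² = (SS_between − (k − 1)·MS_within)/SS_total and ω² = (SS_between − (k − 1)·MS_within)/(SS_total + MS_within). The Python sketch below is not the software of Skidmore and Thompson (2013); the function name and the three small groups of values are hypothetical stand-ins for trait values grouped by species.

```python
def effect_sizes(groups):
    """Return (eta2, epsilon2, omega2) for a one-way ANOVA on `groups`.

    groups: list of lists, one list of observations per group (k groups).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = ss_between + ss_within
    ms_within = ss_within / (n_total - k)
    eta2 = ss_between / ss_total
    epsilon2 = (ss_between - (k - 1) * ms_within) / ss_total
    omega2 = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return eta2, epsilon2, omega2


# Hypothetical trait values for three groups (k = 3)
eta2, eps2, omega2 = effect_sizes([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(round(eta2, 3), round(eps2, 3), round(omega2, 3))
```

Because ε² and ω² subtract an estimate of the error variance from the between-group sum of squares, they are smaller than η² for the same data, which is why they are commonly preferred as less biased estimators in small or unbalanced samples.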

Results

Phenotypic variation

Variance component estimations (Tables 4, 5) show that for a large number of traits \( \sigma_{S}^{2} \) or \( \sigma_{A(S)}^{2} \) are larger than \( \sigma_{\varepsilon }^{2} \). For many traits, \( \sigma_{S}^{2} \) is larger than \( \sigma_{\varepsilon }^{2} \), and \( \sigma_{A(S)}^{2} \) is smaller than \( \sigma_{\varepsilon }^{2} \), i.e. comprising most agronomical traits, but also several morphological traits; this includes situations with zero estimates for \( \sigma_{A(S)}^{2} \) and genetic variation within species, respectively. For 40 traits, estimates of \( \sigma_{A(S)}^{2} \) are larger than \( \sigma_{\varepsilon }^{2} \). Among these are 33 traits for which the estimated ratios of \( \sigma_{A(S)}^{2} \)/\( \sigma_{\varepsilon }^{2} \) were > 2 and nearly all these are morphological traits.

Table 4 Variance component estimations of species (\( \sigma_{S}^{2} \)), entries within species (\( \sigma_{A(S)}^{2} \)) and the error term [(\( \sigma_{\varepsilon }^{2} \)) comprising genotype-location interactions and plot errors] for 50 morphological and agronomic quantitative traits measured in 34 Pachyrhizus entries
Table 5 Variance component estimations of species (\( \sigma_{S}^{2} \)), entries within species (\( \sigma_{A(S)}^{2} \)) and the error term [(\( \sigma_{\varepsilon }^{2} \)) comprising genotype-location interactions and plot errors] for 25 morphological qualitative traits measured in 34 Pachyrhizus entries

The yam bean accessions and lines used appeared to be well differentiated from one another for all 75 characters investigated, except for some qualitative traits. Table 6 reports the trait variation for each quantitative attribute. Significant differences were observed among species as well as within them for each variable, particularly for quantitative characters; trait variation ranged from 0.81% (for total harvest index) to 49.35% (for number of storage roots per plant). Trait variation below 10% was noted in 16 characters. Large differences were, however, observed in many other traits, such as beginning of flowering, a phenological trait with a variation of 20.93% (Table 6). Agronomic and quality characters also varied widely (Table 6). Pod number per plant also showed high variation (46.39%) within and among species of the germplasm collection under evaluation.

Table 6 Percentage (%) of trait variation and significance levels for each quantitative morphological and agronomic trait considered in Pachyrhizus

Phenotypic variation estimates among genotypes within species (PVA(S)) in quantitative characters ranged from 0.00 to 82.61% (Table 7). PVA(S) was high and above 10% for most traits (Table 7). Among species, phenotypic variation (PVS) ranged from 0.00 (for storage root sucrose content) to 95.02% (for ratio of lateral leaflet length to maximum width) indicating that interspecific variability is higher. Furthermore, that variability was clearly above 10% in all characters with the exception of storage root dry matter yield, ratio of storage root length to maximum width, vine start of climbing, and leaf number per plant (Table 7).

Table 7 Phenotypic variation within (PVA(S)) and among (PVS) the three cultivated yam bean species evaluated at two locations in West Africa for 50 morphological and agronomic quantitative traits measured in 34 Pachyrhizus entries

For qualitative characters, PVA(S) was in general higher than PVS for most of the traits. It ranged from 0.00 (for damage of storage root by insects and nematodes) to 80.03% (for shape of storage root). For most of the qualitative traits, PVA(S) presented values above 50%, except for damage of stem and leaves by fungi, leaf green colour, and damage of stem and leaves by insects (Table 8). Among species variation (PVS) for qualitative characters was rather smaller than at the intraspecific level (Table 8). The highest value for PVS was scored in leaf green colour (81.58%), and the lowest in shape of storage root (16.12%). Most of the qualitative traits showed PVS values between 25 and 50%; thus lower than PVA(S), where values are often clearly above 50% (Table 8).

Table 8 Phenotypic variation within (PVA(S)) and among (PVS) the three cultivated yam bean species evaluated at two locations in West Africa for 25 morphological qualitative traits measured in 34 Pachyrhizus entries

Over all fourteen P. ahipa accessions and lines—with regard to quantitative characters—(Table 9), the mean Shannon–Weaver diversity index (H′) value was highest for pod yield (1.00) followed by period of flowering (0.99), seed width (0.99), seed yield (0.99), time of maturity (0.96), and seed height (0.95). In all traits, H′ is higher than 0.50, except for pod degree and shape of curvature (0.37) and pod beak degree curvature, which presented monomorphism (H′ = 0.00) (Table 9). The average diversity index value within P. ahipa was 0.83 comparable to the observed mean in P. erosus (0.87) and P. tuberosus (0.86), where the highest values for Shannon–Weaver index (H′ = 1.00) were noted in characters inflorescence length, pod length and storage root glucose content; and the lowest values were observed for pod beak degree and shape of curvature (0.47) in P. erosus, while in P. tuberosus the highest H′ appeared for harvest index of storage root yield (H′ = 1) and the lowest for storage glucose content (H′ = 0.61) (Table 9). The Shannon–Weaver diversity index was in general high and over 0.80 for most of the traits evaluated. At the generic level, monomorphism in P. ahipa scored similarly for the trait pod beak curvature. Except character pod degree and shape of curvature (H′ = 0.09), the diversity index was equal to or higher than 0.65. Across the 50 quantitative traits, Shannon–Weaver diversity index in the yam bean germplasm is quite similar and around 0.83 both within and among the three species studied (Table 9).

Table 9 Estimation of the standardized Shannon–Weaver diversity index (H′) for 50 quantitative morphological and agronomic characters of yam bean among and within 3 species in the genus Pachyrhizus

Diversity index estimates for the 25 qualitative characters were generally lower than those for the quantitative attributes (H′ = 0.51, 0.61, 0.46 and 0.51 within P. ahipa, P. erosus, P. tuberosus and the genus Pachyrhizus, respectively) (Table 10). Within P. ahipa, monomorphism (H′ = 0.00) was observed for the traits colour of storage root, surface defect of storage root, cracking of storage root, dehiscence of pod, mature pod colour, shape of central terminal leaflet lobe and shape of central lateral leaflet lobe. In P. erosus, only mature pod colour was monomorphic. Within P. tuberosus, the characters surface defect of storage root, cracking of storage root, mature pod colour, storage root colour, flower colour of petal, flower colour of wing and plant type were monomorphic (H′ = 0.00) (Table 10), whereas among the three species, monomorphism was observed in seven traits. The diversity index ranged from 0.00 to 1.00 in P. ahipa, from 0.00 to 0.83 in P. erosus, and from 0.00 to 0.93 in P. tuberosus. The amount of diversity within and between yam bean species was lower for qualitative traits than for quantitative characters (Tables 9, 10). However, P. tuberosus yielded higher H′ values for most of the qualitative traits than P. erosus and P. ahipa.

Table 10 Estimation of the standardized Shannon–Weaver diversity index (H′) for 25 qualitative morphological characters of yam bean among and within 3 species in the genus Pachyrhizus

Principal component analysis

The first ten principal components of the analysis explained 90% of the total variation. The first, second, third, fourth and fifth principal component accounted for 39.3, 21.3, 8.3, 5.7 and 3.8% of the total variance, respectively. The first component was highly correlated with eight agronomic traits and more than eleven morphological characters (Tables 11, 12). The second component was mainly related to variation in three agronomic and seven morphological traits (Tables 11, 12). The third component was highly correlated with five agronomic traits and only one morphological attribute, while the fifth principal component was significantly correlated only with two morphological traits (Tables 11, 12).
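
The cumulative share of variance follows directly from the individual percentages. The short Python sketch below adds up the five rounded values quoted in this paragraph; because it starts from rounded figures, the running totals may differ slightly from totals computed on unrounded eigenvalues. The variable names are illustrative only.

```python
from itertools import accumulate

# Percentages of total variance reported for PC1 to PC5 in the text
pc_variance = [39.3, 21.3, 8.3, 5.7, 3.8]

# Running totals, rounded to one decimal to suppress floating-point noise
cumulative = [round(v, 1) for v in accumulate(pc_variance)]

for i, (share, total) in enumerate(zip(pc_variance, cumulative), start=1):
    print(f"PC{i}: {share:4.1f}%  cumulative {total:4.1f}%")
```

On these rounded figures, the first three components together account for 68.9% and the first five for 78.4% of the total variance.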

Table 11 Pearson correlation coefficients for the relationship between each of the first five principal components (PC) and each of 50 quantitative morphological and agronomic traits measured in 34 Pachyrhizus entries
Table 12 Pearson correlation coefficients for the relationship between each of the first five principal components (PC) and each of 25 qualitative morphological and agronomic traits measured in 34 Pachyrhizus entries

The plot of the scores of the first three principal components (Fig. 1) derived from the principal component analysis showed that the three species are clearly separated, and that within P. tuberosus, the ‘Chuin’ and ‘Ashipa’ cultivar groups can be distinguished on the basis of the third principal component. All P. ahipa entries had negative scores for the first and second principal components. Entries of P. tuberosus from all cultivar groups sensu lato had positive scores for the first and second principal components, whereas P. erosus entries showed positive scores for the first principal component, but negative scores for the second principal component. The ‘Ashipa’ entry of P. tuberosus was the only entry with a large negative score for the third principal component.

Fig. 1

Plot of the scores for the first (PC1), second (PC2) and third principal component (PC3) of the principal component analysis of 34 entries of three Pachyrhizus species determined from 75 morphological and agronomic traits; triangle = P. ahipa, circle = P. erosus, diamond = P. tuberosus (‘Chuin’), square = P. tuberosus (‘Ashipa’)

Cluster analysis

In general, the results of the cluster analysis (Fig. 2) were similar to those from the principal component analysis. Each species entry was usually clustered at the first fusion steps and the average Euclidian distance between entries within species was large (> 0.25). Pachyrhizus erosus, P. ahipa and the P. tuberosus of the ‘Chuin’ cultivar group formed three main groups. At the final fusion steps, the P. tuberosus ‘Chuin’ group was aggregated with the P. erosus group; this ‘P. tuberosus ‘Chuin’–P. erosus’ cluster was then merged with the P. ahipa group and, following this, with the P. tuberosus ‘Ashipa’ cultivar group. The P. tuberosus ‘Ashipa’ and ‘Chuin’ fell into two distinct clusters, and the average Euclidian distance between P. tuberosus ‘Chuin’ and P. erosus was smaller than the average Euclidian distance between the two P. tuberosus types. Within each of the three species, a similar amount of diversity was observed and several subgroups could be identified. The cluster structure obtained for P. erosus only partly reflects the geographic origins of the entries (Table 1). Thus, some clusters were formed by entries with the same origin (e.g. EC040, EC041 and EC042 from Guatemala), while other clusters combined entries with different origins (e.g. ECKEW from Mexico and EC533 from Macau in Asia).

Fig. 2

Cluster analysis of 34 entries of three Pachyrhizus species based on 75 agronomic and morphological traits; TC = P. tuberosus, AC = P. ahipa and EC = P. erosus
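
The average Euclidean distances referred to in this section can be sketched as follows. The Python example computes the mean pairwise distance within one group and between two groups of entries from their (standardized) trait vectors. The linkage method of the cluster analysis is not restated in this section, so the sketch only illustrates the distance summaries; all coordinates and function names are hypothetical.

```python
import math
from itertools import combinations, product


def mean_within_group_distance(group):
    """Average Euclidean distance over all pairs of entries in one group."""
    pairs = list(combinations(group, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def mean_between_group_distance(group_a, group_b):
    """Average Euclidean distance over all cross-group pairs of entries."""
    pairs = list(product(group_a, group_b))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical standardized trait vectors (two traits per entry)
group_1 = [(0.0, 0.0), (0.0, 2.0), (0.0, 4.0)]
group_2 = [(3.0, 0.0), (3.0, 4.0)]

print(mean_within_group_distance(group_1))
print(mean_between_group_distance(group_1, group_2))
```

Comparing the between-group mean for two pairs of groups is the kind of comparison made above for ‘Chuin’ versus ‘Ashipa’ and ‘Chuin’ versus P. erosus.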

Of all agronomic and morphological traits with variance component estimations of \( \sigma_{A(S)}^{2} \)/\( \sigma_{\varepsilon }^{2} \) > 2 that entered the stepwise multiple regression analysis as regressor variables (with the principal components as dependent variables), only a few remained in the final multiple regression model (Table 13). Inflorescence length, pod degree and shape of curvature, and pod green colour showed a coefficient of determination (R²) of 0.976 for the first principal component. Thousand seed weight, start of flowering, and shape of central terminal and lateral leaflet lobe had a coefficient of determination (R²) of 0.972 for the second principal component. The third, fourth and fifth principal components were determined to 50.1% by pod beak curvature, to 69.8% by pod beak curvature and width of storage roots, and to 37.1% by terminal leaflet lobe type, respectively. In total only 10 traits were significant at the 0.01 level for the first five principal components in the regression analysis. These principal components explained 78.7% of total variation observed in this study, and the 10 selected traits (i.e. IF, PDS, PC, TSW, BF, SCTLL, PBC, WS and TLLT) explained 69.6% of the total variation observed.

Table 13 Morphological and agronomic traits measured in 34 Pachyrhizus entries with variance component estimations of \( \sigma_{A(S)}^{2} \)/\( \sigma_{\varepsilon }^{2} \) > 2 entered and left in multiple regression analysis for variable reduction in future Pachyrhizus classificatory studies
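
The pre-selection step described above, in which only traits with a variance component ratio greater than 2 enter the stepwise regression, can be sketched in a few lines of Python. The trait abbreviations are taken from the text, but the variance component values, the function name and the data layout are hypothetical and serve only to show the filter.

```python
def select_regressors(variance_components, threshold=2.0):
    """Return traits whose accession-to-error variance ratio exceeds threshold.

    variance_components maps a trait name to a pair
    (sigma2_accession_within_species, sigma2_error).
    """
    selected = []
    for trait, (var_accession, var_error) in variance_components.items():
        # Skip traits without a usable error variance
        if var_error > 0 and var_accession / var_error > threshold:
            selected.append(trait)
    return selected


# Hypothetical variance component estimates for three traits
components = {
    "TSW": (12.0, 3.0),  # ratio 4.00, enters the regression
    "PBC": (0.9, 0.5),   # ratio 1.80, excluded
    "WS": (4.1, 2.0),    # ratio 2.05, enters the regression
}
print(select_regressors(components))
```

Only traits passing this filter would then be offered to the stepwise procedure as candidate regressors.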

The global search for environmental conditions with temperature and rainfall parameters comparable to those of the experimental sites, i.e. an average annual temperature from 23 to 28 °C and precipitation from 400 to 1200 mm (including both rainfall and irrigation) for at least six consecutive months (the crop duration at the experimental sites), revealed that similar conditions can be found in 85 countries around the world (Fig. 3). The relevant target set of yam bean production environments is mainly located in Central and South America, West, Central, and South East Africa, South East Asia and the Pacific. However, the environmental survey failed to identify the known yam bean production areas in Central Mexico, South India and South China.

Fig. 3

Target set of yam bean production areas in the tropics, which correspond to the average annual temperature range (23–28 °C) and rainfall range [400–1200 mm (including irrigation)] at experimental sites for at least six consecutive months
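
The screening criterion behind this map can be illustrated with a small Python sketch. It is not the GIS workflow used for the study; it assumes one possible reading of the criterion, namely that a site qualifies if some run of six consecutive months has every monthly mean temperature within 23 to 28 °C and a total water supply within 400 to 1200 mm. The function name, argument names and example climate series are hypothetical.

```python
def has_suitable_window(monthly_temp_c, monthly_precip_mm,
                        temp_range=(23.0, 28.0),
                        precip_range=(400.0, 1200.0),
                        window=6):
    """Check whether any run of `window` consecutive months qualifies.

    Assumed reading of the criterion: every monthly mean temperature in the
    window lies in temp_range, and the summed precipitation of the window
    lies in precip_range. Windows may wrap around the end of the year.
    """
    if len(monthly_temp_c) != 12 or len(monthly_precip_mm) != 12:
        raise ValueError("expected 12 monthly values for each series")
    for start in range(12):
        months = [(start + k) % 12 for k in range(window)]
        temps_ok = all(
            temp_range[0] <= monthly_temp_c[m] <= temp_range[1] for m in months
        )
        water = sum(monthly_precip_mm[m] for m in months)
        if temps_ok and precip_range[0] <= water <= precip_range[1]:
            return True
    return False


# Hypothetical site: warm all year, 100 mm of water per month
print(has_suitable_window([25.0] * 12, [100.0] * 12))
```

A stricter or looser reading of the criterion (for example, using annual means only) would change which areas are retained, which may be relevant to the production areas that the survey did not identify.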

Multivariate analysis of variance

The behaviour of the genotypes and species was the same regardless of the environment used, considering all the 50 quantitative and 25 qualitative variables simultaneously. The main effect of each factor (G or L) was then investigated separately. For the factor G, significant differences were found between all genotypes or species. All entries also reacted in the same way, since P ≤ 0.0001. Furthermore, the other multivariate contrasts were significant (Table 14).

Table 14 Wilks’ λ, Hotelling-Lawley, Pillai’s trace and Roy’s Max root results from the MANOVA analysis of the 75 traits investigated for the sources of variation (Location, Species, Genotype) and their interactions
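
The four test statistics listed in Table 14 are all functions of the eigenvalues of the matrix E⁻¹H, where H and E are the hypothesis and error sums-of-squares-and-cross-products matrices. The Python sketch below shows these standard relationships for a set of hypothetical eigenvalues. Roy's statistic is given here as the largest eigenvalue, which is one of two conventions in use; the function name and the example values are assumptions of this sketch.

```python
import math


def manova_statistics(eigenvalues):
    """Compute four MANOVA test statistics from the eigenvalues of E^-1 H."""
    return {
        # Wilks' lambda: product of 1 / (1 + eigenvalue)
        "wilks_lambda": math.prod(1.0 / (1.0 + ev) for ev in eigenvalues),
        # Pillai's trace: sum of eigenvalue / (1 + eigenvalue)
        "pillai_trace": sum(ev / (1.0 + ev) for ev in eigenvalues),
        # Hotelling-Lawley trace: sum of the eigenvalues
        "hotelling_lawley_trace": sum(eigenvalues),
        # Roy's largest root, taken as the largest eigenvalue (one convention)
        "roy_max_root": max(eigenvalues),
    }


# Hypothetical eigenvalues for illustration only
stats = manova_statistics([3.0, 1.0])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

A small Wilks' lambda and large values of the other three statistics all point in the same direction, which is why the four tests usually agree when group differences are strong.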

The MANOVA of all 50 quantitative and 25 qualitative traits measured yielded a significant Wilks’ Lambda (P ≤ 0.0001). The other statistics (Pillai’s trace, Hotelling-Lawley trace and Roy’s Max root) gave similarly significant results (Table 14). Wilks’ Lambda was transformed to an F approximation. Strong significant differences were detected among all genotypes within as well as between species, which suggested the need for discriminant analysis for centroid comparison between groups. All parallel test statistics from the MANOVA (Hotelling-Lawley, Pillai’s trace and Roy’s Max root) were treated in the same way as Wilks’ Lambda for the F test. The tested morpho-agronomic and quality traits can be efficiently utilized in further breeding programs. MANOVAs were conducted for the sources of variation Location (L), Genotype (G), Species (S), G × L, and S × L in the full MANOVA, except for qualitative characters, where little variability is exhibited within each species (Table 14). In agreement with the PCA and cluster analysis, MANOVA indicated that the main components of the total phenotypic variance were due to about 28 of the 75 characters evaluated across two environments. Comparison of the 50 quantitative and 25 qualitative traits using MANOVA showed a significant difference between accessions of Pachyrhizus species at all taxonomic levels examined. The difference is based mainly on 19 quantitative traits and 9 qualitative characters which showed positive eigenvectors related to genotype and species distinctiveness (Tables 2 and 3). The MANOVA applied to the 34 accessions showed significant pairwise differences between the 3 species (F = 3.90; P ≤ 0.0001). This result relied mainly on the 19 aforementioned quantitative characters of the 75 measured. However, some differences appeared among the three species. In Pachyrhizus ahipa, 8 traits were sufficient to analyze variation (Zanklan, unpublished data), while 18 traits were necessary in P. erosus and P. tuberosus (Zanklan, unpublished data) as indicated in Tables 2 and 3. Differences between accessions inside a given species are as large as differences among species as described by phenotypic variation estimates and Shannon–Weaver diversity index analyses (Tables 9, 10).

Discriminant function analysis

Tables 15‒23 are provided in Electronic Supplementary Material Table ESM1.

Tables 15‒17 present the discriminant function analysis (DFA) structure for the 75 morpho-agronomic characters, the 50 quantitative and the 25 qualitative traits effects respectively, and related statistics including eigenvalues, proportion of total discriminant power accounted for by the first five canonical functions as well as cumulative amount of discrimination power of functions. Tables 18‒20 show results on DFA performed to dissect differences between entries at the interspecific scale.

Similarly, Tables 21‒23 show standardized canonical coefficients for the first three functions, comparing the performance of genotypes within and among the three yam bean species studied, again at both intra- and interspecific levels. The mean scores of the first two canonical variates were plotted as scatter plots to visualize differences between genotypes within and among species across the two environments, and to indicate the original variables that best explained the variability. These are shown in Figs ESMs 1‒5, respectively. Figs ESMs 6‒8 show discrimination variability between the two locations used in the present study. Differences between genotypes were examined within and among the species.

Discriminant function analysis (DFA) carried out on the entire 75 morpho-agronomic characters scored with emphasis on traits recorded on different plant organs (Table 15) showed that the cumulative variance explained by the first five canonical variates accounted for 98.90% of the total variance with respect to both quantitative and qualitative storage root traits. The first two functions accounted for 96.77%. The variables which most contributed to those canonical variates were metrical as well as visual descriptor traits as shown in Table 21 listing the standardized canonical discriminant function coefficients between the first three canonical scores of discriminant ordinations and 75 morphological and agronomic traits in Pachyrhizus spp. The first discriminant function, which accounted for 94.96% of the total variance, was negatively correlated with 27 characters. Positive association with CAN1 was due to nine traits (Table 21). Examination of the second function suggested it was mainly associated with eight characters (with positive correlation) and negatively linked to twenty further traits (Table 21). The order in which the variables were included in the discriminant analysis indicates their relative importance in classifying entries.

Canonical analysis to find divergent trends for root-related traits resulted in five main variates that together accounted for 98.90% of the total variation in Pachyrhizus (Table 15). The first and second variates accounted for 94.96 and 1.80% of the variation, respectively.
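
The proportion of discriminant power attributed to each canonical variate is conventionally its eigenvalue divided by the sum of all eigenvalues. The Python sketch below shows this calculation together with the cumulative proportions. The eigenvalues are hypothetical and are not those of Table 15; the function name is likewise illustrative.

```python
def discriminant_power(eigenvalues):
    """Return (proportions, cumulative proportions) for canonical variates."""
    total = sum(eigenvalues)
    proportions = [ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for share in proportions:
        running += share
        cumulative.append(running)
    return proportions, cumulative


# Hypothetical eigenvalues of three canonical functions
proportions, cumulative = discriminant_power([8.0, 1.5, 0.5])
for i, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(f"CAN{i}: {100 * p:5.2f}%  cumulative {100 * c:6.2f}%")
```

When the first eigenvalue dominates, as reported above for CAN1, the later variates add little discriminant power even if they remain statistically significant.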

With regards to seed traits, the first five canonical variates extracted from DFA were responsible for 99.98% of total variation (Table 15). Canonical loadings showed that CAN1 was determined and dominated by traits presented in Tables 15 and 21. The first variate represented 98.37% of the total variation explained by DFA and was highly correlated to most of the original variables aforementioned (P < 0.0001), but it was negatively correlated to SNP and SW. The second variate explained 1.16% of the seed parameter variation between genotypes and was negatively correlated to most variables.

DFA performed with a focus on pod characters indicated that the first five variates accounted for 99.79% of the total variation. CAN1 accounted for 90.93% and CAN2 for 5.25% of the total variance. The original variables contributing to the variation observed are highlighted in Tables 15 and 21.

Canonical analysis to identify genotypic differences for flower traits resulted in five variates that accounted together for 100% of the variability among genotypes inside a given species. The first canonical variate described 94.96% of the total variation (Table 15). Distribution of genotypes through canonical axes 1 and 2 showed a conspicuous divergence between the groups formed by accessions within each species with high correlation to traits beginning of flowering and period of flowering (Tables 15, 21).

Stem traits showed large variation within as well as among species. Canonical outcomes indicated that five variates explained 100% of the total variation observed. CAN1 accounted for 97.85% of the variation, while CAN2 accounted for only 1.73% of the total variation (Table 15). The first variate was negatively linked with stem colour, early vigour and plant height. CAN2 was determined mainly by time of emergence (Table 21).

All original leaf variables associated with the first five variates from DFA together accounted for 99.53% of the variation existing in the germplasm studied. The first two caused 66.16% of the total variability observed, whereas the second explained 30.70%, the highest percentage of variation attributed to the second canonical variate among all variable sets. The main traits underlying CAN1 and CAN2 are listed in Table 21.

For traits related to several plant organs together, considerable variation was also observed in the yam bean collection. The first variate explained most of the total variation (99.57%), and the remaining variates were responsible for less than 1% (Table 15). The second and subsequent variates, which explained only this additional variability, discriminated accessions on the basis of weight of vines and leaves.

Analyses using only the 50 quantitative or the 25 qualitative traits (Tables 16 and 17) showed generally the same trends as previously described for the entire set of 75 characters. As shown in Tables 22 and 23, all traits positively or negatively associated with CAN1 and CAN2 were highly variable, and the dispersion of genotypes within and between species differed significantly according to the plant organ considered.

Canonical ordinations of variables and genotypes along the first two significant axes presented also in general similar trends, as mentioned above, and their distribution was highly variable (Figs ESMs 1–3).

Among species, canonical analysis extracted five variates that were responsible for nearly 100% of total variation of the entire 75 quantitative and qualitative traits data with attention on storage root, seed, pod, flower, stem, leaf and composite characters (Tables 18–23). Distribution of whole species through canonical axes from DFA showed a conspicuous divergence between the groups formed by them. Their ordinations on the significant axes were attained by each assembling either all the 75 morpho-agronomic, or the 50 quantitative, or 25 qualitative characters; even by considering traits related to each plant organ separately (Figs ESMs 4 and 5). Species showed unequivocally distinct characteristics by appearing sometimes on both negative values for CAN1, or on high mean values for CAN1 and 2, or showing differences for diverse other combinations of variables along these first two canonical axes (Figs ESMs 4 and 5).

Even though species were well discriminated by DFA across both locations (Songhai and Niaouli), the importance of original variables and variable sets differed between environments, as described above for the MANOVA results. As shown in Figs ESMs 6–8, DFA gave different results for CAN1 and CAN2 at Songhai and at Niaouli. Moreover, these differences in the discriminatory power of the CANs were linked to the characters of the plant organ considered. Trait variation was often displayed inversely between the two locations, indicating high S × L interactions (Figs ESMs 6–8).

Effect size estimators and bias due to unequal sample size per species

In the study reported here, the 34 entries were placed in three groups. The bias due to unbalanced data was estimated and three effect size estimates were generated for all 50 quantitative and 25 qualitative traits evaluated; these are summarized in Tables 24 and 25. For forty-one traits, the ε² and ω² estimates were negative, as indicated in Tables 24 and 25 provided in Table ESM1.
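
To show how negative ε² and ω² estimates can arise, the following Python sketch computes three common effect size estimators for a one-way layout with unequal group sizes. It assumes that the three estimates are η², ε² and ω² in their usual one-way ANOVA form; the text names only ε² and ω², so η² and the exact formulas are assumptions, and the data are hypothetical.

```python
def effect_sizes(groups):
    """Return (eta2, epsilon2, omega2) for a one-way layout.

    groups: list of lists of observations, one inner list per group.
    """
    values = [x for g in groups for x in g]
    n_total = len(values)
    k = len(groups)
    grand_mean = sum(values) / n_total
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = ss_total - ss_between
    df_between = k - 1
    ms_within = ss_within / (n_total - k)
    eta2 = ss_between / ss_total
    # Both corrections subtract the between-group variation expected by chance
    epsilon2 = (ss_between - df_between * ms_within) / ss_total
    omega2 = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta2, epsilon2, omega2


# Hypothetical unbalanced data with identical group means
print(effect_sizes([[1.0, 3.0], [1.0, 3.0], [1.0, 2.0, 3.0]]))
# Hypothetical unbalanced data with well separated group means
print(effect_sizes([[1.0, 2.0], [5.0, 6.0], [9.0, 10.0, 11.0]]))
```

Under these formulas, ε² and ω² become negative whenever the between-group mean square is smaller than the within-group mean square (F < 1), while η² can never be negative.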

Discussion

Phenotypic variation

The coefficient of variation provides a measure of diversity for quantitative traits (Ferguson and Robertson 1999). A part of the results on descriptive statistics, including 33 traits, has been reported earlier in Zanklan et al. (2007). Clear variability existed in the yam bean germplasm studied for all quantitative inherited traits with somewhat reduced variation in characters related to biomass production, pod number per plant—a trait well associated to other agronomic traits—terminal leaflet length, inflorescence length, thousand seed weight, pod length, time of flowering, weight of vines and leaves, pod width, period of flowering, vine start of climbing and total harvest index. Trait variation ranged from 0.81 (for HITOT) to 49.35% (for number of storage root per plant) considering all traits and species evaluated. This indicated that genotypes belonging to all three cultivated yam bean species possess a high potential for biomass yield production and its components such as NSP, SEEY and SRDY. A remarkable diversity for most morphological and agronomic characters is shown at the intraspecific scale (Table 6), whereas overall diversity at the interspecific level was somewhat lower. Understanding the mechanisms making some sites confer more variability to the germplasm would be desirable to plan collecting missions, and to efficiently exploit the available genetic diversity in genebanks (Pecetti and Piano 2002). Our current results indicate that there is a wide differentiation among accessions and lines within and between the three cultivated yam bean species, both for quantitative and qualitative traits that can be used to breed for higher biomass, storage root dry matter content and yield.
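
For reference, the coefficient of variation used as a diversity measure in this paragraph is the standard deviation expressed as a percentage of the mean. A minimal Python sketch is given below; it assumes the sample standard deviation (n − 1 denominator), and the trait values and function name are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent (sample SD divided by the mean)."""
    mean = statistics.fmean(values)
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical trait values for three entries
print(coefficient_of_variation([8.0, 10.0, 12.0]))
```

Because the measure is scaled by the mean, it allows traits recorded in different units to be compared, which is how the range of trait variation is used above.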

The grouping of similar genotypes relies on the dissimilarity among them, which can be determined by a phenotypic diversity index (Upadhyaya et al. 2002). The average diversity index was similar in the three species. The Shannon–Weaver diversity index (H′) was calculated to compare phenotypic diversity among traits and within and between groups. A low H′ indicates extremely unbalanced frequency classes for individual traits and a lack of diversity (Upadhyaya et al. 2002). Diversity estimates were performed for each trait and the three species as well as in the entire genus. H′ was then pooled across traits, species and the genus Pachyrhizus (Tables 9 and 10). The average H′ across traits was quite similar for the species and the genus for quantitative characters. For qualitative traits, P. erosus, P. tuberosus and the genus presented the same trends, while in P. ahipa, the diversity was lower. The species-level H′ values, whether for each trait or averaged across all quantitative or qualitative characters, were correlated neither with the number of constituent accessions and lines per species nor with environmental differences. The significant variation at the interspecific as well as the intraspecific scales suggests a differentiation of the species, likely related to the selective pressures in the environments of origin. Mean adjustment of adaptive traits takes place, in the long term, according to the prevailing environmental conditions of the location of origin (Pecetti and Piano 2002; Piano et al. 1996). In the present study, the relative importance of among- and within-species variation (Tables 7, 8) suggests that such adjustment to a given environment may have been realized. A relatively high level of intraspecific variation, which is a primary factor of adaptation, can provide a buffering effect to the population to cope with unpredictable, seasonal climatic fluctuations (Pecetti and Piano 2002).

Phenotypic variation estimated with the Shannon–Weaver diversity index seemed to be equal in importance for quantitative as well as qualitative characters (Tables 9, 10), even though some differences were recorded between the two types of variables. This indicates that both variable types are equally useful in studying genetic diversity in yam bean.

Døygaard and Sørensen (1998) observed no single trait separation among the three species when analysing 18 quantitative morphological traits in herbarium material, i.e. excluding agro-ecological traits, using principal component analysis. However, our results, based on 75 morphological and agronomic characters (50 quantitative and 25 qualitative) and field trials, showed a clear separation between P. erosus, P. ahipa, P. tuberosus ‘Chuin’ and P. tuberosus ‘Ashipa’ in both the principal component analysis and the cluster analysis. A clear separation between P. tuberosus and P. erosus had previously been observed by Estrella et al. (1998), who used random amplified polymorphic DNA (RAPD) markers; however, that study did not consider P. ahipa. A clear separation between P. tuberosus ‘Chuin’ and P. tuberosus ‘Ashipa’ was also observed by both Tapia and Sørensen (2003) and Delêtre et al. (2017), the first using a canonical analysis based on 70 morphological and agronomic traits, and the second chloroplast DNA and microsatellite markers. Tapia and Sørensen (2003) found a large genetic distance in P. tuberosus between ‘Chuin’ and ‘Ashipa’. The same was observed in this study and additionally, that the genetic distance between the ‘Chuin’ and ‘Ashipa’ group within P. tuberosus is as large as the genetic distance between ‘Chuin’ and P. erosus as well as between ‘Chuin’ and P. ahipa.

One limitation of the current study is the fact that we were only able to use one accession of the P. tuberosus cultivar group ‘Ashipa’ (TC118); however, this entry can be considered representative of the ‘Ashipa’ group (Tapia and Sørensen 2003). With five ‘Chuin’ entries we gave emphasis to the agronomically most important cultivar group within P. tuberosus due to the quality characteristics of the storage root. With the ‘Ashipa’ entry TC118 from Haiti and the five ‘Chuin’ entries from Peru taken together, the clear differences in growth type and storage root nutrient content within P. tuberosus can be assessed in this study. The domestication of the P. tuberosus cultivar groups was studied by Delêtre et al. (2017), and the cultivar groups are still found in cultivation among many indigenous farmer communities in the Amazon basin of Peru, Ecuador, Colombia, Brazil, Bolivia, and Venezuela, so that the existing ex situ germplasm collections of P. tuberosus may be improved through renewed collection efforts. Such collection initiatives should be directly linked with the maintenance of the material in professionally managed genebanks. For this study it was only possible to obtain five ‘Chuin’ and three ‘Ashipa’ accessions, and two of these ‘Ashipa’ samples could not be used because of lack of seed vigour. In contrast to the samples of P. tuberosus, the P. ahipa samples used in this study represent a very good coverage of the current diversity of this species.

Pachyrhizus ahipa had been considered extinct, because it had not been observed in the Andean fields in Peru and Ecuador (Grüneberg, pers comm.). However, in 1994/95 farmers in the remote valleys of Bolivia and Northern Argentina were found to be growing P. ahipa. This material was collected during two field trips of several months duration in 1994 and 1996 (Ørting et al. 1996). All the P. ahipa material used for this study—with the exception of AC102, AC524 and AC525—trace back to these two collection trips (Ørting et al. 1996) and were selected to cover the diversity between and within accessions. Most of this material is today available from CIP’s genebank (Table 1).

In contrast to P. tuberosus and P. ahipa, the species P. erosus can be easily found in cultivation, often on a commercial scale, in many tropical regions of the world. Nevertheless, as for P. tuberosus, there is no professionally managed genebank with a mandate for P. erosus, and germplasm acquisition depends upon research institutions and botanical gardens providing seed samples, with the exception of the “INIFAP genebank Campo Experimental Bajio” at Celaya, Guanajuato, Mexico. The P. erosus material used for this study represents a broad sample for this species from different regions of the world. The sample of yam bean genotypes was assessed in two field trials in Benin, West Africa, and both experimental sites had the same soil type (well-drained sandy red loam). Average temperature at the experimental sites ranges from 23 °C in August to 28 °C in May, and the sites are characterized by usually dry conditions. However, using irrigation we generated a wide range of water supply in these field trials, from about 400 mm to 1200 mm during the crop growth period of about six months. Using ArcGIS software to search for environments in the world with a temperature range from 23 °C to 28 °C and a precipitation range from 400 to 1200 mm within at least six consecutive months, we found such conditions in 85 countries worldwide, mainly in Central and South America, West, Central, and South East Africa, South East Asia and the Pacific (Fig. 3). This relevant target set of yam bean production environments corresponds well with the known yam bean distribution, with the exception of the cultivation areas in Central Mexico, South India and South China. However, all areas of origin of the three species were included in the output of this environmental survey. This might explain why all three yam bean species performed well at our experimental sites, although they originate from different eco-geographic regions. 
The two experimental sites used in this study are representative of a very wide range of yam bean production environments in West Africa as well as in other tropical regions around the world.

Principal component and cluster analyses

The average Euclidian distance observed between P. tuberosus group ‘Chuin’ and P. tuberosus group ‘Ashipa’ was larger than the average Euclidian distance between P. tuberosus group ‘Chuin’ and P. erosus as well as that between P. tuberosus group ‘Chuin’ and P. ahipa. This is a clear indication that the three species are very closely related. That the relationships between the three cultivated yam bean species are very close is further indicated by the fact that the species are not, with one exception, separated by crossing barriers (Grum 1990; Sørensen 1991; Grüneberg et al. 2003). The diversity observed within P. ahipa and within the P. tuberosus cultivar group ‘Chuin’ (Figs. 1, 2) was nearly as large as the diversity observed within P. erosus. This was unexpected, because all P. tuberosus entries of the cultivar group ‘Chuin’ originate from Peru and all P. ahipa entries from Bolivia, whereas the P. erosus entries originated from several different countries in Central and South America and Asia. Moreover, within P. erosus, no clear subgroups related to geographic origin were detected, which agrees with the results of Estrella et al. (1998). This suggests that in both Peru and Bolivia an extremely large amount of diversity can likely be found for cultivated yam beans. This also suggests that the yam bean was introduced to Asia from the Americas, which is in agreement with historical reports that P. erosus was introduced into Asia in the sixteenth century by the Spaniards via the Acapulco–Manila trade route (Sørensen 1996).

The first three principal components explained about 70% of the total variation in all traits (Tables 4, 5). This figure is high considering the large number of variables recorded in this study. Principal component analysis showed that, for the first principal component, all P. erosus entries had positive scores and all P. ahipa entries had negative scores (Fig. 1). This reflects the fact that P. erosus matures later, has a generally higher yield potential than P. ahipa and can morphologically be clearly separated from P. ahipa, because the first principal component was mainly associated with yield-related traits. With regard to yield potential and time to maturity, P. tuberosus, fell between P. erosus and P. ahipa (Zanklan et al. 2007). This was reflected by the positive scores obtained for the first principal component in P. tuberosus, which were always lower than those obtained for P. erosus and higher than those obtained for P. ahipa. However, P. tuberosus was mainly separated from P. erosus and P. ahipa by the high positive values obtained for the second principal component, which were associated with leaf and seed morphological characteristics, high storage-root dry matter and starch content and low storage-root fructose and glucose content as well as leaf length and shape.

In total, 75 characters were used in the present study. Recording such a large number of traits is both labour-intensive and time-consuming. This study shows that a character discard can be performed in describing cultivated yam bean genotypes, because many characters were highly correlated. The variation of the entire primary yam bean genepool considered by 75 agronomic and morphological traits can be determined by a few traits, which express low genotype-environment interactions and plot errors (Tables 4, 5, 13). Only seven highly heritable characters—namely inflorescence length, legume angle and shape of curvature, pod green colour, thousand seed weight, start of flowering, outline of terminal central lobe and lateral leaflet lobe—explain a large part of the total pattern of variation observed by the first two principal components. With three further characters—namely the curvature of the persistent style at the tip of the legume, width of storage roots, and terminal leaflet lobe type—nearly 70% of the genetic diversity may be determined in yam beans. Our results will be useful to identify characters that could be used as descriptors for cultivated yam bean accessions, genotypes and varieties. Nevertheless, the development of a descriptor list would require a greater number of accessions including new genotypes. However, to obtain a large diversity of cultivated yam beans for studies is difficult, because no professionally managed genebank has the genus Pachyrhizus as a mandate. Only for P. ahipa is there a clear mandate at CIP within the frame of the mandate for Andean Root and Tuber Crops. However, this collection has started to include a few P. erosus and P. tuberosus accessions that might be of interest for agronomy and breeding (Table 1).

Multivariate analysis of variance

Our results showed morpho-agronomic and quality trait heterogeneity within the collection of Pachyrhizus evaluated. With MANOVA, differences among the three species were significant (Table 14). The power of MANOVA declines as the number of response variables increases (Scheiner 2001). Unequal sample sizes are not a large problem for MANOVA, but may bias the results for factorial or nested designs (Chahouki 2011).
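
For readers unfamiliar with how a MANOVA weighs among-group against within-group variation, the following minimal sketch computes Wilks' lambda, one commonly used MANOVA test statistic, for a one-way design with two response traits. It is an assumption that this statistic is a suitable illustration; the text does not state which test statistic was used in this study. The two traits and all numerical values are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sscp(rows, centre):
    """Sum-of-squares-and-cross-products matrix of `rows` about `centre`."""
    p = len(centre)
    m = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - centre[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                m[i][j] += d[i] * d[j]
    return m


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups):
    """Wilks' lambda = det(E) / det(T) for two response traits.

    E is the pooled within-group SSCP matrix and T the total SSCP matrix
    (T = E + H, with H the among-group matrix). Values near 0 indicate
    strong group separation; a value of 1 indicates none.
    """
    all_rows = [r for rows in groups.values() for r in rows]
    total = sscp(all_rows, mean_vector(all_rows))
    within = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups.values():
        s = sscp(rows, mean_vector(rows))
        for i in range(2):
            for j in range(2):
                within[i][j] += s[i][j]
    return det2(within) / det2(total)


# Hypothetical (trait 1, trait 2) observations per species, invented here.
groups = {
    "P. ahipa": [(1.0, 2.0), (2.0, 1.0), (1.5, 1.8)],
    "P. erosus": [(6.0, 7.0), (7.0, 6.5), (6.5, 7.8)],
    "P. tuberosus": [(3.0, 9.0), (4.0, 10.0), (3.5, 9.2)],
}

lam = wilks_lambda(groups)
print(lam)
```

In practice, the statistic would be converted to an approximate F value to obtain a significance level; that step is omitted here.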

To take these observations into account, we performed two sets of statistical analyses: one with all 75 characters, and one with the quantitative and qualitative characters analysed separately. The two approaches did not differ significantly in their results, and no weakness of the analysis was observed. Differences between single genotypes and species are based mainly on 21 traits (Tables 2, 3).

In all analyses involving the factors G (genotype) and S (species), including their interactions with other sources of variation, phenotypic variance was distributed across all eigenvectors. In the full MANOVA, the primary root accounted for about 43.41% of the variance (source G). Within locations, the primary root accounted for 43.50% (at location Songhai) and 43.47% (at location Niaouli) of the variance associated with G. Applying MANOVA to a tomato trial, Lounsbery et al. (2016) reported minor rank changes among genotypes across different locations. These findings are consistent with the results presented here for yam bean. Our results indicate that yam bean genotypes with favourable phenotypic trait expression in terms of yield (storage root, seed and fodder) and other related morpho-agronomic characters exist, and that simple morphs characterized by BIOM, SRDM, SRDY, HIR, HIS, SEEY, TSW, BF, PF, TF, PH, PT, LS, WS, MWS, PRO, STA, SUC, GLUC, FRUC and by high tolerance and resistance towards abiotic and biotic stresses could be selected for further breeding activities.

Discriminant function analysis

To classify accessions, a discriminant function analysis (DFA) was conducted using the entire set of 75 morpho-agronomic traits, including 50 quantitative and 25 qualitative characters. Variables which have relatively high positive regression weights on a variate are positively inter-correlated as a group. Similarly, those having high negative weights are also positively inter-correlated with one another, but negatively correlated with those showing positive weights. The magnitude of the weights indicates the relative contribution of the original variables to each canonical variate. The total amount of variability was explained by two to five canonical variates, considering the factor levels species, genotypes, location, and their interactions. DFA revealed a clear separation between genotypes within and among species. Whether based on all 75 traits or on the 50 quantitative or the 25 qualitative characters alone, the discriminant function analysis correctly identified nearly 100% of P. ahipa, P. erosus, and P. tuberosus, with only a minor overall average error (Tables 18–20, as presented in Table ESM1).
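
How discriminant weights arise and are used for classification can be sketched with Fisher's linear discriminant for two groups and two traits. This is a deliberately reduced illustration under stated assumptions: the analysis reported here involved three species and up to 75 traits, whereas the sketch below uses two groups, two unnamed hypothetical traits and invented values.

```python
def mean2(rows):
    """Mean of a list of (x, y) observations."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def scatter2(rows, m):
    """Within-group sums of squares and cross-products (sxx, sxy, syy)."""
    sxx = sum((r[0] - m[0]) ** 2 for r in rows)
    syy = sum((r[1] - m[1]) ** 2 for r in rows)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    return sxx, sxy, syy


def fisher_discriminant(a, b):
    """Return weights w = Sw^-1 (mean_a - mean_b) and the midpoint cut-off."""
    ma, mb = mean2(a), mean2(b)
    axx, axy, ayy = scatter2(a, ma)
    bxx, bxy, byy = scatter2(b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy  # pooled scatter Sw
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Closed-form inverse of the 2 x 2 matrix [[sxx, sxy], [sxy, syy]].
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    # Cut-off: discriminant score of the midpoint between the group means.
    cut = 0.5 * (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1]))
    return w, cut


def classify(obs, w, cut, label_a, label_b):
    """Assign `obs` to group a if its discriminant score exceeds the cut-off."""
    score = w[0] * obs[0] + w[1] * obs[1]
    return label_a if score > cut else label_b


# Hypothetical (trait 1, trait 2) observations, invented for illustration.
erosus = [(6.0, 7.0), (7.0, 6.5), (6.5, 7.8)]
ahipa = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.8)]

w, cut = fisher_discriminant(erosus, ahipa)
predicted = [classify(o, w, cut, "P. erosus", "P. ahipa") for o in erosus + ahipa]
print(w, cut)
print(predicted)
```

The relative size of the two entries of w corresponds to the relative contribution of each trait to the discriminant axis, provided the traits are measured on comparable scales.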

The two-dimensional plots (Figs ESMs 4 and 5) obtained from the first two variates indicated the formation of three distinct groups, each representing one of the species. We found no earlier morphometric studies in yam beans conducted under field conditions at that scale for comparison with the data analyzed in this report. Floral traits exhibited clear differences among species, but narrow variation within them. Similar observations were made for leaf, stem and pod characters. Floral variability among and within species was confirmed by DFA, complementing the MANOVA results. Species were clearly separated into distinct groups by the first canonical variate for almost all the variable sets investigated, as shown in all graphics presented. Tables 9, 10, 11, 12, 13 and 14 show the relative contribution of each morpho-agronomic trait involved in the discriminant functions for genetic dissimilarity. The results indicate that all traits contributed to the genetic diversity. The DFA results further demonstrate that the phenotype of the genotypes varied with the environment, and that the magnitude of this variation differed widely depending on the trait. Since the results presented here highlight significant G × L interactions, further research on the stability of the most important agronomic traits is needed.

DFA in combination with MANOVA proved to be a more useful and powerful statistical tool than simple ANOVA, because the variables are considered in combination. With the DFA following the MANOVA, the complex interrelationships among dependent traits could not only be revealed, but could also be taken into account in statistical inference, which is not done in a simple ANOVA.

General assessment and comparison of multivariate analysis techniques

Intra- and interspecific variation in yam bean has not previously been investigated with MVAs for as many traits as in this study. However, Tapia and Sørensen (2003) reported a high level of diversity in a germplasm collection of P. tuberosus using hierarchical grouping (Mahalanobis distances) and the Duncan test. The present study indicates the existence of genetic diversity with superior characteristics that could be used in diverse breeding programs. The results presented here clearly demonstrate the congruity between the patterns of morpho-agronomic and quality characters and the genetic variation among the three Pachyrhizus species. All MVAs (PCA, cluster and regression analyses, MANOVA, and DFA) clearly separated the three species from one another. The results obtained by applying MVAs to 50 quantitative and 25 qualitative traits are in accordance with earlier findings indicating clear divergence between all yam bean genotypes investigated (Zanklan et al. 2007; Tapia and Sørensen 2003). Many genotypes possess high root and seed yields with quality suitable for diverse nutritional purposes. The information on diversity provides breeders with the ability to develop desirable types having high yield as well as better nutritional profiles. The reduction in the number of variables makes it easier to characterize and evaluate the performance of Pachyrhizus genotypes. Surprisingly, a single canonical variable accounted for about 90% of the total variation at the interspecific level, apart from the results obtained on leaf traits (Tables 15, 17–20, as shown in Table ESM1). In contrast, at the intraspecific scale more canonical variates were needed to account for the total variation. A good visualization of the discrimination between species and accessions within and among the taxa is presented in scatter plots (Figs ESMs 1–5).
The MANOVA and DFA were important in the study of morpho-agronomic and quality characteristics, allowing the simultaneous analysis of the most important attributes. Moreover, they facilitated the distinction of genotypes regardless of their taxonomic origin. Utilization of the multivariate techniques is therefore recommended in further studies in yam bean breeding.

Results from PCA, cluster and regression analyses, MANOVA, and DFA indicated that all of these techniques have excellent predictive power for distinguishing among yam bean genotypes, regardless of the taxon to which they belong. However, we cannot definitively conclude that one method is better than another, since a judgment of these classification methods depends on the completeness of the data and the objectives of the study. In our results, DFA was slightly more powerful than the other methods in classifying and discriminating Pachyrhizus genotypes on the basis of their morpho-agronomic and quality traits. Results from DFA also showed a range of possibilities for using diverse types of traits to discriminate between genotypes.

Compared to PCA (76.4%), the discriminant function analysis accounted for nearly 100% of the within- and among-group variance when considering five axes. The discriminant analysis more clearly identified a number of traits to be used in future studies. A combination of all techniques would be most appropriate for describing the variation in yam bean germplasm and for designing a collection strategy.

Conclusion

This study improves knowledge of the cultivated yam bean germplasm collection. Morpho-agronomic characterization using MVAs demonstrated significant intra- and interspecific variation and indicated significant differences between Pachyrhizus species for all individual or grouped traits. The statistical analysis was useful in identifying the most divergent variables within yam bean species and can help to advance progress in future breeding programs.

In conclusion, the study’s results demonstrate that a similar amount of diversity may be found within each cultivated species and that the genetic distance between species is limited. Moreover, considerable diversity may exist within P. ahipa and P. tuberosus grown on both sides of the Andean mountain range. Since interspecific hybridisation is possible (Grum 1990; Sørensen 1991; Grüneberg et al. 2003), all three cultivated yam bean species may constitute an important source for breeding. The close relationship among species further supports the proposition that only a few highly heritable characters are required to describe the diversity within the yam bean genepool. The list of these traits may serve breeders and curators in germplasm management, acquisition and distribution.