Introduction

Genetic diversity provides species with the ability to adapt to changing environments. Several studies have reported the use of morphological descriptors to determine the genetic diversity among cassava genotypes (Rimoldi et al. 2010; Asare et al. 2011; Thompson 2013). Recent advances in molecular biology have led to the development of important tools for studying genetic diversity in several plant species. The accuracy of accession characterization may therefore be enhanced by the use of molecular markers associated with morphological traits.

Previous studies of plant genetic diversity have used DNA markers for beta-carotene improvement in cassava (Ferreira et al. 2008; Rimoldi et al. 2010), including amplified fragment length polymorphisms (Benesi et al. 2010), simple sequence repeats (Alves et al. 2011; Parkes 2009; Oliveria et al. 2012; Costa et al. 2013) and single nucleotide polymorphisms (Kizito et al. 2005; Tangphatsornruang et al. 2008; Ferguson et al. 2011; Thompson 2013; Rabbi et al. 2015). With recent advances in high-throughput genotyping technologies, single nucleotide polymorphism (SNP) markers are increasingly becoming the markers of choice for plant genetic studies and breeding.

SNPs are the most common type of genetic variation, involving a change in a single nucleotide. Expressed sequence tags (ESTs) have been exploited to detect SNPs in maize (Zea mays L.) (Ching et al. 2002) and soybean (Glycine max L. Merr.) (Zhu et al. 2003), and Lopez et al. (2005) and Rabbi et al. (2014, 2015) have reported SNP detection from ESTs in cassava. Because cassava is an outbreeding and highly heterogeneous crop, it exhibits an extreme level of phenotypic plasticity and consequently lacks a unified classification system for cultivars (Kawano 1978), which makes the characterization of agronomic traits challenging. A successful genetic diversity study of cassava germplasm in Sierra Leone requires unraveling the genetic potential within the country's cassava breeding program, which consists of fourteen released varieties and provitamin-A cassava accessions introduced from the International Institute of Tropical Agriculture (IITA), Nigeria. It is therefore necessary to assess and understand the genetic diversity among the provitamin-A cassava accessions and to identify gaps to be filled within the breeding program in Sierra Leone.

The objectives of the study were, therefore, to characterize, quantify and exploit the diversity of 183 provitamin-A cassava accessions and five Sierra Leonean varieties using morphological traits, SNP markers and total carotenoid content, and to develop a collection for conservation and future use in the breeding program.

Materials and methods

Germplasm sources and experimental design

The plant materials consisted of 183 provitamin-A cassava accessions with varying levels of provitamin-A content, obtained from the International Institute of Tropical Agriculture (IITA, Ibadan, Nigeria) and established at the Taiama experimental site in Sierra Leone in 2014, together with five Sierra Leonean cassava varieties (Table 1). The trial was established and evaluated during the 2015–2016 cropping season at the Njala Agricultural Research Centre (NARC) Foya crop site, Njala, which represents the transitional rain forest agro-climatic zone (Van Vuure et al. 1972; Odell et al. 1974). The trial was laid out in an alpha lattice design with two replications, each comprising four blocks of 47 entries (188 entries in total). Alleys of 1 m and 2 m were left to reduce intra- and inter-block plant competition, respectively. Each entry was planted on a 10 m ridge at a spacing of 1 m × 1 m between and within ridges. Stem cuttings 20–25 cm long were taken from healthy plants and planted horizontally.

Table 1 Provitamin-A studied accessions and Sierra Leonean varieties and their pedigrees

Morphological traits

Agro-morphological data were collected at 1, 3, 6 and 9 months after planting (MAP) on the parameters listed in Table 2, using the IITA cassava descriptors (Fukuda et al. 2010).

Table 2 Parameters evaluated at 1, 3, 6 and 9 months after planting

Harvesting was done at 12 MAP (August–September). The following parameters were recorded at harvest: number of marketable roots, number of non-marketable roots and total number of storage roots (all as counts), root weight per tuber (kg), inner skin color, outer skin color, ease of peel, root shape, marketable weight (kg) and non-marketable weight (kg). Dry matter content, expressed as a percentage, was determined on three representative storage roots: slices of the fresh roots were randomly selected and weighed to obtain a 100 g fresh mass sample per genotype, dried in an oven at 80 °C for 48 h and then re-weighed to obtain the dry mass. Disease incidence and severity were recorded mainly at 1, 3, 6 and 9 MAP.
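The dry matter calculation described above can be sketched in Python as follows. This is a minimal illustration assuming the usual oven-dry definition, DMC (%) = dry mass / fresh mass × 100; the function name and the sample masses are hypothetical.

```python
def dry_matter_content(fresh_mass_g: float, dry_mass_g: float) -> float:
    """Dry matter content (%) of an oven-dried root sample (hypothetical helper)."""
    if fresh_mass_g <= 0:
        raise ValueError("fresh mass must be positive")
    if not 0 <= dry_mass_g <= fresh_mass_g:
        raise ValueError("dry mass must lie between 0 and the fresh mass")
    # Assumed definition: DMC (%) = dry mass / fresh mass x 100
    return dry_mass_g / fresh_mass_g * 100.0


# Hypothetical sample: 100 g of fresh slices weighing 35.2 g after 48 h at 80 °C
print(round(dry_matter_content(100.0, 35.2), 1))  # prints 35.2
```

With the 100 g fresh sample used in this study, the dry mass in grams equals the DMC in percent directly; the helper simply generalizes this to other sample sizes.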

Molecular characterization

DNA was extracted at the International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria, using the Dellaporta method (Dellaporta et al. 1983). Genotyping-by-sequencing (GBS) libraries were prepared with the ApeKI restriction enzyme (recognition site G|CWGC), which produces a less variable distribution of read depth and therefore a larger number of scorable SNPs in cassava (Hamblin and Rabbi 2014). Two 96-plex GBS libraries were constructed as described by Elshire et al. (2011) and sequenced on an Illumina HiSeq 2500 at the Institute for Genomic Diversity, Cornell University. Raw reads were processed through the cassava GBS production pipeline developed in TASSEL 5.0 V2, and the GBS-derived SNPs were further filtered in TASSEL (Bradbury et al. 2007) to retain only polymorphic SNPs. After SNPs with a minor allele frequency (MAF) below 0.05 were removed, 5634 SNPs were retained and processed under the Next Generation Cassava project. The resulting SNP dataset was used for the diversity analysis of the 188 cassava accessions already phenotyped, and the results of the phenotypic and genotypic analyses were compared to check the correspondence between the two.
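The MAF filtering step can be illustrated with a minimal NumPy sketch. It assumes genotypes coded as alternate-allele dosages (0, 1, 2) in an individuals × SNPs matrix with `np.nan` for missing calls; this coding and the helper name `filter_by_maf` are assumptions for illustration and do not reproduce TASSEL's own filters or file formats.

```python
import numpy as np


def filter_by_maf(dosages: np.ndarray, min_maf: float = 0.05):
    """Keep SNPs whose minor allele frequency (MAF) is at least min_maf.

    dosages: individuals x SNPs matrix of alternate-allele counts (0, 1, 2),
    with np.nan marking missing calls (an assumed coding for this sketch).
    Returns the filtered matrix and the MAF of every input SNP.
    """
    alt_freq = np.nanmean(dosages, axis=0) / 2.0  # alternate-allele frequency per SNP
    maf = np.minimum(alt_freq, 1.0 - alt_freq)     # frequency of the rarer allele
    keep = maf >= min_maf                          # monomorphic SNPs (MAF = 0) are dropped too
    return dosages[:, keep], maf


# Hypothetical toy data: 4 individuals x 3 SNPs
geno = np.array([[0, 0, 1],
                 [0, 1, 2],
                 [0, 0, np.nan],
                 [0, 0, 1]], dtype=float)
filtered, maf = filter_by_maf(geno)
print(filtered.shape, maf.round(3))
```

Because a monomorphic SNP has an MAF of zero, a threshold of 0.05 also removes non-polymorphic loci, which is consistent with the aim of retaining only polymorphic SNPs.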

Data analysis

The agro-morphological data were analyzed with several statistical packages. Descriptive statistics were computed using XLSTAT (2010), MINITAB 15 and STATA 13. Principal component analysis (PCA) was performed with the PRINCOMP procedure in SAS 9.3 to examine the structure of the correlations among the variables. Cluster analyses of the agro-morphological and SNP marker data sets were performed in SAS 9.4 using Ward's minimum variance method to group the observations. Dendrograms were plotted from the computed similarity values for the agro-morphological traits and the SNP markers to show the relationships among the accessions. The provitamin-A studied accessions and varieties were also grouped according to their total carotenoid content.
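The clustering step can be sketched in Python with SciPy as an analogue of the SAS analysis, not a reproduction of it. The sketch assumes each trait is standardized to zero mean and unit variance before clustering and that the tree is cut into a chosen number of clusters; the helper name `ward_clusters` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.stats import zscore


def ward_clusters(traits: np.ndarray, n_clusters: int):
    """Ward's minimum-variance clustering of accessions (rows) on trait columns."""
    # Standardize each trait so that traits measured in different units weigh equally
    z = zscore(traits, axis=0, ddof=1)
    tree = linkage(z, method="ward", metric="euclidean")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    leaf_order = dendrogram(tree, no_plot=True)["leaves"]  # left-to-right order in the plot
    return tree, labels, leaf_order


# Hypothetical data: two groups of five accessions scored on three traits
rng = np.random.default_rng(1)
traits = np.vstack([rng.normal(0.0, 0.1, size=(5, 3)),
                    rng.normal(5.0, 0.1, size=(5, 3))])
tree, labels, order = ward_clusters(traits, n_clusters=2)
print(labels)
```

To cut the tree at a fixed height instead, `fcluster(tree, t=h, criterion="distance")` can be used; merge heights depend on how the data are scaled, so a cut-off taken from another program does not necessarily transfer.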

Basic diversity indices for the 183 provitamin-A studied accessions and varieties were calculated using PowerMarker (Liu and Muse 2005) and GenAlEx version 6.41 (Peakall and Smouse 2006). PowerMarker was used to generate the number of alleles per locus, major allele frequency, observed heterozygosity (Ho), expected heterozygosity (He) and polymorphic information content (PIC) (Botstein et al. 1980). PIC values were calculated with the equation:

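$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^{2} - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2\,p_i^{2}\,p_j^{2}$$

where $p_i$ and $p_j$ are the frequencies of the ith and jth alleles at a locus, n is the number of alleles, and $\sum p_i^{2}$ is the sum of the squared allele frequencies.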
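The PIC formula above translates directly into code. This is a minimal Python sketch taking a list of allele frequencies for one locus; the function names are hypothetical, and `expected_heterozygosity` is the uncorrected gene diversity $1 - \sum p_i^{2}$ without a small-sample correction.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Gene diversity He = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content (Botstein et al. 1980) for one locus."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Double sum over allele pairs i < j of 2 * p_i^2 * p_j^2
    cross = sum(2.0 * pi ** 2 * pj ** 2 for pi, pj in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - cross


print(pic([0.5, 0.5]))  # prints 0.375
```

For a biallelic SNP with frequencies p and 1 − p, the formula reduces to 2p(1 − p) − 2p²(1 − p)², which reaches its maximum of 0.375 at p = 0.5. SNP-based PIC values are therefore bounded well below those attainable with multiallelic markers.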

A Mantel test (Mantel 1967) was carried out on the morphological and molecular distance matrices to assess the agreement between the dendrograms derived from them. The pairwise genetic distance (identity-by-state, IBS) matrix among all individuals was calculated using PLINK (Purcell et al. 2007), and a Ward's minimum variance hierarchical cluster dendrogram was built from the IBS matrix using the Analyses of Phylogenetics and Evolution (ape) package in R.
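The two distance-based steps can be sketched together in NumPy. The sketch assumes 0/1/2 dosage genotypes with no missing calls and computes a 1 − IBS distance in the spirit of PLINK's IBS-based distance; PLINK itself also handles missing data. The Mantel test shown is a basic Pearson version with a one-sided permutation p-value, since the software used for the test is not specified. All function names and data are hypothetical.

```python
import numpy as np


def ibs_distance(dosages: np.ndarray) -> np.ndarray:
    """Pairwise 1 - IBS distance from 0/1/2 dosages (individuals x SNPs, no missing calls)."""
    n = dosages.shape[0]
    dist = np.empty((n, n))
    for i in range(n):
        # |g_i - g_j| counts the alleles (0, 1 or 2) not shared identical-by-state at each SNP
        dist[i] = np.abs(dosages - dosages[i]).mean(axis=1) / 2.0
    return dist


def mantel_test(d1: np.ndarray, d2: np.ndarray, permutations: int = 999, seed: int = 0):
    """Pearson Mantel statistic with a one-sided permutation p-value."""
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)        # each pair of individuals once
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        perm = rng.permutation(n)       # relabel individuals in the second matrix
        r_perm = np.corrcoef(x, d2[np.ix_(perm, perm)][iu])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical data: 12 individuals x 200 SNPs, compared with a noisy copy of the distances
rng = np.random.default_rng(42)
geno = rng.integers(0, 3, size=(12, 200))
d_mol = ibs_distance(geno)
noise = rng.normal(0.0, 0.01, size=d_mol.shape)
d_other = d_mol + (noise + noise.T) / 2.0
np.fill_diagonal(d_other, 0.0)
r, p = mantel_test(d_mol, d_other)
print(round(r, 3), p)
```

Rows and columns of the second matrix are permuted together so that each permutation corresponds to a relabeling of individuals, which is what distinguishes a Mantel test from a naive correlation of pairwise values.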

Results and discussion

Summary statistics of morpho-agronomic traits of the 183 provitamin-A studied accessions and varieties

Table 3 shows summary statistics for some morpho-agronomic traits of the 183 provitamin-A studied accessions and varieties. Sprouting was recorded only in the first month after planting (MAP) and ranged from 65 to 100%, with an average of 9.56 sprouted cuttings. Severity scores for African cassava mosaic disease, cassava bacterial blight and cassava green mite ranged from 0 to 9 in the collection of 183 provitamin-A accessions and five varieties, as did the percent incidence scores for the same three diseases and pests. Most of the quantitative and qualitative morphological characters were recorded at 3, 6, 9 and 12 MAP. Color of apical lobe ranged from 3 to 9 (mean 6.8 ± 1.61) at 3 MAP, whereas the same trait ranged from 0 to 9 (mean 6.71 ± 1.74) at 9 MAP. Plant height at 6 MAP ranged from 65.5 to 284.5 cm (mean 155.69 ± 26.12 cm). At the same stage, leaf area ranged from 10.24 to 73.93 cm² and leaf retention from 1.75 to 4.5. All yield-related traits were recorded at 12 MAP: yield ranged from 0.2 to 42.5 t/ha and dry matter content from 4.0 to 44.5% (Table 3). These growth indicators showed considerable variation, indicating high variability among the accessions, in agreement with Mbah et al. (2019), who reported that agro-morphological parameters exert a strong influence on cassava root yield.
The significant variation observed among the 183 provitamin-A studied accessions and varieties for economically important traits such as African cassava mosaic disease, yield and dry matter content (DMC) offers prospects for progress in the cassava breeding program in Sierra Leone. Diversity studies of cassava germplasm have been widely undertaken worldwide (Bolanos 2001; Chavez et al. 2005; Morillo 2009; Fregene 2007; Parkes 2011; Njoku 2012; Thompson 2013), but have received little or no attention in Sierra Leone. Our findings agree with those of Carvalho and Schaal (2001), who reported a high degree of variability among 94 cassava accessions of Brazilian origin. In a similar study in India, Raghu et al. (2007) also identified a high level of diversity among 58 accessions from South Indian cassava germplasm based on 29 morphological traits. Lyimo et al. (2012) reported significant variability among 39 cassava accessions of Tanzanian origin using 14 morphological traits, and Thompson (2013) observed moderate to high diversity among 150 Ghanaian landraces and accessions introduced from IITA, Ibadan, Nigeria, using 25 morphological traits.

Table 3 Summary statistics of some morpho-agronomic traits of the studied accessions and varieties

Summary statistics of the genetic variation among the 183 provitamin-A studied accessions and varieties using SNP markers

Summary statistics for the number of observed alleles, expected heterozygosity and polymorphic information content are presented in Table 4. The number of observed alleles ranged from 1.30 to 1.47, with an average of 1.38 alleles per locus. Expected heterozygosity was lowest for TR 1233 (0.15) and SLICASS 6 (0.15) and highest for TR 1525 (0.23), with a mean of 0.19. Observed heterozygosity per individual ranged from 0.30 (TR 1233) to 0.47 (TR 1525), with a mean of 0.38, about twice the mean expected heterozygosity (0.19). This excess of heterozygosity is consistent with the origin of most of the provitamin-A studied accessions, which were developed from half-sib families in which known female parents were pollinated by different male sources. The major allele frequency of the markers was generally below 0.95, indicating that most were polymorphic; of the 5634 SNP markers used, 95% were polymorphic. PIC values ranged from 0.11 in TR 1233 to 0.18 in TR 1199 and TR 1525, with a mean of 0.14; the higher the PIC value, the more informative the marker. Because morphological traits are influenced by the environment, molecular markers, which are not, are preferable in genetic diversity studies (Kaemmer et al. 1992; Gepts 1993; Njoku 2012; Thompson 2013). Kawuki et al. (2009) published the first report in which SNPs were used for genetic diversity studies in cassava; they characterized and identified SNP markers and assessed their utility for cassava genetic diversity analysis. The present study appears to be the first report from Sierra Leone in which SNP markers were used to study the diversity of provitamin-A cassava accessions.
The informativeness of a genetic marker is measured by its polymorphic information content (PIC). The mean PIC value observed in this study (0.14) is lower than previously reported values: Kawuki et al. (2009) reported a PIC value of 0.29 in 74 cassava accessions using 26 SNPs, and Thompson (2013) also reported a PIC value of 0.29 using 150 cassava accessions. PIC values for SNP markers in cassava are generally lower than those observed in genetic diversity studies of other crops; for instance, Yang et al. (2011) reported a PIC value of 0.34 in maize genotypes using 884 SNP markers.
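The comparison of observed and expected heterozygosity reported above can be illustrated with a minimal NumPy sketch. It assumes 0/1/2 dosage genotypes, defines Ho per individual as the share of heterozygous SNP calls, and defines He per biallelic SNP as 2p(1 − p). These are simple textbook definitions chosen for illustration; the per-accession values in Table 4 were produced by PowerMarker, whose exact computations are not restated here. Function names and data are hypothetical.

```python
import numpy as np


def observed_het_per_individual(dosages: np.ndarray) -> np.ndarray:
    """Ho per individual: fraction of SNPs called heterozygous (dosage 1)."""
    return (dosages == 1).mean(axis=1)


def expected_het_per_locus(dosages: np.ndarray) -> np.ndarray:
    """He per biallelic SNP under Hardy-Weinberg expectations: 2p(1 - p)."""
    p = dosages.mean(axis=0) / 2.0     # alternate-allele frequency
    return 2.0 * p * (1.0 - p)


# Hypothetical toy data: 3 individuals x 2 SNPs
geno = np.array([[1, 1],
                 [1, 0],
                 [1, 2]])
ho = observed_het_per_individual(geno)
he = expected_het_per_locus(geno)
print(ho.mean(), he.mean())
```

In this toy example every individual is heterozygous at the first SNP, so mean Ho exceeds mean He. This is the same direction of heterozygote excess as observed in the provitamin-A collection.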

Table 4 Summary statistics for number of alleles observed, expected heterozygosity and polymorphic information content

Principal component analysis of yield and yield-related traits of the 183 provitamin-A cassava studied accessions and varieties

The first five PCs together accounted for 70.44% of the total phenotypic variation among the 183 provitamin-A cassava studied accessions and five varieties (Table 5). PC1 had an eigenvalue of 4.44 and accounted for 27.74% of the total variation, whereas PC2, PC3, PC4 and PC5 had eigenvalues of 3.1, 1.45, 1.11 and 1.09 and accounted for 19.8%, 9.03%, 6.95% and 6.82% of the total variation, respectively. Marketable roots, marketable weight and yield had positive loadings on PC1. Non-marketable weight, storage root weight and ease of peel had positive loadings on PC2. Non-marketable roots and total number of storage roots had negative loadings on PC3. Root size had a positive loading on PC4, and inner color had a positive loading on PC5.

Table 5 Principal component analysis of yield and yield related traits

Principal component analysis identifies the traits that contribute most to the observed variation within a group, here the 183 provitamin-A studied accessions and five varieties. The tool has practical applications in the selection of parent lines for breeding and in varietal development. The cumulative variance of 70.44% explained by the first five axes, each with an eigenvalue > 1.0, indicates that the traits loading on these axes strongly influence the phenotype of these accessions and could be used effectively for selection among them. This agrees with the findings of Afuape and Nwachukwu (2005) and Afuape et al. (2010), who reported a cumulative variance of 70.09% for the first three axes in the dry evaluation of nine sweetpotato genotypes and identified total root weight, biomass weight and dry matter as the important traits distinguishing the elite materials studied.
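The eigenvalue and variance figures discussed above come from a PCA of standardized traits. A minimal NumPy sketch of correlation-matrix PCA with the eigenvalue > 1 retention rule follows. It is not the SAS PRINCOMP run used in the study, although PRINCOMP also analyzes the correlation matrix by default. Function names and data are hypothetical.

```python
import numpy as np


def pca_correlation(traits: np.ndarray):
    """PCA on the trait correlation matrix (rows = accessions, columns = traits)."""
    corr = np.corrcoef(traits, rowvar=False)   # standardizes traits implicitly
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # largest component first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = eigvals / eigvals.sum()            # eigenvalues sum to the number of traits
    return eigvals, share, eigvecs


def n_kaiser(eigvals: np.ndarray) -> int:
    """Number of components kept under the eigenvalue > 1 rule."""
    return int((eigvals > 1.0).sum())


# Hypothetical data: 100 accessions x 4 traits, two of them strongly correlated
rng = np.random.default_rng(7)
base = rng.normal(size=100)
traits = np.column_stack([base,
                          base + rng.normal(scale=0.3, size=100),
                          rng.normal(size=100),
                          rng.normal(size=100)])
eigvals, share, vecs = pca_correlation(traits)
print(eigvals.round(2), share.cumsum().round(3), n_kaiser(eigvals))
```

Because the eigenvalues of a correlation matrix sum to the number of traits, each component's share of the variation is its eigenvalue divided by that number. An eigenvalue above 1 therefore means that a component explains more than a single standardized trait.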

Cluster groupings of the studied accessions and varieties based on morpho-agronomic traits using Ward's minimum variance method and on SNP markers

Agro-morphological trait diversity analysis: The dendrogram constructed from the agro-morphological data divided the provitamin-A studied accessions and five varieties into six major clusters (A to F) at a genetic distance of 0.30; all clusters except Cluster A had sub-clusters (Table 6). Cluster A consisted of only two accessions. Cluster B had two sub-clusters. Cluster D contained the largest number of accessions (57), followed by Clusters E and F with 53 and 34 accessions, respectively. In general, most of the accessions were grouped according to their morpho-agronomic traits and geographical origin; for example, the accessions in Cluster E had similar values for most of the morpho-agronomic traits studied. Three of the five Sierra Leonean varieties were grouped into Cluster F, while Clusters B and D contained only provitamin-A studied accessions introduced to Sierra Leone as seeds from IITA, Nigeria; these showed a discrete clustering pattern, grouped more or less by state, geographical distribution or country.

Table 6 Cluster groupings of the 182 provitamin-A studied accessions and five Sierra Leonean varieties based on morpho-agronomic traits using Ward's minimum variance method

SNP marker diversity analysis: The 181 provitamin-A cassava accessions and four Sierra Leonean varieties were grouped into nine clusters based on the 5634 SNP markers (Fig. 1). Clusters A, B, C, D and E had 21, 7, 11, 8 and 16 accessions, respectively, while Clusters F, G, H and I consisted of 10, 47, 50 and 17 accessions, respectively (Table 7). Clusters A, B, C, E, G, H and I contained 3, 1, 2, 4, 9, 10 and 1, respectively, of the 30 accessions selected for their varying levels of total carotenoid content, so Cluster I included only one of these selected accessions.

Fig. 1

Dendrogram of 182 Provitamin-A studied accessions and Sierra Leonean varieties based on SNP markers

Table 7 Cluster groupings of the 181 Provitamin-A Studied Accessions and Sierra Leonean Cassava varieties based on SNP Markers

Correlation analysis between clusters from agro-morphological traits and SNP markers: A comparison of the two dendrograms using the Mantel test showed a significant positive but weak correlation between the morphological and molecular data sets (r = 0.104, p < 0.034). In a similar study, Raghu et al. (2007) noted that 24 of 28 morphological traits contributed to the total variation observed. Here, the cluster analyses yielded six and nine distinct clusters based on morphological and molecular data, respectively, indicating considerable variability in the collection. Similarly, Carvalho and Schaal (2001) identified 22 distinct clusters among 94 cassava accessions in Brazil, whereas Raghu et al. (2007) identified six distinct groups among 58 accessions; our study is therefore in agreement with these studies. Although the two data sets produced different numbers of clusters, some similarities were observed: accessions TR 0747 and TR 0365, which were among the selected provitamin-A studied accessions, were closely similar on both morphological and molecular markers, showing that the two analyses grouped some accessions consistently. To our knowledge, there are no previous reports on the genetic diversity of provitamin-A cassava accessions using morphological traits, molecular markers and total carotenoid content, making this the first such study of provitamin-A cassava accessions in Sierra Leone.

The study reveals a moderate degree of diversity among the provitamin-A cassava accessions and varieties that can be exploited for crop improvement and may help strengthen the breeding strategy.

Thirty provitamin-A studied accessions with varying levels of total carotenoid content, yield and dry matter content

The 30 accessions distributed across the different clusters were selected as provitamin-A studied accessions for the formation of a core collection, conservation and improvement in the breeding program. These accessions were selected for their higher total carotenoid content, assessed in the laboratory using a color chart and the iCheck device. The selected core accessions showed significant variation in total carotenoid content, yield and dry matter content. Accessions TR 0998, TR 0222, TR 1337 and TR 0461 had the highest total carotenoid content, and TR 0365 the lowest. Dry matter content ranged from 12.5% (TR 0696) to 39.5% (TR 1208), and yield from 2.0 t/ha (TR 0461) to 22.8 t/ha (TR 0232). TR 0747, TR 1337, TR 0232, TR 0998 and TR 1755 clustered consistently in the morphological and molecular analyses (Clusters B, D, E, E and E, respectively). The wide range of total carotenoid content, dry matter content and yield, and the distribution of morphological variability revealed in this study, may provide broader scope for improving the crop through hybridization and selection. The high dry matter content observed in some provitamin-A accessions in this study contrasts with the findings of Esuma et al. (2012), who reported high DMC and low total carotenoid content in local white-fleshed cassava varieties among Ugandan landraces.

Conclusion

The present morphological and molecular assessment showed that the provitamin-A cassava accessions in Sierra Leone have moderate to high diversity based on total carotenoid content and morphological and molecular characterization (Table 8).

Table 8 Thirty provitamin-A studied accessions with varying levels of total carotenoid content, dry matter content and yield

The inter-relationships of morpho-agronomic factors in determining fresh root yield of provitamin-A cassava accessions require further research to fully understand how total carotenoid content and yield can be improved in provitamin-A cassava germplasm. Although agro-morphological traits are widely used to estimate genetic diversity in crop plants, this approach has limitations, because the traits are heavily influenced by environmental conditions, with climate being the main factor influencing the growth and development of the species (Cadena Iniguez and Arevalo Galarza 2011). This underscores the importance of molecular techniques and markers for research on and improvement of provitamin-A cassava germplasm. The present study has also shown that, during provitamin-A cassava variety development, high dry matter content (a quality trait) should be a priority at both the primary and advanced (yield evaluation) stages, together with good root qualities, to facilitate adoption after varietal release.

Finally, the genetic diversity revealed in this study would provide the cassava breeding program in Sierra Leone with an opportunity to strengthen its strategy for the genetic improvement of provitamin-A cassava varieties with preferred end-use traits (total carotenoid content, dry matter, yield and African cassava mosaic disease resistance).