Introduction

Africa depends on cassava (Manihot esculenta Crantz), a climate-resilient, starchy root crop, to feed a substantial proportion of its population. Annual production in sub-Saharan Africa is estimated at 178 million tons, 61% of global production (FAOSTAT 2022). Cassava serves as a food security crop because it is a concentrated source of carbohydrate, yields under diverse climatic conditions, can remain in the soil without deteriorating until needed, and is amenable to staggered harvesting (De Vries and Toenniessen 2001). It can also be used for animal feed and in industrial applications for income generation (Li et al. 2017). Nevertheless, on-farm yields in Africa are low: 8.62 t/ha in 2020 compared with 14.6 t/ha in Latin America in the same year (FAOSTAT 2022). This is largely because farmers grow traditional landraces, which are often inherently low yielding and susceptible to widespread virus diseases, including cassava mosaic disease (CMD) and cassava brown streak disease (CBSD) (Legg et al. 2014). Landraces of cassava, often referred to as farmer cultivars or heirloom cultivars, have been grown for many years under natural and artificial selection by farmers, and represent an important source of genetic variation for plant breeding. It is important that this diversity is collected, evaluated and conserved, as some of these landraces may be vulnerable to extinction through susceptibility to viruses or replacement by modern cultivars.

In Comoros, cassava is the main food crop after banana, a major source of calories and the most resilient of local food crops. The Comoros is a group of islands at the northern end of the Mozambique Channel in the Indian Ocean, between Madagascar and the southeast African mainland, about 290 km off the eastern coast of Africa. Agriculture contributes 35.6% of the country's GDP, employs 68% of its labor force and accounts for 90% of exports (IFAD 2016). Cassava germplasm on these islands comprises both landraces and improved cultivars, which may have been introduced from neighboring countries or may date from when cassava was first introduced to the East African coast in the 17th or 18th century (Carter et al. 1992). A first step in improving cassava production in Comoros is to collect and conserve existing landraces and farmer knowledge. With this information, breeders are more likely to breed or introduce improved cassava cultivars that respond to the intricacies of farmers' preferences, which should improve varietal adoption. After collection, it is important to identify duplicates in the collection, because farmers often give different names to the same cultivar, and so define a unique set of germplasm for conservation and for breeders to use. In addition, germplasm must be evaluated to rapidly identify any obvious or interesting characteristics and to determine any genetic structure or relationship to germplasm from neighboring regions.

Although traditional agro-morphological characterization is useful to assess general features of the germplasm and to identify any landraces with striking characters, molecular characterization provides a far greater number of markers, which are more stable and less influenced by the environment, for diversity assessment (Tiago et al. 2016). Several molecular marker technologies have been used to characterize genetic variability in cassava germplasm. These include Random Amplified Polymorphic DNA (Asante and Offei 2003), Amplified Fragment Length Polymorphism (Roa et al. 1997), microsatellites (Hurtado et al. 2008) and, more recently, single nucleotide polymorphism (SNP) markers (Kawuki et al. 2009; Ferguson et al. 2019, 2021; Pierre et al. 2022). The objectives of the present study were to (1) document farmers’ knowledge relating to the genetic material collected, (2) assess the cassava genetic variability and diversity available in Comoros using morphological and molecular characters, and (3) identify a unique set of cassava germplasm for conservation, further evaluation and utilization by breeders.

Materials and methods

Collection of the germplasm

Cassava germplasm collections were conducted in July 2019 by the National Research Institute for Agriculture, Fisheries and the Environment (INRAPE) of Comoros (from whom permission to collect was obtained), in collaboration with the International Institute of Tropical Agriculture (IITA). At this time the crop was nearing maturity, at approximately 10 months after planting. All collections were carried out under the terms of the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA) (http://www.fao.org/plant-treaty/en/). Three of the largest islands of Comoros, Ngazidja, Ndzouani and Mwali, were targeted, with the objective of collecting as many different cultivars as possible, covering the different agro-ecological production zones, and documenting related farmer knowledge. Municipalities and regions were initially chosen according to the following criteria: (1) whether the locality was known for its diversity in cassava; (2) whether cassava was traditionally grown; (3) whether it was subject to high or increasing disease pressure; (4) whether the landraces were in some way endangered, such as through replacement by improved cultivars; or (5) whether the environment had distinct agro-ecological characteristics where specific adaptation may have occurred.

Key informants (farmers) with a reputation for growing or conserving landraces were identified with the help of extension agents or village elders. At each farm, three questionnaires designed to capture farmer knowledge and preferences were completed following Cox et al. (2014): (1) a ‘Key Informant Interview’ to profile each farmer and their farm and to identify the cultivars they grew; (2) an ‘Individual Landrace’ form documenting the history of the landrace and the farmer’s experiences and preferences regarding its agronomy, response to biotic and abiotic stresses, and culinary properties; and (3) a germplasm collection form documenting morphological and agronomic descriptors of individual landraces, based on Fukuda et al. (2010), and describing the collection environment. Yield potential was assessed by asking how many plants would need to be harvested to fill a 25 kg bag. Collection and documentation of farmers’ knowledge followed the ITPGRFA and the terms of the Convention on Biological Diversity, including the Nagoya Protocol. Prior verbal informed consent was obtained from all participants for inclusion in the study.

The decision to collect a stem cutting of a landrace was made according to the following criteria: (1) a local cultivar bearing a name which had not been collected before, (2) the local cultivar name had been collected before, but in a different region, or (3) the collectors identified unusual or interesting characteristics. After selection, a collection number was assigned to uniquely identify the landrace, and the three collection forms were completed. Two to three stems from a single plant of each collected cultivar were then sampled and labelled. The label included at least the collection number, the local cultivar name and the town. Photos of different parts of each cultivar collected were also taken. On returning to the station, stakes of each cultivar collected were planted in a field genebank at INRAPE.

A database with all information documented during the collection missions was constructed as outlined in Ferguson et al. (2021). The database was queried to summarize information relating to the collected landraces and farmer profiles. Coordinates of sampling sites were plotted on a map using GADM (https://gadm.org/license.html) (GADM 2023).

Agro-morphological evaluation

Five stem cuttings per plot of each landrace sampled during the collection mission were planted at the INRAPE station in July 2019 in a randomized complete block design with two replicates, for morphological characterization according to Fukuda et al. (2010). Plants were scored individually; the average was taken for quantitative measurements and the most frequent score for qualitative measurements. A list of the traits scored, with abbreviation, equivalent Term ID from Cassavabase (www.cassavabase.org) (Fernandez-Pozo et al. 2015) and data type is provided in Table 1.

Table 1 Quantitative and qualitative traits scored with abbreviation, Term ID from the trait ontology in Cassavabase and data type

Analysis of phenotyping data

Before the agro-morphological data were analysed, duplicates identified through genetic fingerprinting (genotyping; see below) were removed, leaving a dataset of 46 unique landraces. A univariate analysis was performed first. The quantitative (continuous) data were analysed using a linear mixed model in which both clone and replication were fitted as random effects. Best Linear Unbiased Predictions (BLUPs) of clones and their standard errors were computed from this model. Heritability for each trait was estimated using the method of Cullis et al. (2006), and all traits with a heritability below 0.1 were excluded from the analysis. For qualitative data, the relative frequency of each trait level was computed and visualized.
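The heritability of Cullis et al. (2006) is usually written as H² = 1 − v̄/(2σ²g), where σ²g is the clone variance component and v̄ is the mean variance of the difference between pairs of clone BLUPs. The sketch below applies that formula and the 0.1 exclusion threshold. It assumes both quantities have already been extracted from the fitted mixed model (for example from ASReml-R); the trait names and numbers are hypothetical.

```python
def cullis_h2(sigma2_g: float, mean_vd: float) -> float:
    """Generalized heritability of Cullis et al. (2006).

    sigma2_g: genetic (clone) variance component from the mixed model.
    mean_vd:  mean variance of the difference between pairs of clone BLUPs.
    """
    if sigma2_g <= 0:
        raise ValueError("genetic variance must be positive")
    return 1.0 - mean_vd / (2.0 * sigma2_g)


def retained_traits(components: dict[str, tuple[float, float]],
                    threshold: float = 0.1) -> dict[str, float]:
    """Keep traits whose heritability is not below the threshold."""
    h2 = {trait: cullis_h2(sg, vd) for trait, (sg, vd) in components.items()}
    return {trait: value for trait, value in h2.items() if value >= threshold}


# Hypothetical variance components: trait -> (sigma2_g, mean variance of differences)
example = {"trait_A": (2.0, 1.0), "trait_B": (0.5, 0.95)}
print(retained_traits(example))
```

Here trait_B has H² of about 0.05 and would be dropped, mirroring the exclusion rule in the text.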

A two-step multivariate clustering analysis was then applied to both traits and clones. Step 1 was dimension reduction, i.e., reducing the large, mixed set of qualitative and quantitative traits to a small number of quantitative synthetic variables, each a linear combination of the original traits. This was done with a hierarchical clustering algorithm for variables based on PCAMIX (Kiers 1991), a generalization of Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA). All traits linked to a synthetic variable are classified in the same cluster, which is considered to be homogeneous. Step 2 grouped the clones on the basis of the small set of synthetic variables, using hierarchical agglomerative clustering with Ward's criterion. The optimal number of clusters was determined using 30 indices, which identify the best clustering scheme by varying all combinations of number of clusters, distance measures and clustering methods.
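Ward's criterion merges, at each step, the two clusters whose fusion least increases the within-cluster sum of squares. The minimal sketch below illustrates that criterion in plain Python using the Lance–Williams update on squared Euclidean distances; it is not the NbClust/factoextra pipeline used in the study, and it assumes the synthetic-variable scores of each clone have already been computed. The toy scores are hypothetical.

```python
from itertools import combinations


def ward_clusters(points: list[list[float]], k: int) -> list[int]:
    """Agglomerate points with Ward's criterion until k clusters remain.

    Uses the Lance-Williams update on squared Euclidean distances and
    returns a cluster label (0..k-1) for each input point.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    members = {i: [i] for i in range(n)}  # active cluster id -> point indices
    # Squared Euclidean distances between singleton clusters, keyed (small id, large id).
    dist = {
        (i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        for i, j in combinations(range(n), 2)
    }
    next_id = n
    while len(members) > k:
        # Merge the closest pair under the current Ward dissimilarities.
        i, j = min(dist, key=dist.get)
        d_ij = dist[(i, j)]
        n_i, n_j = len(members[i]), len(members[j])
        merged = members.pop(i) + members.pop(j)
        new_dist = {}
        for m, pts in members.items():
            n_m = len(pts)
            d_im = dist[(min(i, m), max(i, m))]
            d_jm = dist[(min(j, m), max(j, m))]
            # Lance-Williams coefficients for Ward's method.
            new_dist[(m, next_id)] = (
                (n_i + n_m) * d_im + (n_j + n_m) * d_jm - n_m * d_ij
            ) / (n_i + n_j + n_m)
        # Discard distances involving the two merged clusters.
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        dist.update(new_dist)
        members[next_id] = merged
        next_id += 1
    labels = [0] * n
    # Number clusters by their lowest-indexed member for stable output.
    for label, cid in enumerate(sorted(members, key=lambda c: min(members[c]))):
        for p in members[cid]:
            labels[p] = label
    return labels


# Hypothetical synthetic-variable scores for five clones.
scores = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [9.8, 0.2]]
print(ward_clusters(scores, 3))
```

Cutting at k clusters here corresponds to cutting the dendrogram at a fixed height; choosing k itself is what the 30 validation indices address.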

Quantitative univariate analyses were carried out in R, Version 4.0.5 (R Core Team 2018), using ASReml-R (v4.1) (Gilmour et al. 2014) to fit the mixed models, while the relative frequency distribution of each qualitative trait was calculated and visualized in R using the tidyverse package (Wickham et al. 2019). Multivariate clustering analyses were performed in R (R Core Team 2018) using the ClustOfVar package (Chavent et al. 2017) for clustering the traits, the NbClust package (Charrad et al. 2014) for determining the optimal number of clusters of clones, and the factoextra package (Kassambara and Mundt 2020) for plotting the dendrograms.

Genotyping

DArTseq SNP genotyping was conducted to (1) identify duplicates within the collections made in Comoros, (2) identify Comoros landraces by comparison with an existing set of reference genotypes, and (3) determine the relationship of Comoros landraces to those from Tanzania (Ferguson et al. 2021). For Comoros germplasm, four 6 mm diameter leaf disks were sampled from young leaf tissue of one plant per cultivar in the field genebank at INRAPE, placed in a tube in 96-well format and dried in an oven overnight at 45 °C. Samples were shipped to Intertek, Adelaide, South Australia, for DNA extraction and forwarding to Diversity Arrays Technology (DArT), Canberra, Australia, for DArTseq™ SNP genotyping (Sansaloni et al. 2011). Existing genotyping data were used for the other genotypes, from Tanzania, which were co-analysed with the new data from Comoros.

Analysis of genotyping data

To identify duplicates in the Comoros collection and determine any matches with known landraces and breeding lines, genotyping results of 76 genotypes with the prefix FMO, collected in Comoros, were analysed together with 34 landraces (Online Resource 1) and 96 breeding lines (Online Resource 2), which included 21 known duplicates (7 breeding lines and 14 landraces) (a total of 214 entries). To combine data from Comoros germplasm and the reference genotypes, raw reads were combined by DArT and SNPs were called on the combined data set. All SNPs with more than 1% missing data were removed from the analysis, together with all monomorphic loci and genotypes with more than 5% missing data. Data were not filtered for minor allele frequency. Euclidean distances between clones were calculated as a measure of genetic distance and visualized as a dendrogram. Distances were converted to similarities by subtracting them from 1. Duplicates were identified using a cut-off of greater than 85% similarity, determined from the lowest similarity among the 21 known duplicates, that of Mumba (0.852). To select the most representative clone from a set of FMO duplicates, the one with the highest mean similarity to the other members was selected. For duplicate sets of two clones, both members have the same mean similarity, so one was selected at random.
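The filtering, similarity and representative-selection steps above can be sketched as follows. This is an illustration under stated assumptions, not the dartR workflow used in the study: genotypes are coded as 0/1/2 allele dosages with None for missing calls, and the distance is scaled to [0, 1] by taking the root-mean-square dosage difference over shared SNPs and dividing by 2 (a hypothetical scaling, since the text does not say how distances were scaled before subtracting from 1). Duplicate groups are formed by linking every pair above the cut-off (single linkage), which can chain clones together; clone names are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import Optional

Calls = dict[str, list[Optional[int]]]  # clone -> 0/1/2 dosage per SNP (None = missing)


def filter_calls(geno: Calls, max_snp_missing: float = 0.01,
                 max_clone_missing: float = 0.05) -> Calls:
    """Drop SNPs with >1% missing calls and monomorphic SNPs, then clones with >5% missing."""
    clones = list(geno)
    keep = []
    for s in range(len(geno[clones[0]])):
        observed = [geno[c][s] for c in clones if geno[c][s] is not None]
        missing = 1 - len(observed) / len(clones)
        if missing <= max_snp_missing and len(set(observed)) > 1:
            keep.append(s)
    filtered = {c: [geno[c][s] for s in keep] for c in clones}
    return {c: row for c, row in filtered.items()
            if sum(x is None for x in row) <= max_clone_missing * len(row)}


def similarities(geno: Calls) -> dict[tuple[str, str], float]:
    """1 minus a scaled Euclidean distance (hypothetical scaling, see lead-in)."""
    sims = {}
    for a, b in combinations(geno, 2):
        diffs = [(x - y) ** 2 for x, y in zip(geno[a], geno[b])
                 if x is not None and y is not None]
        if not diffs:
            sims[(a, b)] = 0.0  # no shared calls: treat as unrelated
            continue
        sims[(a, b)] = 1 - math.sqrt(sum(diffs) / len(diffs)) / 2
    return sims


def duplicate_groups(sims, clones, cutoff=0.85):
    """Link clones whose similarity exceeds the cutoff (union-find, single linkage)."""
    parent = {c: c for c in clones}

    def root(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), s in sims.items():
        if s > cutoff:
            parent[root(a)] = root(b)
    groups = {}
    for c in clones:
        groups.setdefault(root(c), []).append(c)
    return [g for g in groups.values() if len(g) > 1]


def representative(group, sims, rng):
    """Member with the highest mean similarity to the rest; ties broken at random."""
    def sim(a, b):
        return sims[(a, b)] if (a, b) in sims else sims[(b, a)]

    means = {c: sum(sim(c, o) for o in group if o != c) / (len(group) - 1)
             for c in group}
    best = max(means.values())
    return rng.choice([c for c in group if means[c] == best])


# Hypothetical toy data: FMO_a and FMO_b are identical, FMO_c is unrelated.
geno = {"FMO_a": [0, 1, 2, 0, 1], "FMO_b": [0, 1, 2, 0, 1], "FMO_c": [2, 1, 0, 2, 1]}
clean = filter_calls(geno)
sims = similarities(clean)
groups = duplicate_groups(sims, list(clean))
rng = random.Random(0)
print(groups, [representative(g, sims, rng) for g in groups])
```

In a two-member group both clones share the same mean similarity, so the random tie-break reproduces the rule stated in the text.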

The correlation of the distance matrix generated from morphological data with that from genotypic data was calculated using the Mantel test.
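A Mantel test correlates the off-diagonal entries of two distance matrices and assesses significance by permuting the rows and columns of one matrix together. The plain-Python sketch below computes a Pearson-based Mantel statistic with a one-sided permutation p value; it illustrates the method and is not the ade4 implementation used in the study. The two small matrices are hypothetical.

```python
import math
import random


def _upper(m, order):
    """Upper-triangle entries of matrix m after reordering rows/columns by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=1):
    """Return (r, p): Pearson r between matrices and a one-sided permutation p value."""
    n = len(d1)
    identity = list(range(n))
    x = _upper(d1, identity)
    r_obs = _pearson(x, _upper(d2, identity))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        # Permuting labels moves rows and columns of d2 together.
        if _pearson(x, _upper(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical morphological and genotypic distance matrices for four clones.
d_geno = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
d_morph = [[0, 2, 3, 5], [2, 0, 2, 3], [3, 2, 0, 2], [5, 3, 2, 0]]
print(mantel(d_morph, d_geno))
```

With only a handful of clones the permutation distribution is coarse; with 46 clones, as here, many more distinct permutations are available.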

The relationship among islands according to cassava cultivars

To determine the relationship of clones among islands, a Discriminant Analysis of Principal Components (DAPC) was conducted, retaining 50 principal components and two discriminant functions (eigenvalues). Population structure was further described using the Weir and Cockerham (1984) overall and pairwise estimators of Wright's F-statistics (Fst and Fis). The fixation index Fst measures the identity of individuals within sub-populations compared with individuals from other sub-populations within the total population, or the increase in homozygosity due to the Wahlund effect; it is thus a measure of differentiation between subpopulations. Fst = 1 − (Hs/Ht), where Hs is the average within-population gene diversity and Ht is the gene diversity of all populations pooled into a single population (total diversity). According to Wright, 0 < Fst < 0.05 indicates weak differentiation, 0.05 < Fst < 0.15 moderate differentiation, 0.15 < Fst < 0.25 significant differentiation, and Fst > 0.25 very great differentiation (Wright 1965).
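The relation Fst = 1 − Hs/Ht can be made concrete for biallelic SNPs. The sketch below computes Hs and Ht from allele frequencies per island, averages them over loci before taking the ratio, and applies the Wright thresholds quoted above. It is the simple Nei-style ratio with equally weighted subpopulations, not the Weir and Cockerham (1984) estimator used in the study, so values will differ; the frequencies are hypothetical.

```python
def locus_diversities(freqs: list[float]) -> tuple[float, float]:
    """(Hs, Ht) for one biallelic SNP from per-subpopulation allele frequencies."""
    hs = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population diversity
    p_bar = sum(freqs) / len(freqs)                         # pooled frequency (equal weights)
    return hs, 2 * p_bar * (1 - p_bar)


def fst(loci: list[list[float]]) -> float:
    """Fst = 1 - Hs/Ht with Hs and Ht averaged over loci before the ratio."""
    pairs = [locus_diversities(f) for f in loci]
    hs = sum(h for h, _ in pairs) / len(pairs)
    ht = sum(t for _, t in pairs) / len(pairs)
    return 0.0 if ht == 0 else 1 - hs / ht


def wright_category(value: float) -> str:
    """Qualitative label following the thresholds quoted in the text."""
    if value < 0.05:
        return "weak"
    if value < 0.15:
        return "moderate"
    if value < 0.25:
        return "significant"
    return "very great"


# Hypothetical allele frequencies at two SNPs in three islands.
loci = [[0.20, 0.25, 0.22], [0.60, 0.55, 0.58]]
value = fst(loci)
print(round(value, 4), wright_category(value))
```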

Fis measures the identity (or homozygosity) of alleles within individuals within sub-populations relative to Hardy–Weinberg expectations; it is thus a measure of deviation from random mating. Finally, Fit measures the homozygosity of individuals in the total population and thus captures deviation from Hardy–Weinberg expectations due both to local departures from panmixia and to the Wahlund effect. In addition, diversity was described using observed heterozygosity (Ho), the proportion of heterozygous individuals in a population, and expected heterozygosity (He), also known as Nei’s unbiased estimator of gene diversity, both calculated by population and overall.
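For a biallelic SNP coded as 0/1/2 allele dosages, Ho is the fraction of dosage-1 individuals, and He is 1 − p² − q² multiplied by the small-sample correction 2n/(2n − 1) for n genotyped individuals. The sketch below implements that form as an illustration; hierfstat, used in the study, applies Nei's (1987) estimator with an additional correction involving Ho, so its values may differ slightly. Dosage data are hypothetical.

```python
def ho_he(dosages: list) -> tuple[float, float]:
    """Observed and expected heterozygosity at one biallelic SNP.

    dosages: 0/1/2 counts of one allele per individual, None for missing.
    """
    calls = [d for d in dosages if d is not None]
    n = len(calls)
    if n < 2:
        raise ValueError("need at least two genotyped individuals")
    ho = sum(1 for d in calls if d == 1) / n
    p = sum(calls) / (2 * n)                     # allele frequency
    gene_div = 1 - p ** 2 - (1 - p) ** 2         # 2pq
    he = 2 * n / (2 * n - 1) * gene_div          # small-sample correction
    return ho, he


def mean_ho_he(loci: list) -> tuple[float, float]:
    """Average Ho and He over loci for one population."""
    values = [ho_he(d) for d in loci]
    return (sum(v[0] for v in values) / len(values),
            sum(v[1] for v in values) / len(values))


# Hypothetical dosages for five clones at three SNPs.
island_loci = [[0, 1, 2, 1, 1], [1, 1, 1, 0, 2], [0, 0, 1, None, 0]]
print(mean_ho_he(island_loci))
```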

The relationship of Comoros landraces to those from Tanzania

To determine the relationship of Comoros landraces to those from Tanzania, SNP data from 402 Tanzanian genotypes published by Ferguson et al. (2021) were used together with the dataset of 76 genotypes from Comoros, including the duplicates, giving a dataset of 478 genotypes. As before, raw reads were aligned by DArT to combine the datasets, and SNPs were called on the combined data set. All data filtering was as described above.

Genotypes were assigned to one of 11 regions depending on where they were collected: Mwali (16 samples), Ndzouani (19) and Ngazidja (41) islands of the Comoros; the Lake (58), Western (30), Central (61), Southern Highlands (79), Coastal (53) and Northern (65) Zones of Tanzania; and the islands of Zanzibar (33) and Pemba (20) in Tanzania. As described above, the relationship between clones was illustrated using a dendrogram based on Euclidean distance. In addition, a DAPC was conducted to visualize the relationship between populations, and Nei’s distance (Nei 1972, 1987) between geographical regions was calculated and used to construct a dendrogram.

Analysis of genotypic data was carried out in R, Version 4.0.5 (R Core Team 2018). The Euclidean-distance dendrogram of unique clones was computed with the dartR package (Gruber et al. 2018), with the tree constructed using hclust in the phyclust package (Chen 2010; Chen et al. 2013) and plotted using the ggtree package (Yu et al. 2017). DAPC was performed using the adegenet package (Jombart 2008), and population structure and Nei’s distance were calculated using the hierfstat package (De Meeûs and Goudet 2007; Goudet and Jombart 2021). The correlation between the morphological and genotypic distance matrices was tested with the Mantel test in the ade4 package (Bougeard and Dray 2018).
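Nei's (1972) standard genetic distance between two populations is D = −ln(Jxy / √(Jx·Jy)), where Jx, Jy and Jxy are the averages over loci of Σxᵢ², Σyᵢ² and Σxᵢyᵢ for allele frequencies x and y. The sketch below computes it for biallelic SNPs as an illustration of the formula; it is not the hierfstat routine used in the study, which may apply sample-size corrections, and the regional frequencies are hypothetical.

```python
import math


def nei_distance(p_x: list[float], p_y: list[float]) -> float:
    """Nei's (1972) standard distance from per-locus allele frequencies of two populations."""
    jx = jy = jxy = 0.0
    for a, b in zip(p_x, p_y):
        # Each biallelic locus contributes both alleles (a, 1 - a) and (b, 1 - b).
        jx += a * a + (1 - a) ** 2
        jy += b * b + (1 - b) ** 2
        jxy += a * b + (1 - a) * (1 - b)
    # Averaging over loci cancels in the ratio, so the sums can be used directly.
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical frequencies at three SNPs for three regions.
regions = {
    "Mwali": [0.20, 0.60, 0.35],
    "Ngazidja": [0.22, 0.58, 0.40],
    "Lake": [0.50, 0.30, 0.10],
}
names = list(regions)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(nei_distance(regions[a], regions[b]), 4))
```

The resulting pairwise matrix is what a population-level dendrogram would be built from.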

Results

Collection missions

Collection sites

Cassava landraces were collected from 34 production sites across the three islands of Ngazidja, Ndzouani and Mwali (Fig. 1). The average altitude of farms was 471 m asl in Ngazidja, 375 m asl in Ndzouani and 161 m asl in Mwali. A full report on the collection mission (in French) can be found in Online Resource 3.

Fig. 1

Location of collection sites on three islands in the Comoros: Ngazidja, Ndzouani and Mwali

Profiles of surveyed farmers and their farms

Seventeen cassava farmers (key informants) were surveyed across the country: nine on the island of Ngazidja, five on Ndzouani and three on Mwali (Fig. 1). Men represented 88% of farmers surveyed in Ngazidja, 100% in Ndzouani and 33% in Mwali. Key informants were between 30 and 80 years old. The majority of people interviewed (82%) were heads of household who decided which cassava cultivars to grow, and in what quantity. They were selected by extension personnel based on their knowledge of the history of cassava cultivation in their respective areas.

Forty-seven percent of farmers had grown cassava for between 5 and 19 years, the same proportion had grown the crop for between 20 and 49 years, and 6% had grown cassava for more than 50 years. Forty-one percent of key informants had no formal education and depended on agriculture as their main source of income; 23.5% had a primary education and also depended on agriculture as their main source of income. An equal proportion had secondary education and likewise depended on agriculture as their main source of income, and only 12% of the farmers interviewed had a higher level of education and a formal job that provided their main income. All of the farmers surveyed had less than one hectare of agricultural land and cultivated between 50 and 100 cuttings of different cassava cultivars. All farmers practiced intercropping of cassava with other crops such as corn, peanuts, taro and bananas.

The local cultivars collected

Of the 79 local cultivars collected, 41 were from the island of Ngazidja, 21 from Ndzouani and 17 from Mwali (Online Resource 4). In general, the cultivars collected had local names. Translated, some of these names indicated characteristics of the cultivar, such as "Mdja", meaning "comes early", "Mhogomoudou", meaning "black manioc", and "Katsayidoudou", meaning "dry and bitter". Farmers named local cultivars according to their morphological or agronomic traits and, in some cases, after the person who introduced them to the region or the place where they originated. Some local cultivars had more than one name, even within the same region. Ten of the 79 cultivars collected had no name, so they were assigned one based on the name of the farmer who supplied them, for example Djaf 1 and Djaf 2. An estimated 20% of the collected cultivars arrived in the Comoros before 2010 and 34% arrived in the various regions after 2010, while the time of arrival of the remaining 46% was unknown.

Increase, decrease or rejection of the cultivation of cassava cultivars

Over the past five years, cultivation of 36 of the 79 cassava cultivars collected (45.5%) had increased in the different regions of the country. Reasons included higher yield, earlier production, better taste and possibly disease resistance. In contrast, production of 38 cultivars had stagnated or decreased in the different regions because of low yield, high susceptibility to diseases and a long production period. Farmers did not know the production trend over the last five years for the remaining five cultivars.

In almost all regions of the Comoros archipelago, cassava cultivars were grown both for sale and for home consumption of the leaves and storage roots. No other use was mentioned by the farmers surveyed. Farmers reported that 44.6% of the cultivars collected were sweet types, often consumed directly after boiling in water, and the same proportion were intermediate in taste. They also reported three bitter cultivars (Katsa yidoudou, Yidoudou and Maweni Med 1), which were sun-dried and made into flour before consumption.

The yield potential of local cultivars

Yield potential was assessed as the number of plants that would need to be harvested to fill a 25 kg bag. Thirty-nine percent of cultivars were in the highest yielding category, in which two to eight cassava plants would be needed to fill a 25 kg bag, while 20% of the cultivars collected needed 10 to 19 plants. Fourteen percent of the cultivars collected had a low yield, requiring 20 to 40 plants to fill a 25 kg bag. Yield was very poor for 3% of cultivars, for which more than 40 plants were needed to fill a bag. For 24% of the cultivars collected, farmers had no estimate of yield. Roots could be left in the soil after maturity for between 6 months and 5 years; however, most farmers indicated that taste declined with increasing storage time in the soil.
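Converting the plants-per-bag scale into fresh root weight per plant makes the categories easier to compare. Assuming each bag holds exactly 25 kg of fresh storage roots, the highest-yielding category (2 to 8 plants per bag) implies roughly 3.1 to 12.5 kg per plant, and the low category (20 to 40 plants) about 0.6 to 1.25 kg per plant. A minimal sketch of the conversion:

```python
BAG_KG = 25.0  # assumption: a full bag holds exactly 25 kg of fresh roots


def kg_per_plant(plants_per_bag: float) -> float:
    """Approximate fresh root yield per plant implied by plants needed per 25 kg bag."""
    if plants_per_bag <= 0:
        raise ValueError("plants per bag must be positive")
    return BAG_KG / plants_per_bag


# Boundaries of the farmer-reported categories.
for plants in (2, 8, 10, 19, 20, 40):
    print(f"{plants:>2} plants per bag -> {kg_per_plant(plants):.2f} kg per plant")
```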

The sensitivity of landraces to biotic and abiotic stresses

Farmers indicated that 83.5% of the cultivars collected performed well under drought, with little change in yield. For the remaining cultivars, farmers did not know how well they tolerated drought. In terms of maturity, farmers classified the cultivars collected into three groups: 41 early cultivars, maturing in 6 to 8 months; 16 medium cultivars, maturing in 9 to 12 months; and 14 late cultivars, maturing in 13 to 24 months. For the remaining eight cultivars, farmers did not know the maturity period.

At the time of collection, according to observations by a participating cassava breeder, 59% of cassava landraces showed no CMD symptoms and 72% showed no CBSD symptoms. CMD incidence was higher on Ngazidja than on Ndzouani and Mwali, and CBSD was observed only on Ngazidja. Most farmers could not easily recognize the symptoms of these two diseases. Farmers had noticed that susceptibility to CMD and CBSD was more pronounced at lower elevations. They also reported the presence of pests such as green mites and mealybugs.

Culinary qualities and market demand

In terms of perceived cooking quality, 68% of collected landraces were considered good, with a light colour and a short cooking time. Ten cultivars (13%) were classified by farmers as hard, requiring a very long time to cook. They were frequently dried and stored for specific preparations with coconut milk. The remaining 15 cultivars (19%) could not be assessed in terms of cooking quality.

A market demand analysis was carried out based on farmers' experience of selling different cassava cultivars at market, on a scale ranging from “cultivars in high demand” to “demand unknown”. After consolidating cultivars on the basis of the genotyping results, which identified identical clones, six cultivars were identified as being in particularly high demand, based on high demand for all their synonyms. These were Trewe_FMO73 (synonyms FMO32, -41, -7 and -75, collected on Mwali and Ngazidja, of which HS52_FMO7 was in exceptional demand), Mze Hamidou_FMO37 (synonyms FMO27, -44 and -56, collected on Ngazidja and Ndzouani), Mdja_FMO1 (synonyms FMO19, -63 and -50, collected on all three islands), H57_FMO13 (synonym FMO21, from Ngazidja), Mlaregno 16_FMO40 (synonym FMO47, from Ngazidja and Ndzouani) and Mlaregno_FMO49 (synonym FMO48, from Ndzouani) (Online Resource 5). In addition, Namanga_FMO8, which has no synonyms, was in exceptional demand in Ngazidja, and a further 10 cultivars without synonyms were in high demand across other islands (Online Resource 5). The presence or absence of disease symptoms at the time of collection is also indicated in Online Resource 5. Interestingly, CBSD was observed only on Ngazidja and not on the other two islands. Of note was H57_FMO13 (synonym FMO21, both from Ngazidja), which was in high demand and recognized as tolerant to both CMD and CBSD. In addition, Mlaregno 16_FMO40 (synonym FMO47, from Ngazidja and Ndzouani) was consistently scored as tolerant to both CMD and CBSD. The most popular cultivars were sweet and soft, such as H57_FMO13 and Mlaregno 16_FMO40 (and their synonyms), whereas the least requested were hard, bitter cultivars such as Maweni Med 2_FMO22.

Conservation practices for germplasm

To conserve local cultivars, farmers grow cassava in several plots and plant at several times throughout the year. This is supplemented by sharing or exchanging genetic material between farmers in the same region and sometimes between different regions or islands. Other farmers conserve cassava cultivars by delaying the harvest until the next planting season. Exchanging cassava cuttings as gifts is common, especially on Ngazidja; however, this can spread disease when the exchanged cuttings are infected. On the other two islands, planting material is mostly purchased, which offers the possibility of choosing cultivars that are free from disease and pests.

Analysis of agro-morphological / phenotypic data

Establishment of cuttings at the INRAPE station in M’Be ranged from zero to 100%, with an average of 88%.

Qualitative traits

The percentage distribution of scores in qualitative trait categories is given in Fig. 2a–c for stems, leaves and roots respectively. The most frequent scores for each leaf trait were elliptic-lanceolate light green leaves with seven lobes, green veins and smooth margins. Petioles were most frequently horizontal and red and apical leaves dark green without pubescence. Stems were most frequently straight, cylindrical with prominent leaf scars which were fairly close to one another. Stems were silver with a light brown epidermis and a light green cortex. End branches were most frequently green-purple in colour. The branching habit of the plants was mostly dichotomous, followed by erect.

Fig. 2

The percentage distribution of scores in qualitative trait categories for a stems, b leaves and c roots. Abbreviations for the traits scored are according to Table 1. a ShPLT, shape of plant; PromLScar, prominence of leaf scar; GHStem, growth habit of stem; DisBLFScars, distance between leaf scars; CoStemEx, colour of stem exterior; CoStemEp, colour of stem epidermis; CoStemCor, colour of stem cortex; CoEndBAP, colour of end branches of adult plant; BH, branching habit. b SCenLF, shape of central leaflet; PubALF, pubescence of apical leaf; OrPet, orientation of petiole; NoLFLo, number of leaf lobes; LoMar, lobe margin; CoPet, colour of petiole; CoLFVein, colour of leaf vein; CoLF, colour of leaf; CoApLF, colour of apical leaf. c TexRtEpi, texture of root epidermis; RtTaste, taste of storage root; RtSh, root shape; RtCon, root constrictions; ExRtPed, extent of root peduncle; ExCoStRt, exterior colour of the storage root; EasePeel, ease of peeling; CoRPulp, colour of the root pulp; CoRtCor, colour of the root cortex

Most storage roots were conical-cylindrical in shape, light brown in colour and pedunculate, with few or no constrictions. The texture of the storage root epidermis was mostly rough. The storage roots were mostly easy to peel, with cream root pulp (parenchyma) and a white or cream cortex, and sweet and intermediate tastes occurred in equal proportions.

Quantitative traits

Broad-sense heritability, mean and standard error of 11 quantitative traits are given in Table 2.

Table 2 Heritability, mean and standard errors of the quantitative traits

Qualitative and quantitative data combined

Optimal partitioning of all agro-morphological traits suggested seven clusters. This is illustrated in Fig. 3a, which shows the optimal partition with an increasing number of clusters based on aggregation distances, and is confirmed by the dispersion of the adjusted Rand index (Fig. 3b), which is lower from seven clusters onwards. The clustering of agro-morphological traits into these groups is illustrated in Fig. 4. This dendrogram can be “cut” with a horizontal line at the height with the maximum distance to the branch points above and below, giving seven groups of traits. From this grouping, seven synthetic variables were generated. As expected, root weight and harvest index were closely associated, as were number of storage roots and number of commercial roots, shape of plant and levels of branching, number and length of leaf lobes, shape of central leaflet and the ratio of leaflet width and length, and plant height and biomass.
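The stability assessment compares partitions obtained on bootstrap samples with a reference partition using the adjusted Rand index (ARI), which equals 1 for identical partitions and is close to 0 for chance agreement. The sketch below computes the Hubert–Arabie ARI for two label vectors; it does not reproduce the bootstrap procedure itself, and the label vectors are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a: list, labels_b: list) -> float:
    """Hubert-Arabie adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same items")
    n = len(labels_a)
    # Pairs of items placed together in both partitions.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions all singletons
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical reference partition of traits and one bootstrap partition.
reference = [0, 0, 1, 1, 2, 2, 2]
bootstrap = [0, 0, 1, 2, 2, 2, 2]
print(adjusted_rand_index(reference, bootstrap))
```

Cluster labels are arbitrary, so relabelled but otherwise identical partitions still score 1.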

Fig. 3

Stability of the partitions: mean adjusted Rand index from 100 bootstrap samples (A) and dispersion of the adjusted Rand index (B)

Fig. 4

Relationship among quantitative and qualitative traits, from which seven synthetic variables were derived

The seven synthetic variables were used to group the clones into four clusters (Fig. 5). The cluster validation indices equally supported two, four or 10 clusters of clones. Four clusters were retained, following the same principle of cutting the dendrogram with a horizontal line at the height with the maximum distance to the branch points above and below. Of the genotypes from Ngazidja, 12% were in Cluster 1, 63% in Cluster 2, 17% in Cluster 3 and 8% in Cluster 4. Of the genotypes from Ndzouani, 10% were in Cluster 1, 30% in Cluster 2, 20% in Cluster 3 and 40% in Cluster 4. Of the genotypes from Mwali, none were in Cluster 1, 42% in Cluster 2, 8% in Cluster 3 and 50% in Cluster 4. Cluster 2 contained a good representation of genotypes from each island, whereas Cluster 4 was well represented by genotypes from Ndzouani and Mwali.

Fig. 5

Relationship between 46 unique landraces, once duplicates had been removed, based on quantitative and qualitative agro-morphological traits. Numbers refer to FMO number

Genotyping

Identification of duplicates and identity based on known landraces and breeding lines

Genetic fingerprints were obtained from genotyping data for 76 of the 79 cassava samples collected. Calls were initially obtained at 42,804 SNP positions; after data cleaning, 14,071 SNPs remained and all 76 entries were retained (maximum missing data within any genotype was 1.3%). The lowest similarity among the 21 known duplicates included in the analysis was between two Mumba entries (0.852), so a cut-off of ≥ 0.85 was used to define duplicates. On this basis, a set of 46 unique genotypes was defined among the 76 FMO genotypes from Comoros. The only landrace or breeding line in the panel of known clones that matched a Comoros landrace as a duplicate was Aipin Valenca (a very old cultivar and variant of Namikonga, with wild-species characteristics and possibly some CBSD resistance) (Cluster 1, containing FMO1, 19, 50 and 63). Duplicates are listed in Online Resource 6, together with the most representative clone of each duplicate group. The relationship between unique FMO clones, based on Euclidean distance, is illustrated in the dendrogram in Fig. 6. The distance matrix of morphological data was positively and significantly correlated with that of genotypic data (Mantel statistic: 0.19, p value = 0.02), indicating that clones that are more dissimilar in morphology also tend to be more dissimilar genotypically.

Fig. 6

Relationship of the 46 unique cultivars (or the most representative cultivar of each group of duplicate clones) of cassava from Comoros, based on Euclidean distance and SNP data. Branch lengths are not representative of distance. Different colours define the six main clusters.

Four main clusters and two smaller clusters, one consisting of FMO10, 12 and 25 and the other of FMO66 and 33, are evident in Fig. 6, where they are identified by colour. All the clones in these two smaller clusters are from the island of Ngazidja except FMO66, which is from Mwali. The relationship among the 76 clones by island is also shown in a Discriminant Analysis of Principal Components (DAPC) plot in Fig. 7. This relationship is supported by the unbiased estimator of Wright’s F statistic, Fst (Wright 1965), according to which Ngazidja is similarly related to both Mwali (Fst = 0.0212) and Ndzouani (Fst = 0.0219), while Mwali and Ndzouani are slightly more distantly related (Fst = 0.0234). An Fst below 0.05 is considered to indicate weak differentiation. Nei’s pairwise genetic distance (Nei 1972, 1987) shows a similar pattern: Mwali and Ngazidja are most closely related (0.0219), followed by Ngazidja and Ndzouani (0.0224), with Mwali and Ndzouani the most distantly related (0.0234). This relationship is also evident in the dendrogram in Fig. 8.
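For readers less familiar with Nei’s standard genetic distance, the sketch below computes its basic (uncorrected) form, D = -ln(Jxy / sqrt(Jx Jy)), from alternate-allele frequencies at biallelic SNP loci. The function name and inputs are hypothetical, any sample-size correction applied by the software used in the study is not reproduced, and Fst is not covered here.

```python
import math


def nei_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance, uncorrected form (sketch).

    freqs_x, freqs_y: alternate-allele frequency at each biallelic SNP locus
    in populations X and Y, in the same locus order. Populations fixed for
    different alleles at every locus give Jxy = 0 (infinite distance).
    """
    jx = jy = jxy = 0.0
    for p, q in zip(freqs_x, freqs_y):
        # Two alleles per locus, with frequencies (p, 1 - p) and (q, 1 - q)
        jx += p * p + (1 - p) ** 2          # homozygosity within X
        jy += q * q + (1 - q) ** 2          # homozygosity within Y
        jxy += p * q + (1 - p) * (1 - q)    # allele identity between X and Y
    n_loci = len(freqs_x)
    jx, jy, jxy = jx / n_loci, jy / n_loci, jxy / n_loci
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical frequencies at three loci for two islands
print(nei_distance([0.20, 0.55, 0.90], [0.25, 0.50, 0.85]))
```

Small values such as those reported for the three islands correspond to allele frequencies that differ only slightly across most loci.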

Fig. 7

Discriminant analysis of principal components (DAPC) showing the relationship among 76 collected clones from different islands

Fig. 8

Relationship of landraces from different regions of Tanzania and the Comoros

Diversity indices for each island, calculated with the hierfstat package (De Meeûs and Goudet 2007; Goudet and Jombart 2021) in R, are provided in Table 3. The mean observed heterozygosity (Ho) of Ndzouani is slightly higher than that of the other two islands. Within-population gene diversity (Hs) and overall gene diversity (Ht) are greatest in Ngazidja, the island with the largest number of samples, compared with Ndzouani and Mwali. The amount of gene diversity among samples was also highest in Ngazidja, followed by Mwali, and was very low in Ndzouani (0.002).
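To make the meaning of Ho, Hs and Ht concrete, the sketch below computes simple, uncorrected versions from SNP dosages grouped by population: Ho is the share of heterozygous calls, Hs the mean expected heterozygosity within populations, and Ht the expected heterozygosity of the pooled allele frequency. hierfstat’s estimators include sample-size corrections (Nei 1987), so values from this sketch will not match Table 3; the function name and example data are hypothetical, and each population is assumed to have at least one call at every locus.

```python
def basic_diversity(pops):
    """Uncorrected Ho, Hs and Ht averaged over biallelic loci (sketch).

    pops: dict mapping population name -> list of individuals, each a list
    of alternate-allele dosages (0, 1, 2) over the same loci; None = missing.
    """
    n_loci = len(next(iter(pops.values()))[0])
    ho_sum = hs_sum = ht_sum = 0.0
    for locus in range(n_loci):
        ps, hos = [], []
        for inds in pops.values():
            calls = [ind[locus] for ind in inds if ind[locus] is not None]
            ps.append(sum(calls) / (2 * len(calls)))                   # allele frequency
            hos.append(sum(1 for c in calls if c == 1) / len(calls))   # heterozygote share
        p_bar = sum(ps) / len(ps)  # unweighted mean frequency across populations
        ho_sum += sum(hos) / len(hos)
        hs_sum += sum(2 * p * (1 - p) for p in ps) / len(ps)
        ht_sum += 2 * p_bar * (1 - p_bar)
    return {"Ho": ho_sum / n_loci, "Hs": hs_sum / n_loci, "Ht": ht_sum / n_loci}


# Hypothetical two-locus data for two populations
print(basic_diversity({"A": [[0, 1], [2, 1]], "B": [[0, 0], [0, 2]]}))
```

The difference Ht - Hs is the part of total diversity lying between populations; with the small Fst values reported here it is a small fraction of Ht.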

Table 3 Diversity indices by island

Relationship with Tanzanian landraces

After filtering, 13,954 SNP loci remained in the dataset of 475 genotypes. The only match between a Comoros genotype and any Tanzanian genotype was FMO63 with BKP56; both also matched Aipin Valenca, an extremely old cultivar, with a distance of 0.024.

Germplasm from Tanzania and Comoros was grouped into 11 geographical areas and compared using Nei’s genetic distance; the results are given in Fig. 8 and Table 4. The relationships of individual landraces, colour-coded by region, are shown in a DAPC plot in Fig. 9 and a circular dendrogram in Fig. 10. These representations show clearly that cassava germplasm from Comoros is distinct from that from Tanzania. It is most closely related to germplasm from the Northern Zone (which includes the northern coastal area) and the Coastal Zone of Tanzania, rather than to that from the islands of Zanzibar and Pemba, and it is clearly distinct from germplasm from the Lake, Western, Central and Southern Highlands Zones of Tanzania.

Table 4 Nei’s genetic distance between 11 geographical regions in Comoros and Tanzania
Fig. 9

Discriminant analysis of principal components (DAPC) showing relationships of landraces from different regions of Tanzania and the Comoros

Fig. 10

Circular dendrogram showing relationships of landraces from different regions of Tanzania and the Comoros

Interestingly, cassava landraces from the islands of the Comoros showed lower observed heterozygosity (Ho) (lowest 0.2079) than those from the Tanzanian mainland (highest 0.2342, in the Western Zone), and levels similar to those from the island of Zanzibar (0.2123) (Table 5). Similarly, within-population gene diversity (Hs) and overall gene diversity (Ht) were lower on the islands, including the Comoros as well as Pemba and Zanzibar.

Table 5 Diversity indices of cassava landraces according to 11 Zones of Comoros and Tanzania

Discussion

Landraces of cassava have provided a reliable food source for generations of people living on the islands of the Comoros. In anticipation of the introduction of higher-yielding, more disease-resistant cultivars, and because of the threat of current viral pressure, INRAPE, in collaboration with IITA, collected 79 landraces of cassava for conservation and breeding. By far the largest number of landraces (41) was collected from Ngazidja, at 13 production sites; 21 cultivars were collected from Ndzouani at 12 sites and 17 from Mwali at 9 sites. Most cultivars had local names reflecting their characteristic traits, their history (such as the person who introduced them to the area) or their place of origin. Interestingly, most of the producers were men aged between 30 and 80 years, who grew cassava roots and leaves for personal use and sale, with a strong preference for sweet cultivars. Cultivars with sweet, soft storage roots had the highest market demand. Of note were two cultivars, H57_FMO13 and Mlaregno 16_FMO40 (and their synonyms), which were in high market demand and were consistently scored as tolerant to both CMD and CBSD during collection. These cultivars should be further evaluated for their response to CMD and CBSD, with a view to rapid propagation and distribution to farmers and to incorporation into a breeding program in Comoros. On the islands of Ndzouani and Mwali, planting material was frequently purchased, indicating that the promotion of a commercial seed system for cassava might be readily accepted.

Interestingly, the incidence of both CMD and CBSD was higher on the island of Ngazidja than on Ndzouani and Mwali, even though the average collection altitude on Ngazidja (471 m) was higher than on Ndzouani (375 m) and Mwali (161 m). Increasing altitude has generally been found to be associated with decreased disease incidence, particularly for CBSD (Nichols 1950; Hillocks et al. 2002; Hillocks and Jennings 2003), although it has been noted that below 1000 m the general pattern of higher CBSD incidence at lower altitude does not necessarily hold for specific sites, where incidence is influenced by the source of planting material (for example, infected cuttings) and by the sensitivity of the cultivars grown (Hillocks and Jennings 2003). In fact, a cassava breeder participating in the study observed CBSD only on Ngazidja, and not on the other two islands. It may be only a matter of time until it is introduced to Mwali and Ndzouani.

All agro-morphological characters were clustered into seven synthetic variables, which were then used to group the clones into four clusters. As expected, some traits were closely related: harvest index (HI) and root fresh weight (RootFWt); plant height (PltHt) and biomass, which together formed one synthetic variable; and level of branching (LevBranch) and plant shape (ShPLT).

Based on genotyping data, none of the Tanzanian landraces shared identity with any of the Comoros landraces, and only one cultivar, Aipin Valenca, matched four cultivars from Comoros: FMO1, 19, 50 and 63. Aipin Valenca is a very old cultivar, originally imported from Brazil for use in the CMD and CBSD resistance breeding program that operated at Amani in northern Tanganyika (now Tanzania) in 1937 (Nichols 1947; Jennings 1957). It was found to be a promising parent, particularly for CBSD resistance, and was used as a parent in the interspecific backcrossing programme (Hillocks and Jennings 2003).

The relationships among the islands are as expected, with Ngazidja occupying an intermediate position between Mwali and Ndzouani according to both Weir and Cockerham (1984) and Nei (1972). Ngazidja is the island where most landraces were collected (41 of the 76 landraces, based on genotyping data). Germplasm from the three islands was more closely related to each other than to germplasm from the different Zones of Tanzania, including Pemba and Zanzibar. Relative to these Zones, cassava germplasm from all three islands of the Comoros was most closely related to that from the Northern Zone. Notably, this Zone hosted the Amani breeding program, which could suggest a historical transfer of germplasm from that program to the Comoros, particularly as the old cultivar Aipin Valenca was found in the Comoros. After the Northern Zone, germplasm from Comoros was most closely related to that from the Tanzanian Coastal Zone and from Zanzibar. Interestingly, germplasm from Pemba was more distantly related, as was germplasm from the Lake, Southern, Western and Central Zones of Tanzania.

Cassava landraces from the Comoros appear to show some of the diversity characteristics expected of populations isolated on islands with restricted germplasm movement. Observed heterozygosity (Ho) for Ngazidja was the lowest of all 11 regions, followed by Zanzibar, then Ndzouani and Mwali (Table 5). This suggests that increased selfing or, more likely in cassava, cross-pollination among plants of the same clone (which is effectively selfing) has increased homozygosity. In addition, within-population gene diversity (Hs) was lowest for the island populations: Pemba (0.1727), followed by Ndzouani and Mwali (0.1883), Zanzibar (0.1894) and Ngazidja (0.1958). Total gene diversity (Ht) was similar, with Ndzouani and Mwali having the lowest value (0.1883), followed by Pemba, Ngazidja and Zanzibar. These low levels of diversity are likely due to restricted germplasm movement resulting from island isolation. This suggests that the Comoros should proactively import germplasm to enhance diversity and the availability of improved germplasm.

In conclusion, cassava plays a vital role in the food security of the largely rural population of the relatively isolated Comoros Islands, even though few modern cultivars are grown. This, together with the narrow genetic base of the landraces, prompts the recommendation that improved germplasm be introduced, with a focus on cultivars with the preferred sweet-tasting roots. Two landraces with high market demand and reported tolerance to CMD and CBSD were identified for further evaluation, with a view to multiplication, distribution and incorporation into the breeding program. Cassava landraces from the Comoros warrant urgent conservation in IITA’s international genebank because of their uniqueness, which is likely a result of limited germplasm exchange owing to the relative isolation of the islands.