Molecular and cytological characterization of the global Musa germplasm collection provides insights into the treasure of banana diversity

Bananas (Musa spp.) are one of the main fruit crops grown worldwide. With annual production reaching 144 million tons, they make an important contribution to the economies of many countries in Asia, Africa, Latin America and the Pacific Islands. Most importantly, bananas are a staple food for millions of people living in the tropics. Unfortunately, sustainable banana production is endangered by various diseases and pests, and breeding for resistant cultivars relies on a far too small base of genetic variation. Greater diversity, especially that of wild species, needs to be incorporated in breeding. Such work requires a large and thoroughly characterized germplasm collection, which also serves as a safe depository of genetic diversity. The largest ex situ Musa germplasm collection is kept at the International Transit Centre (ITC) in Leuven (Belgium) and currently comprises over 1500 accessions. This report summarizes the results of systematic cytological and molecular characterization of the Musa ITC collection. By December 2015, 630 accessions had been genotyped. The SSR markers confirmed the previous morphology-based classification for 84% of the ITC accessions analyzed. The remaining 16% of the genotyped entries may need field verification by a taxonomist to decide whether the unexpected classification by SSR genotyping was correct. The ploidy level estimation complements the molecular data. The genotyping continues for the entire ITC collection, including newly introduced accessions, to ensure that the genotype of each accession is known in the largest global Musa gene bank.


Introduction
Bananas (Musa spp.) are one of the world's major fruit crops that have been cultivated since the dawn of agriculture (Denham et al. 2003). Their annual production has reached 144 million tons (FAOSTAT 2014) and different types of bananas are a staple and nutritious food for hundreds of millions in the tropics and subtropics. Despite the importance of bananas for food security, there is insufficient knowledge and few tools are available to counteract the negative impact of diseases and pests threatening production. Currently, large plantations manage these threats with massive amounts of agrochemicals, creating economically and ecologically unsustainable growing conditions; while smallholders often use improperly characterized local cultivars and mostly cannot afford large investments into agrochemicals.
Breeding new and resistant cultivars is hampered by virtual seed sterility of cultivated clones (Ortiz and Swennen 2014). Most of the modern cultivars are believed to have originated from natural inter- and intraspecific crosses of wild AA and BB diploids (2n = 2x = 22) derived from Musa acuminata Colla (with genome A, D'Hont et al. 2012) and M. balbisiana Colla (with genome B, Davey et al. 2013), respectively. Musa schizocarpa (S genome) and M. textilis (T genome) have also contributed to the origin of some cultivars. Next to the seed sterility, parthenocarpy has played an important role in the formation of edible fruits, which were domesticated by early farmers. While M. acuminata is thought to have originated in Malaysia or Indonesia (Simmonds 1962; Nasution 1991), the centre of diversity for M. balbisiana has been assigned to India, Myanmar, Thailand and the Philippines (Daniells et al. 2001), where edible diploid and triploid M. acuminata cultivars were brought by humans. This allowed natural hybridization of the two species, resulting in various genomic combinations [diploid AB; triploid (2n = 3x = 33) AAB, ABB; and less frequent tetraploid (2n = 4x = 44) AAAB, AABB; Simmonds and Shepherd 1955]. Throughout the domestication process, many edible banana varieties have arisen all around SE Asia. Migration of humans in the early days of agriculture has brought bananas to the secondary centers of diversity such as Africa (where AAB Plantains and AAA East African Highland bananas arose) or the Pacific Islands (with today's AAB Maoli/Popo'ulu and Iholena cultivars; De Langhe et al. 2009). Despite enormous difficulties, Musa breeders have managed to produce successful hybrids (Vuylsteke et al. 1993; Rowe 1998; Ortiz and Swennen 2014). One of the key steps in the Musa breeding process is the development of agronomically improved, disease-resistant diploid parental lines (Tenkouano et al. 2003), which can be used for producing synthetic tetraploid hybrids (Rowe and Rosales 1996). However, little is known about the exact ancestral genetic basis of domesticated banana cultivars (De Langhe et al. 2010), which hampers the choice of suitable parents for crosses to produce new hybrids with plant and fruit qualities comparable to the currently grown cultivars.
Yet, the genetic diversity of wild banana species is vast, and a rather restricted portion of it seems to have been used during the initial domestication process through their semifertile variants (De Langhe et al. 2009). Hence, there is a rich pool of genetic material among the wild Musa species (Janssens et al. 2016) that still needs to be described, collected, preserved and used in breeding.
The largest international ex situ collection of banana germplasm (International Transit Centre, ITC), comprising 1518 accessions, is maintained by Bioversity International and hosted at the Catholic University in Leuven (KU Leuven), Belgium. The ITC collection has a good overall representation of the groups of cultivated bananas. However, some specific cultivated groups and subgroups from some geographical areas, as well as wild species, are still underrepresented in the collection. An estimated 300-400 cultivars and wild specimens are known to be missing from the collection, including unique AAB plantains from the Congo Basin, Fe'i bananas and wild species from the region of Papua New Guinea, diploids from the East African region, M. balbisiana diversity from China and India, Callimusa species from Borneo, and wild and cultivated bananas from Myanmar. The efficient management of such a collection requires a strategy for thorough characterization of the stored accessions. This is critical for the identification of duplicated and mislabeled accessions, as well as accessions that may have undergone somatic mutations, and it is particularly needed for efficient selection of germplasm for distribution and use in breeding. Besides the precise morpho-taxonomic classification that, in Musa, is based on a set of phenotypic characters (IPGRI-INIBAP/CIRAD 1996) and basic chromosome number, a molecular-level characterization of these accessions has been providing complementary information on diversity and relationships.
DNA markers have proved useful for characterizing germplasm in various plant species (Chin et al. 1996; Roder et al. 1998; Cregan et al. 1999; Hayden et al. 2007; Jing et al. 2010). Among them, microsatellite (simple sequence repeat; SSR) markers gained special popularity thanks to their co-dominant inheritance, good reproducibility and high abundance in plant genomes (Goldstein and Schlotterer 1999; Selkoe and Toonen 2006). Christelová et al. (2011) established a standardized procedure for genotyping Musa accessions with 19 SSR markers, which were selected from a larger set of SSR markers developed by Crouch et al. (1998), Lagoda et al. (1998) and Hippolyte et al. (2010). This resulted in the establishment of the Musa Genotyping Centre (MGC) in Olomouc, Czech Republic, under the auspices of Bioversity International.
The systematic molecular characterization of the ITC collection (including the already genotyped accessions as well as the newly introduced samples) requires a robust genotyping system which can handle large batches of samples as well as single accessions sent for re-analysis, without compromising the comparability of results. Genotyping technology evolves fast, with high-throughput genotyping methods such as DArT or next-generation sequencing-based marker systems (Mason et al. 2015), some of which have been applied to Musa research (e.g. Risterucci et al. 2009). Yet SSR markers are still used as an effective tool in elucidating the genetic diversity in many plant species, including bananas (Nicolai et al. 2013; Mbanjo et al. 2012; Gross-German and Viruel 2013; Liu et al. 2015; Kitavi et al. 2016). The main advantage of the SSR genotyping approach resides in its capability of systematically adding new information to existing data sets. The newly analyzed accessions can be added to the existing database of SSR profiles, while the optimized and undemanding methodology assures comparability of results gathered over different time points, at a reasonable cost.
The maintenance of genebanks of vegetatively propagated plants is expensive and time-demanding. Therefore, it requires a rationalized approach dealing with properly characterized germplasm and ensuring the management of true-to-type germplasm. This is particularly important in banana, where somaclonal variation occurs more frequently due to the in vitro culture process (Côte et al. 1993; Vuylsteke et al. 1996; Rodrigues et al. 1998). In Musa, germplasm is usually conserved as in vitro proliferating meristems under limited growth conditions (Van den houwe et al. 1995) complemented with cryopreservation to minimize the risk of contamination or human error during subculturing, as well as to avoid somaclonal variation (Panis et al. 2005; Panis 2009). In fact, Doleželová et al. (2005) identified accessions in the ITC collection that contained plants of different ploidy (mixed ploidy accessions) as well as accessions in which plants comprised cells with different ploidy levels (mixoploid accessions). Although the origin of off-type plants is not understood, chromosome number changes (polyploidy, aneuploidy) due to repeated in vitro subcultures might have contributed to this phenomenon. Thus, as a part of the quality management of the MGC, ploidy levels are ascertained via flow cytometry, a method which excels in high throughput and precision (Doležel et al. 1994; Roux et al. 2003; Pillay et al. 2006).
In the present study, we coupled flow-cytometric analysis of ploidy level with the genotyping platform based on SSR markers and analyzed 630 accessions of the genus Musa held at the ITC collection, including 49 Reference DNA collection samples (http://www.musanet.org/Musagenotypingcentre/genomicDNA), as well as 27 samples received from Hawaii to enlarge the representation of individual Musa subgroups, and 38 samples collected during the Bioversity International expedition to Indonesia (Sutanto et al. 2016). The current work was undertaken (1) to improve the characterization of accessions held at the ITC collection, (2) to identify problematic accessions at the ITC collection and reduce duplicated entries, (3) to support the Musa research and breeding community by facilitating proper evaluation of the available germplasm, and (4) to add new knowledge on Musa genetic diversity.

Plant material
Most of the plant material analyzed in this study (Online Resource 1) came from the International Transit Centre (ITC, Leuven, Belgium) and was delivered in batches of in vitro rooted plantlets of about 50 accessions (five plantlets per accession), or as lyophilized leaf tissues. If fresh plant material was obtained, leaf tissues were lyophilized after ploidy level analysis and kept for further use. In particular situations, when ploidy analysis gave ambiguous results (1.2% of measured accessions), the plants were transferred to soil and maintained in a greenhouse for chromosome analysis. Altogether, 695 accessions were genotyped, including 327 diploids, 363 triploids and 5 tetraploids. Apart from the accessions received from ITC and from the Musa Reference DNA collection (http://www.musanet.org/Musagenotypingcentre/genomicDNA), 27 accessions from Hawaii were provided by Dr. Angela Kay Kepler (Banana Specialist, Hawaii) to enlarge the representation of individual Musa subgroups, and 38 accessions were obtained during two Musa germplasm collecting expeditions in Indonesia (Sutanto et al. 2016); the latter are currently being introduced into the ITC collection. The exploration expedition was confined to the East-Indonesian triangle (formed by Halmahera Islands, Sulawesi and Lesser Sunda Islands) and is henceforth referred to as "Indonesian Triangle". The samples from Hawaii and from the "Indonesian Triangle" expedition were collected as fresh leaves from young banana plants for ploidy level estimation and subsequently lyophilized for further use.

Ploidy level estimation
Ploidy level of each accession was estimated by flow cytometry according to Doležel et al. (1997, 2007). About 30 mg of young leaf tissue was chopped with a razor blade in a glass Petri dish containing 500 µl Otto I solution (0.1 M citric acid, 0.5% v/v Tween 20). Crude homogenate was filtered through a 50 µm nylon mesh. Chicken red blood cell nuclei (CRBC), prepared according to Galbraith et al. (1998), were added to the suspension of Musa nuclei as an internal reference standard. After 30 min incubation at room temperature, 1 ml Otto II solution (0.4 M Na2HPO4) (Otto 1990) supplemented with 5 µM DAPI and 3 µl/ml of 2-mercaptoethanol was added. The samples were analyzed using Partec PAS or Sysmex-Partec CyFlow flow cytometers equipped with UV excitation and detectors for DAPI fluorescence. The gain of the instruments was adjusted so that the peak of the CRBC nuclei was positioned approximately at channel 100 on a 512-channel scale. Relative nuclear DNA content of Musa accessions was then determined by comparing peak positions of CRBC nuclei and nuclei of the sample (Fig. 1). Every accession was represented by five individual plants and ploidy was estimated in each of them.
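The ratio-based ploidy call can be sketched in a few lines of Python. The expected Musa : CRBC ratios (about 0.5, 0.75 and 1 for diploids, triploids and tetraploids) are taken from the Fig. 1 caption; the function name and the tolerance value are illustrative assumptions and not part of the published protocol.

```python
# Expected Musa:CRBC G1 peak ratios, as given in the Fig. 1 caption.
EXPECTED_RATIOS = {"2x": 0.5, "3x": 0.75, "4x": 1.0}


def estimate_ploidy(musa_peak, crbc_peak, tolerance=0.06):
    """Return the ploidy label whose expected ratio is closest to the observed one.

    `tolerance` is a hypothetical cut-off (not a value from the study): ratios
    further than this from every expected value are reported as 'ambiguous'.
    """
    ratio = musa_peak / crbc_peak
    label, expected = min(EXPECTED_RATIOS.items(),
                          key=lambda item: abs(item[1] - ratio))
    if abs(expected - ratio) > tolerance:
        return "ambiguous"
    return label


# CRBC peak set to channel 100, as in the instrument set-up described above.
calls = [estimate_ploidy(peak, 100) for peak in (50, 76, 62.5, 100)]
```

A sample whose ratio falls between two expected values, as reported later for a few accessions, would be flagged as ambiguous here and would need chromosome counting.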

DNA extraction and PCR amplification
Genomic DNA was extracted from lyophilized leaf tissues of young banana plants using a NucleoSpin Plant II kit (Macherey-Nagel, Düren, Germany) following the manufacturer's recommendations. Each of the accessions received from the ITC collection was represented by five individual plantlets. If there were no differences in ploidy among individual plants, genomic DNA was extracted from a pooled sample containing lyophilized tissue from all of them. If plants from the same accession displayed inconsistent results during ploidy analysis, genomic DNA was extracted separately from each plant and individual plants were treated as separate accessions. For the SSR analysis, the pipeline established by Christelová et al. (2011) was used and is illustrated in Fig. 2. Briefly, 19 SSR loci (Lagoda et al. 1998; Crouch et al. 1998; Hippolyte et al. 2010) that are well distributed across ten of the eleven Musa genetic linkage groups (Hippolyte et al. 2010) were amplified using a set of M13-tailed specific primers to allow universal labelling with fluorescent dyes. Although new SSR markers are accessible thanks to the completion of the Musa genome sequence assembly (D'Hont et al. 2012; Martin et al. 2016), the marker set used so far was not enlarged, as the reproducibility and comparability of results gathered until now would then not be assured and a re-start of the whole genotyping effort would be inefficient. The PCR reaction mix contained (in a final volume of 20 µl): 10 ng of template genomic DNA, reaction buffer (consisting of 10 mM Tris-HCl (pH 8), 50 mM KCl, 0.1% Triton X-100 and 1.5 mM MgCl2), 200 µM dNTPs (each), 1 U of Taq polymerase, 8 pmol of the M13-tailed locus-specific forward primer, 6 pmol of the fluorescently labeled universal M13 forward primer and 10 pmol of the locus-specific reverse primer. The cycling conditions were set as follows: initial denaturation step at 94°C for 5 min, followed by 35 cycles of denaturation (94°C/45 s), annealing at the temperature corresponding to the locus-specific primer (1 min) and extension (72°C/1 min). The final extension was allowed for 5 min at 72°C. Purification of the PCR products was performed by ethanol/sodium acetate precipitation. Two independent PCR reactions were performed for each accession to avoid random errors of allele binning.
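The pooling rule for DNA extraction described above (pool the five plantlets when their ploidy agrees, otherwise treat each plant as its own accession) can be written as a small decision function. This is only a sketch: the function name, the accession code and the sample-naming scheme are hypothetical.

```python
def extraction_plan(accession, plant_ploidies):
    """Return a list of (sample_name, plant_numbers) DNA extractions.

    If all plantlets share one ploidy call, a single pooled extraction is
    planned; otherwise each plant is extracted and tracked separately.
    The '<accession>_plant<n>' naming is an illustrative convention only.
    """
    plant_numbers = list(range(1, len(plant_ploidies) + 1))
    if len(set(plant_ploidies)) == 1:
        return [(accession, plant_numbers)]
    return [(f"{accession}_plant{n}", [n]) for n in plant_numbers]


# Hypothetical accession codes, five plantlets each.
pooled = extraction_plan("ACC_A", ["3x", "3x", "3x", "3x", "3x"])
split = extraction_plan("ACC_B", ["3x", "3x", "2x", "3x", "3x"])
```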

Fragment analysis
Purified PCR products were diluted 40-fold in Hi-Di formamide containing the internal size standard (GeneScan™-500 LIZ size standard; Applied Biosystems, Foster City, CA, USA) and loaded onto a capillary electrophoresis DNA analyzer (ABI 3730xl, Applied Biosystems) after 5 min denaturation (95°C). Electrophoretic separation and signal detection were carried out with default module settings. In order to reduce the cost and increase the throughput of the genotyping platform, samples were multiplexed for electrophoretic separation. Up to fourfold multiplexing was applied by combining four PCR products, labelled with four different fluorescent dyes (6-FAM, VIC, NED and PET), into a single sample for loading. The resulting data were analyzed and called for alleles using GeneMarker® v1.75 (Softgenetics, State College, PA, USA), manually checked and implemented into marker panels.
Fig. 1 Estimation of ploidy level in Musa accessions. Histograms of relative nuclear DNA content obtained after simultaneous flow cytometric analysis of DAPI-stained nuclei isolated from fresh leaf tissues of Musa and chicken red blood cell nuclei (CRBC). The gain of the flow cytometer was adjusted so that the G1 peak of CRBC, which served as an internal reference standard, was positioned on channel 100. Peaks appearing on channels 200, 300, 400 and 500 correspond to doublets, triplets, etc. of CRBC nuclei. Ploidy of Musa accessions was determined based on the ratio of G1 peak positions (Musa : CRBC), knowing that in diploid, triploid and tetraploid plants the ratio is ~0.5, 0.75 and 1, respectively. a Simultaneous analysis of nuclei isolated from a diploid accession (M. acuminata ssp. banksii, ITC0806) and CRBC. The ratio of G1 peak means was ~0.5. b Simultaneous analysis of nuclei isolated from a triploid accession (Itoke, ITC1554) and CRBC. The ratio of G1 peak means was ~0.75

Genetic diversity analysis using distance-based methods
The extent of genetic diversity among all samples was evaluated using Nei's genetic distance coefficient (Nei 1973), and subsequent cluster analysis was done using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA; Michener and Sokal 1957). To enable joint analysis of all ploidy levels (2x, 3x and 4x), the genotypic data were converted into binary form (coded by 1/0 = presence/absence) and analyzed as a dominant marker record (Weising et al. 2005). Dendrograms were constructed based on the results of UPGMA analysis and visualized in FigTree v1.4.0 (http://tree.bio.ed.ac.uk/software/figtree/). The first dendrogram, which comprised all analyzed samples (not shown), was used to identify problematic accessions whose position in the dendrogram did not agree with their current classification; these were therefore not used to build the "core subset". The core dataset comprised reliable accessions only, for which the clustering pattern agreed with the classification based on morphological descriptors. The genotyping and evaluating pipeline is shown in Fig. 2. The dissimilarity index threshold of 0.25 was used to assess the grouping of the accessions on the dendrogram. Bootstrap support for individual branches was calculated on 1000 replicates and values above 35% (0.35) were used to confirm the fundamental subclustering pattern. To evaluate the informative power of individual SSR loci, several characteristics were calculated, such as polymorphism information content (PIC; Botstein et al. 1980), allele number, observed heterozygosity (Ho, in the diploid dataset) and major allele frequency, using Powermarker v3.25 (Liu and Muse 2005).
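The presence/absence coding and the UPGMA step can be illustrated with a short, dependency-free Python sketch. Two points are assumptions rather than details taken from the study: the distance used below is the Nei and Li (Dice) band-sharing dissimilarity, chosen as a common stand-in for binary marker data because the exact formula applied is not spelled out in this section, and the sample names and allele sizes are invented for illustration.

```python
from itertools import combinations


def to_binary(genotypes):
    """Convert {sample: {locus: allele_sizes}} to sets of present (locus, allele) pairs.

    A pair in the set is coded 1, an absent pair is implicitly 0, so samples
    of any ploidy share one representation.
    """
    return {sample: {(locus, a) for locus, alleles in loci.items() for a in alleles}
            for sample, loci in genotypes.items()}


def dissimilarity(a, b):
    """Nei and Li (Dice) dissimilarity: 1 - 2 * shared / (n_a + n_b)."""
    if not a and not b:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def upgma(binary):
    """Return the merges as (cluster_a, cluster_b, distance); clusters are name tuples."""
    clusters = [(s,) for s in sorted(binary)]
    dist = {frozenset((x, y)): dissimilarity(binary[x[0]], binary[y[0]])
            for x, y in combinations(clusters, 2)}
    merges = []
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        merged = x + y
        for z in clusters:
            if z in (x, y):
                continue
            # Size-weighted mean = arithmetic mean over all member pairs.
            dist[frozenset((merged, z))] = (
                len(x) * dist[frozenset((x, z))] + len(y) * dist[frozenset((y, z))]
            ) / len(merged)
        merges.append((x, y, dist[frozenset((x, y))]))
        clusters = [c for c in clusters if c not in (x, y)] + [merged]
    return merges


# Invented example: two diploids and one triploid scored at two loci.
genotypes = {
    "AA_1": {"locus1": {200, 210}, "locus2": {150}},
    "AA_2": {"locus1": {200, 210}, "locus2": {150, 154}},
    "AAB_1": {"locus1": {200, 220, 230}, "locus2": {160, 164}},
}
merges = upgma(to_binary(genotypes))
```

In this toy data set the two diploids join first, and the triploid joins them at a much larger distance, which is the kind of pattern the dendrogram is meant to reveal across ploidy levels.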

Results and discussion
The significant role of genebanks and germplasm collections as an essential resource for breeders and researchers is often compromised by misclassification of some accessions. This is more common than expected (Mason et al. 2015), as errors can occur at different stages of handling the material, including the acquisition of new accessions, their delivery to the collection and maintenance at the genebank, especially if in vitro subculturing is involved. It has been shown for a number of plant species that SSR markers are useful in dealing with such cases and can help to highlight problematic misclassified or redundant accessions in germplasm collections (e.g. da Cunha et al. 2014; Fjellheim et al. 2015; Roy et al. 2016). The aim of this study was to systematically characterize a substantial portion of the accessions held at the ITC collection and to interpret the genetic structure of the Musa germplasm represented by the genotyped accessions. Nearly half of the ITC accessions were characterized by flow cytometric ploidy estimation and SSR analysis.

Ploidy level estimation
Flow cytometry was used to estimate the ploidy in all 495 accessions that were received as fresh leaf tissues. The remaining accessions were received either as lyophilized leaf tissues (151) or DNA samples from the Reference DNA collection (49; http://www.musanet.org/Musagenotypingcentre/genomicDNA). Ploidy of these latter accessions was retrieved from the Musa germplasm information system database (MGIS; http://www.crop-diversity.org/mgis/). For a majority of the samples, the expected ploidy was confirmed (Online Resource 1). However, some accessions, e.g. Gebi (ITC0877) and Mwitu Pemba (ITC1545), reported to be triploid AAB, were found to be diploid. Moreover, accessions Tongkat Langit Papua (ITC1716) and M. borneensis (ITC1531) showed ambiguous results with DNA content between diploid and triploid, and accessions Sar (ITC0898) and SUP 1 (AAB, Indonesian Triangle mission 2) showed DNA content between triploid and tetraploid. SSR analysis indicated triploid status by the occurrence of three alleles at some loci. However, this observation does not guarantee three complete chromosome sets, nor does it exclude the possibility of tetraploid status. Only chromosome counting would give a definite answer.
Out of 495 accessions analyzed by flow cytometry, two cases of mixoploidy were found. The accession M. rosea-hybrid (ITC1598) comprised one mixoploid (2x + 4x) plant out of the five analyzed plantlets. Mixoploid status (3x + 6x) was detected in three out of five plants of the accession Malbhog (ITC1631). The remaining two plantlets were either triploid or hexaploid. The presence of the hexaploid plant indicates that a shift to a higher ploidy happened in vitro, as hexaploid banana plants do not grow in the field, and that the classification of three plants as mixoploid was not an artifact due to accumulation of cells in the G2 phase of the cell cycle; this is yet to be confirmed in the case of M. rosea-hybrid ITC1598. In cases where the five plantlets differed in ploidy levels without showing a mixoploid pattern (ITC1573, ITC1598), analysis of re-sent samples was carried out. No incongruent results were obtained on the re-sent samples, which indicated that the problems detected in the first batch were caused by delivery error or mislabeling. This proved the capability of the genotyping pipeline to detect mislabeled accessions and showed the robustness of the method.
In the latest systematic effort to screen ploidy levels, 1150 entries deposited in the ITC collection by that time were analyzed. Since then, 381 new accessions have been introduced into the genebank, out of which 249 accessions (Online Resource 1) had their ploidy level verified by flow cytometry for the first time in this study. For the remaining accessions, repeated analyses provided valuable feedback for the conservation strategy. As demonstrated in some accessions, e.g., mixoploid ITC1631 and 18 samples listed in Online Resource 2, the ploidy determined in the present work did not agree with previous results.

Distance-based genetic diversity analysis
The informative power of the 19 SSR loci used in the genetic diversity study (expressed as the Polymorphism Information Content; PIC) ranged between 0.561 and 0.933 with an average value of 0.789 (Table 1). The highest observed heterozygosity within the diploid dataset was recorded for the marker mMaCIR01 (Ho = 0.623), whereas the locus mMaCIR307 had the lowest value (Ho = 0.250).
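For readers unfamiliar with these statistics, the following Python sketch shows how PIC (Botstein et al. 1980), observed heterozygosity and major allele frequency can be computed for a single locus in a diploid dataset. It is not the Powermarker implementation used in the study; the function names and the genotype values are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(allele_calls):
    """Relative frequency of each allele in a flat list of allele calls."""
    counts = Counter(allele_calls)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    homozygosity = sum(x * x for x in p)
    correction = sum(2 * (x * x) * (y * y) for x, y in combinations(p, 2))
    return 1.0 - homozygosity - correction


def observed_heterozygosity(diploid_genotypes):
    """Fraction of diploid genotypes (allele pairs) with two different alleles."""
    het = sum(1 for a, b in diploid_genotypes if a != b)
    return het / len(diploid_genotypes)


# Invented genotypes (allele sizes in bp) for one locus in four diploids.
locus_genotypes = [(200, 210), (200, 200), (210, 210), (200, 210)]
freqs = allele_frequencies([a for g in locus_genotypes for a in g])
locus_pic = pic(freqs)
locus_ho = observed_heterozygosity(locus_genotypes)
major_allele_frequency = max(freqs.values())
```

With two equally frequent alleles the PIC is 0.375, the maximum for a biallelic locus, which puts the reported range of 0.561 to 0.933 for the multi-allelic SSR loci into perspective.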
The UPGMA analysis of the dataset comprising all ploidy levels together grouped the accessions into clusters, which in general conformed to the morphological traits-based classification into subgroups (in 84% of accessions) as reported by germplasm donors to the collection (Musa Germplasm Information System (MGIS), Bioversity International. Accessed on February 15, 2016 at http://www.crop-diversity.org/banana/). However, in 16% of cases, the classification reported did not agree with the clustering observed in the dendrogram. These accessions are listed in Online Resource 2, and the discrepancy between the current classification and the results obtained here is possibly caused by mislabeling or handling error, or by misclassification before introduction into the genebank. This set of 71 "to be verified" accessions was excluded from the dataset that was used for the final dendrogram construction. Furthermore, 24 accessions for which classification was incomplete or not provided, and eight synthetic hybrid accessions, were removed from the dataset in order to increase the resolution of the "core subset" tree. The remaining 591 accessions, representing only entries with coherent classification and genotyping data, were used to construct a final SSR dendrogram. The representativeness of the subset with regard to individual species/subspecies and subgroups of Musa is summarized in Table 2. We considered this subset of accessions as the "core subset", to which any of the excluded accessions could be added in the future so that its potential true or more specific classification could be proposed. However, only field verification, which is carried out in parallel (Chase et al. 2016), can provide conclusive results (examples included in Online Resource 2).
Our results indicate that the SSR markers covered the genetic diversity of the Musa genus well, as revealed by the clustering pattern on the dendrogram (Fig. 3, Online Resource 3). The dendrogram reveals 48 sets of accessions separated at the Nei's dissimilarity index threshold of 0.25 (Table 3) with relatively significant bootstrap support for most of them (>35%). For the wild accessions, the sets and some singletons correspond to wild species and subspecies, while at the triploid cultivar level, the sets correspond to subgroups, of which the members are supposed to have derived as somatic mutants from an original domesticate. At the diploid cultivar level, the situation has not yet been clarified, and the sets may be composed of somatic mutants and accessions with different parentage. The 48 sets of accessions formed 13 clusters as denoted on Fig. 3 and in Table 3. The diversity as revealed by the UPGMA clustering is further discussed by individual groups, according to their classification as wild and cultivated diploids and edible triploid cultivars, respectively.
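How sets of accessions follow from a dissimilarity threshold can be shown with a minimal Python sketch that cuts a list of UPGMA merges at 0.25. The merge list format (two tuples of member names plus the merge distance), the function name and the sample names are assumptions made for illustration; only the 0.25 threshold comes from the text.

```python
def cut_dendrogram(samples, merges, threshold=0.25):
    """Group samples whose UPGMA merge distance does not exceed the threshold.

    `merges` is a list of (members_a, members_b, distance) tuples. UPGMA merge
    distances never decrease, so every merge at or below the threshold can be
    applied independently of the later ones.
    """
    group = {s: {s} for s in samples}
    for left, right, distance in merges:
        if distance > threshold:
            continue
        merged = group[left[0]] | group[right[0]]
        for s in merged:
            group[s] = merged
    unique = {frozenset(g) for g in group.values()}
    return sorted((sorted(g) for g in unique), key=lambda g: g[0])


# Invented merge list for four samples.
example_merges = [
    (("A",), ("B",), 0.10),
    (("C",), ("D",), 0.30),
    (("A", "B"), ("C", "D"), 0.60),
]
sets_at_025 = cut_dendrogram(["A", "B", "C", "D"], example_merges)
```

Here A and B form one set, while C and D remain singletons because they join only above the threshold.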

Wild Musa species diversity
For the ease and clarity of the discussion regarding the diversity within the genus and to illustrate the discussed accessions in all figures, we decided to retain the former classification of genus Musa into sections Callimusa, Australimusa, Rhodochlamys and Eumusa. The accessions formerly classified under section Rhodochlamys (cluster II) are now considered as forming one assembly with the ex-section Eumusa, entitled "Musa" (Häkkinen 2013). It may be worthwhile to revisit this naming as it creates confusion with the name of the genus Musa. Similarly, the merger between the former sections Callimusa and Australimusa (cluster IV) in the complex named Callimusa needs further study. This result may indicate that M. ornata is a hybrid species, which was also suggested by Shepherd (1999), who formulated his opinion on M. ornata being a naturalized hybrid of M. velutina (Rhodochlamys) and M. flaviflora (a small species closely resembling M. acuminata).
The separate location of three well-represented subspecies of M. acuminata (burmannica, malaccensis and banksii) on the SSR tree may have a genetically indicative value. The 31 accessions of ssp. banksii form a coherent block in the middle of cluster XI (Fig. 3) with more or less evident links to diploid AA cultivars. A comparable link can be drawn for the block of 12 accessions of ssp. malaccensis in cluster III. The block of seven accessions of ssp. burmannica is grouped with a sole and morphologically quite different AA cv. block (Pisang Jari Buayá, cluster I). This observation may indicate participation of this subspecies in the origin of AA cultivars to a much lesser extent, and conforms to Simmonds' (1962) findings that the subspecies does not seem to possess parthenocarpic genes. The weakly represented East-Indonesian acuminata taxa, var. tomentosa and var. acuminata (Nasution 1991), as well as ssp. errans are spread over the "banksii sensu lato" block in cluster XI, and are discussed further below.
Fig. 3 Diversity tree of the core subset accessions. UPGMA dendrogram was constructed based on the results of SSR analysis. Only accessions clustering in agreement with their previous classification were used for the core subset diversity tree construction. Individual blocks of clustered accessions are named according to Table 3 and differentiated by colored branches. Major clusters are marked I-XIII. Full version of the tree including names of all accessions is shown in Online Resource 3
The proportion of shared SSR alleles and the proximity between ssp. zebrina and the African "AAA Mutika-Lujugira block" within cluster X support the previous reports on the contribution of zebrina and banksii subspecies to the formation of this triploid subgroup (Carreel et al. 2002; Boonruangrod et al. 2008; Risterucci et al. 2009; Hippolyte et al. 2012) and, based on our results, the zebrina contribution is predominant. The position of the singletons ssp. microcarpa and ssp. sumatrana (Fig. 3, white and grey triangles) in the dendrogram seems unreliable, with rather low bootstrap values for the corresponding branches (0.04 and 0.12, respectively). Increasing the sample representation for each of the subspecies might be needed to produce a more robust position on the tree.
As shown in a number of molecular studies on bananas (e.g. Hippolyte et al. 2012;de Jesus et al. 2013), the intraspecific diversity of M. balbisiana is low compared to M. acuminata. This is reflected in our work by clustering all balbisiana accessions in a single coherent block (Fig. 3, cluster VII) that contains morphologically distinct BB forms from Mainland and Island SE Asia ('Butuhan', 'P. Klutuk' and 'Pisang batu') intermingled with BBs of South Asian origin. Although genetic diversity within M. balbisiana has been reported (Ude et al. 2002;Ge et al. 2005;Swangpol et al. 2007;Wang et al. 2007), it is believed that the ITC collection does not cover enough balbisiana diversity to trace the ancestral lineages of triploid cultivars comprising the B-genome (Hippolyte et al. 2012).
Another wild species, M. itinerans, ubiquitous from NE India in the west to Taiwan in the east (Häkkinen et al. 2008), is morphologically a very distinct species with characteristic long rhizomes, which was, however, not reflected by the cluster analysis. Indeed, the three representatives of this species were scattered over the dendrogram instead of being grouped in one block (black boxes on Fig. 3), which could be attributed to underrepresentation of the species in the present dataset or to potential mislabeling. On the other hand, it may also indicate the limitations of the 19 SSR marker set used. The selected 19 SSR markers proved to work very well for verifying the classification of diploid and triploid cultivar subgroups (as described further below), and for some M. acuminata subspecies (see above). However, there seem to be limitations when it comes to certain species/subspecies, such as M. itinerans, M. acuminata ssp. errans and others (see Table 2), which are rather underrepresented in the dataset. As compared to the result of the first study where the platform was applied, the resolution has greatly improved by enlarging the dataset, which is clearly demonstrated by the AA cv. clusters, or the diversification of triploid subgroups (see further below). Only future results comprising the whole diversity of the germplasm available for genotyping would give a clear answer and show the limits of the application of the platform.

Diploid cultivars
SSR analysis classified the large group of edible AA accessions into ten sets (highlighted in Table 3) with distinct positions on the SSR tree (Fig. 3). This not only helps in the classification of these accessions, but also provides insights into the origin and putative progenitors of some triploid cultivars. The clustering reflects geographical origins as well. While the set of Island SE-Asian accessions (AA cv. ISEA 1) is connected to the malaccensis subgroup and the dessert AAA 'Ibota' subgroup in cluster III, the second ISEA block (AA cv. ISEA 2) co-localizes with a large spectrum of dessert AAAs within cluster IX, suggesting that some of its members may have played a role in the constitution of cultivated AAAs. The great majority of these AA cvs. have a sweet taste and are usually eaten raw.
The AA cv. African set in cluster IX contains AA cultivars found in East Africa and nearby islands of the Indian Ocean. It has been proposed that progenitors of this set (which most probably originated in the area between New Guinea, Borneo and Java and was brought to the East African coast through human migration; Perrier et al. 2009) were the diploid parents of the subgroups AAA 'Cavendish' and AAA 'Gros Michel' (Hippolyte et al. 2012). Our results support this notion. Similarly, the small group of Tongat AA cv. may be the source of another diploid ancestor of AAA 'Cavendish' or 'Gros Michel' (Hippolyte et al. 2012).
The AA cv. IndonTriNG group of cultivars, in cluster IX, originated in New Guinea and within the triangle formed by the Eastern Indonesian islands of Halmahera (NE), Sulawesi (NW) and Sunda (South), which was recently explored for wild and edible bananas (Sutanto et al. 2016). The set contains a number of new accessions, some of which could have played a role in the formation of less known AAA cultivars. Three specimens from the Indonesian Triangle expedition clustered in a block of AA cultivars that also contains three Philippine cultivars (therefore called the AA cv. IndonTriPh block, in cluster X). Its members produce bunches that vaguely resemble those of the AAB 'Plantain' (e.g. Roa Cakalang, strongly resembling the French plantain bunch) and 'Plantain-linked' (strikingly for Guyod and GabaGaba Putih; Sutanto et al. 2016).
Diploid AA cultivars that share the banksii background fall into two sets on the tree (Fig. 3, cluster XI). The first of them, called AA cv. banksii derivatives, is on the same clade as the wild M. acuminata ssp. banksii and seems to contain direct/pure derivatives of the subspecies, all coming from New Guinea. The second block is termed AA cv. 'banksii sensu lato' and contains several wild AAs which are geographically proximal to ssp. banksii and ssp. errans in the Philippines, var. acuminata in Western New Guinea and Maluku and var. tomentosa in Sulawesi. The possibility that AA cultivars from this set are domesticated products/hybrids of these wild acuminatas with banksii deserves to be investigated.
The AA cv. 'Pisang Jari Buaya' set (in cluster I) is separated from other edible AAs. This cultivar is used in breeding as a well-known source of resistance to pests, namely burrowing nematodes (Wehunt et al. 1978; Marin et al. 1998; Ray 2002), of resistance to Fusarium wilt tropical race 4 and of tolerance to black Sigatoka (Rowe 1998). However, based on our SSR analysis, its genetic affinity to wild and edible AA is unclear and deserves special attention. The AA cv. 'Pisang Sapon' singleton appears to be remotely linked to subspecies zebrina only, which could point to its being an edible derivative of zebrina rather than an inter-subspecies hybrid. However, the low statistical support for this particular clade does not allow conclusive remarks.
The interspecific AS (acuminata x schizocarpa) hybrids are grouped on a separate clade within cluster X, closely linked to the banksii-dominant cluster XI (Fig. 3). M. schizocarpa (cluster V) is sympatric with M. acuminata ssp. banksii (cluster XI) in New Guinea and hybrids have been observed, but these are probably transient F1s unless maintained by vegetative propagation. Indeed, several members of this subcluster show the banksii morphology except for the split fruit peels typical of schizocarpa (Daniells et al. 2001).
Indian AB cultivars 'Ney Poovan' and 'Kunnan' were clearly differentiated in this work by SSR analysis, but no conclusion could be reached on the contribution of particular A genomes to the origin of this subgroup. On the other hand, three AB cultivars, 'Muku Bugis', 'Mu'u Seribu' and 'Mu'u Pundi' (collected in the Indonesian Triangle expedition; Sutanto et al. 2016), were located in the 'Plantain-linked' set of cluster XIII (see Online Resource 3), hence at a remote distance from the previous ABs 'Ney Poovan' and 'Kunnan'. This raises the question of whether they may have delivered the B-genome to AA cultivars to form AABs such as 'Laknau' and 'Iholena'.

Triploid cultivars
The UPGMA clustering analysis separated the commonly recognized accessions into distinct clusters scattered over the tree, with blocks of triploid edible accessions located among clusters of wild and cultivated diploids (Fig. 3, Online Resource 3). The clustering pattern reflects the proportion of shared SSR alleles between the accessions. Thus, the probable ancestral relationships among diploid progenitors and edible triploids can be hypothesized. Using a similar set of SSR markers, Hippolyte et al. (2012) showed the African 'Mlali' subgroup to be the closest 2n donor for the 'Cavendish' and 'Gros Michel' subgroups. Our results confirm this concept, as illustrated by the position of the block of African AA cv. (Fig. 3, cluster IX) within the clade of the 'Cavendish' and 'Gros Michel' subgroups. More broadly, this clade is loosely linked with the AAB 'Pome' subgroup, which was also proposed to share the eastern ancestry with the diploid African cv. 'Mlali' (Perrier et al. 2011). Moreover, M. acuminata 'Pisang Pipit', which was proposed to be related to the n-gamete donor of Cavendish bananas (Hippolyte et al. 2012), is part of the cluster, supporting its potential role in the ancestral relationships.
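How shared SSR alleles translate into positions on a UPGMA tree can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the accession names and allele labels are invented toy data, and the Dice-based distance (one minus twice the number of shared alleles divided by the total number of alleles in both profiles) is an assumed stand-in for whichever similarity coefficient is applied in practice. Alleles are scored as present or absent, which sidesteps the unknown allele dosage in triploids.

```python
from itertools import combinations

def shared_allele_distance(a, b):
    """1 - Dice similarity between two sets of scored SSR alleles (assumed measure)."""
    if not a and not b:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))

def upgma(profiles):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    # Pairwise distances between the original accessions.
    dist = {frozenset(pair): shared_allele_distance(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}

    def average(members_1, members_2):
        # UPGMA: mean of all pairwise distances between members of two clusters.
        total = sum(dist[frozenset((x, y))] for x in members_1 for y in members_2)
        return total / (len(members_1) * len(members_2))

    # Each cluster is (subtree, list of accession names it contains).
    clusters = [(name, [name]) for name in profiles]
    while len(clusters) > 1:
        # Find the two clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average(clusters[p[0]][1], clusters[p[1]][1]))
        (tree_1, members_1), (tree_2, members_2) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((tree_1, tree_2), members_1 + members_2))
    return clusters[0][0]

# Hypothetical toy profiles: labels are "<locus>_<fragment size>", all invented.
profiles = {
    "diploid_1": {"L1_100", "L1_104", "L2_200", "L3_310"},
    "triploid_1": {"L1_100", "L1_104", "L2_200", "L2_204", "L3_310"},
    "diploid_2": {"L1_110", "L2_210", "L3_320"},
    "diploid_3": {"L1_110", "L2_210", "L3_324"},
}

tree = upgma(profiles)
print(tree)
```

In this toy example the hypothetical triploid first joins the diploid with which it shares four of its five alleles, mirroring how blocks of edible triploids end up next to their putative diploid progenitors on the dendrogram.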
Similarly, our results support the conclusion of Hippolyte et al. (2012) that the AAA 'Ibota' subgroup (cluster III) originated from the malaccensis subspecies (but lacking the banksii contribution; Perrier et al. 2009) and further substantiate it with AA cultivars from Island SE Asia (AA cv. ISEA 1) as potential contributors to the formation of this triploid subgroup. The position of another AAA subgroup, 'Lakatan' (also called 'Pisang Berangan'; cluster IX), suggests a contribution of ISEA 2 edible AA cultivars. The poorly defined AAA 'Orotava' seems to be loosely connected with the Vietnamese AA cvs 'Ayam' and 'Chuoi La Rung'.
While for the AAA subgroups 'Pisang Ambon', 'Leite' and 'Rio' (AAA Hetero in Fig. 3, cluster IX) no clear donors of the 2n and n-gamete were identified previously, our results suggest that their putative progenitors may come from the AA cvs. from the Indonesian Triangle including New Guinea (AA cv. IndonTriNG, Fig. 3). Moreover, our analysis showed that the AAA 'Pisang Ambon' subgroup (cluster IX; Online Resource 3 for details) is genetically distinct from the 'Gros Michel' subgroup, while the opposite has been suggested based on its morphological appearance and on the analysis of mitochondrial and plastid DNA (Carreel et al. 2002). Although a malaccensis background was proposed for the 'Red' subgroup, our results (cluster IX, Fig. 3) do not support this assumption.
The four morphologically distinctive clone sets of the African AAA 'Lujugira/Mutika' subgroup (Nfuuka, Musakala, Nakabululu, Nakitembe; Karamura 1998; Karamura and Pickersgill 1999) form a single coherent block in cluster X of the SSR dendrogram. The existence of a previously assumed fifth clone set (Mbidde), containing only bananas for beer making, has recently been abandoned (Kitavi et al. 2016), its cultivars being morphologically and genetically indistinguishable from one or other of the aforementioned clone sets. Our results are in line with the findings of Kitavi et al. (2016) that the four clone sets are part of a single subgroup stemming from a single original AAA hybrid that was probably formed in eastern Indonesia (Perrier et al. 2011) and subsequently diversified by multiple somatic mutations. A similar conclusion can be made for the subgroup of AAB 'African Plantains', which is subdivided into three clone types (Horn, False Horn and French) based on inflorescence morphology. Their phenotypic differentiation is not reflected by SSR analysis (cluster XIII, Fig. 3). The fact that these clone types cannot be separated into different subsets on the tree despite their striking morphological differentiation (Vuylsteke et al. 1996; Daniells et al. 2001) seems to confirm that the differences are due to epigenetic changes or resulted from somatic mutations occurring over the centuries of cultivation (Simmonds 1966; De Langhe et al. 1995; Crouch et al. 2000; Noyer et al. 2005).
The same cluster XIII contains the subgroups 'Iholena' and the Philippine 'Laknau', with a bunch morphology reminiscent of that of AAB Plantains, which are hence assembled under the name AAB Plantain-linked. The 'Iholena' subgroup originated in mainland New Guinea (Kagy et al. 2016), but its place in the 'Plantain-linked' set with accessions from eastern Indonesia points to a broader area of origin. Because of the proven banksii genetic background of both A genomes in the plantain AAB subgroup (Lebot et al. 1993; Carreel et al. 2002; Kagy et al. 2016), it is proposed that the Plantain-linked cultivars have the same A-background.
Interestingly, the AAB subgroup 'Maoli-Popo'ulu' is grouped together with the ABB subgroup 'Saba' and the ABB 'Bluggoe-Monthan' accessions within cluster XII. This could point to the presence of the same A (presumably banksii) or B genome across all these accessions and indicates a need for more detailed investigation. Although the two components of the 'Bluggoe-Monthan' complex are considered to belong to different subgroups, their members are part of a single set on the tree. We therefore adopt the term 'complex' rather than 'subgroup'. The SSR analysis provided no genetic basis for differentiating between the 'Maoli' and 'Popo'ulu' clone sets.
Similarly, cluster VIII contains a heterogeneous set of different genome combinations. Representatives of the ABB triploid subgroup 'Pisang Awak' and the Indian AAB 'Silk' subgroup were reported to share the malaccensis A-genome, but this conclusion is only weakly supported by our results. Interestingly, two supposedly Silk accessions, verified triploids cultivated in East Africa, are on the same branch as AB 'Kunnan'. The possible genotypic relation between 'Silk' and 'Kunnan' warrants further examination.
'Pisang Raja(h)' is the only AAB set in cluster X and its position may point to the dominant presence of banksii in its A genomes. However, the name 'Raja' is frequently applied to cultivars with different morphology as a mark of superior quality characteristics, hence a much larger number of candidate 'P. Raja(h)' cultivars should be analyzed to examine this and exclude potential mislabeling.

Conclusions
Recently, plant genetic diversity has been frequently assessed with high-throughput next-generation sequencing methods, such as Genotyping by Sequencing (GBS), or with SNP-based DNA chips (e.g. DArT or DArTSeq). Although such methods are powerful and offer low costs per data point, they may not be suitable for long-term projects in which entries need to be analyzed in batches, or even one by one, over extended periods of time. The comparability of results gathered at different time points cannot be assured unless the whole set of accessions is analyzed again, which may not be economical. In contrast, SSR genotyping allows even single samples to be processed over time. Moreover, with each new accession, new alleles may be added to the dataset, thereby increasing the resolution of the analysis and improving the classification. The results of the present large-scale molecular characterization effort are essential for the management and conservation of the global Musa germplasm collection and will facilitate its efficient use by the banana research and breeding community. The genotyping platform used in this work identified potentially problematic accessions for which field or other verification was deemed necessary, and it has also helped in proposing corrections to the subgroup classification. Coupled with the flow cytometric estimation of ploidy level, this genotyping system represents the new standard for molecular characterization of Musa genebank accessions. Moreover, the results of the study provide a detailed picture of the genetic diversity of the available Musa germplasm and lay a solid basis for a more focused molecular and morphological evaluation of certain groups that have so far been less studied.
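The point that single accessions can be added to an SSR dataset over time, each possibly contributing new alleles, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the presence/absence table, the function name add_accession, the accession names and the allele labels are all hypothetical, and the sketch presumes that fragment sizes are called consistently between runs, so that an allele never seen before was truly absent from the accessions scored earlier.

```python
def add_accession(matrix, alleles, name, observed):
    """Add one accession to a presence/absence table, extending it for new alleles.

    matrix   -- dict mapping accession name to a list of 0/1 scores
    alleles  -- list of allele labels, one per column of the table
    observed -- set of allele labels scored in the new accession
    """
    for allele in sorted(observed):
        if allele not in alleles:
            alleles.append(allele)
            # Accessions genotyped earlier did not show this allele: score 0.
            for scores in matrix.values():
                scores.append(0)
    matrix[name] = [1 if allele in observed else 0 for allele in alleles]

# Hypothetical starting table with two accessions and two alleles at one locus.
alleles = ["L1_100", "L1_104"]
matrix = {"acc_1": [1, 1], "acc_2": [1, 0]}

# A later accession carries one known allele and one allele new to the dataset.
add_accession(matrix, alleles, "acc_3", {"L1_104", "L1_108"})
print(alleles)
print(matrix)
```

Only the new accession has to be genotyped; the existing rows are padded rather than re-analyzed, which is the practical advantage over methods that require the whole set of accessions to be run again.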