Background

Many plant species remain under-studied due to their low economic importance, complicated biology and/or the absence of available genome sequence information. Upon initiating a research project aimed at characterizing the genetic diversity of such a species, researchers may be confronted with the situation that some resources can be found in related taxa. In such cases, the first step is often to assess whether some of these resources, such as molecular markers, can be used to study the new species. Provided that the “source” species display enough genetic similarities to the “target” species and that marker transferability has been previously assessed, this first step may lead to quick progress in a cost-effective manner. Often, transferring markers between species is seen as a smarter investment than developing and testing new markers, especially if limited funding is available [1, 2].

Over the last three decades, molecular markers have been widely used to study genetic variation among and within populations of various plant species [3,4,5,6,7]. Among the different types of markers that are available, microsatellites or simple sequence repeats (SSRs) are often selected due to their high mutation frequency, which ranges from 10⁻² to 10⁻⁶ nucleotides per locus per generation [2, 8] and generates multiple allelic forms, and their co-dominant nature. The combination of both characteristics makes them sensitive tools for the assessment of genetic diversity among species, determination of population structure, phylogenetic reconstruction, genetic mapping, evolutionary analyses, and molecular breeding [9,10,11,12]. From a practical perspective, the popularity of SSRs is also related to their low resource requirements (i.e. technical skills, laboratory equipment and consumables) that enable their easy implementation and the reproducibility of results in most research environments [2, 8]. However, the steps leading to the development of functional SSR markers, namely the initial identification of microsatellite loci, primer selection and assessment of amplification/polymorphism detection, require some prior knowledge of the genome of the target species and may prove to be expensive and time-consuming [11, 13]. In order to overcome this difficulty, approaches relying on the transfer of SSR markers between species or genera have been implemented. They have been successful in many instances, as documented across Prunus species and among members of the Rosaceae family [14, 15]; between species of the Hevea genus and to other Euphorbiaceae [16]; among Lamiaceae [17]; among Legumes belonging to the Vicia genus [18] and from the Phaseolus genus to Vigna [19]. In other cases, the ever-increasing affordability of high-throughput sequencing technologies and the development of dedicated bioinformatics data mining tools have enabled the identification of microsatellite loci and the development of SSR markers, including in non-model plant species with limited or no background genetic information [20,21,22,23].

Borassus aethiopum Mart., also known as ron palm, toddy palm or African fan palm, is a dioecious species belonging to the Arecaceae family. It is widely distributed across West and Central Africa, where it is present as wild populations [24]. The species is classified as a non-timber forest products (NTFPs)-providing plant, since different parts of the plant are used for various purposes by local populations [24, 25]. In Benin (West Africa) for instance, 121 different uses distributed in seven categories (medicinal, handicrafts, food, construction, firewood, ceremonies and rituals) have been reported for the species [26]. Among these, the consumption of ripe fruits (fresh or roasted) and hypocotyls as food, the use of the weather- and pest-resistant stipe as construction wood and that of leaves and petioles in handicrafts, are the most widespread in local populations [26,27,28]. These different products are also sold in markets, mostly by women, to whom they provide additional income: it is indeed estimated that in Benin, sales of hypocotyls alone may represent 50% to nearly three times the minimum wage of 40,000 CFA Francs (ca. 61 euros) a month [27].

These multiple uses of products derived from B. aethiopum have put a strong anthropogenic pressure on the species, thus contributing to both the fragmentation of its populations and their poor natural regeneration [27, 29,30,31,32]. Further fragmentation of the species’ habitat has been observed as a result of land clearing for agriculture or urban development [32,33,34]. As illustrated through similar examples in the literature [35, 36], such phenomena may lead to restricted gene flow and ultimately, to loss of genetic diversity among B. aethiopum populations. A sustainable management policy for B. aethiopum populations is therefore urgently needed and acquiring information on the genetic diversity of the species and population structure is a major step towards defining sustainable management actions. At the time of writing the present article, only a few chloroplast sequences are publicly available for B. aethiopum through NCBI (https://www.ncbi.nlm.nih.gov/search/all/?term=borassus%20aethiopum). By contrast, abundant molecular resources, including genome assemblies or drafts, are available for model palm species such as Elaeis guineensis Jacq. [37], Phoenix dactylifera L. [38,39,40] and Cocos nucifera L. [41, 42]. In each of these three palm species, large numbers of SSR markers have been identified and for a fraction of them, cross-species and cross-genera transferability tests among species belonging to the Palmaceae family have been performed [43,44,45,46,47,48,49]. In several instances [44,45,46,47, 49] these tests included samples from Borassus flabellifer, the Asian relative of B. aethiopum.

The primary objective of the present study is to generate the first set of genetic data on Borassus aethiopum, as a first step towards improving the management of this species through a better knowledge of its diversity. In order to achieve this, we first describe attempts to use SSR markers identified in these other palm species. Then, we describe the low-coverage sequencing of the B. aethiopum genome with the aim of developing the first set of specific SSR markers targeting this species. Finally, we use the novel SSR markers to assess the genetic diversity and population structure of B. aethiopum samples collected across the three different climatic regions of Benin, a country that was most readily accessible to us for sample collection, as an important first step towards more comprehensive studies spanning the West African sub-region.

Results

Assessment of palm SSR marker transferability to Borassus aethiopum and evaluation of their capacity for characterizing genetic diversity

Of the 80 microsatellite markers selected from the three model palm species Elaeis guineensis, Phoenix dactylifera and Cocos nucifera and tested for amplification on B. aethiopum DNA, 18 (22.5%) generate amplification products (Table 1). No amplification is observed using the 11 C. nucifera markers, whereas 7 (15.9%) and 11 (44%) of the P. dactylifera and E. guineensis markers, respectively, show a successful amplification. None of the amplification products generated with P. dactylifera primers display genetic polymorphism in our B. aethiopum test panel. Among E. guineensis-derived SSR markers, however, two (ESSR566 and ESSR652) display polymorphism. It must be noted that, depending on the DNA sample, the ESSR566 primer pair generates a variable number of amplicons with distinct sizes, which may be an indication that more than one locus is targeted.

Table 1 Summary of SSR markers transferability assessment

Overall, during this phase of the study we detect polymorphism in our B. aethiopum test panel with only 2 (11.1% of successfully amplified markers, 2.5% of total) of the palm SSR primer pairs assayed. Only one of these markers, namely ESSR652, enables unambiguous detection of microsatellite locus polymorphism in B. aethiopum, and might therefore be used for studying genetic diversity in this species.
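The percentages reported for this screening step can be re-derived from the raw counts. The short Python sketch below does so; the counts are those stated in the text (the per-species numbers of tested markers are the ones given in the Methods section), whereas the variable and function names are illustrative assumptions only.

```python
# Counts taken from the text; the dictionary layout is illustrative.
tested = {"P. dactylifera": 44, "E. guineensis": 25, "C. nucifera": 11}
amplified = {"P. dactylifera": 7, "E. guineensis": 11, "C. nucifera": 0}
polymorphic = 2  # ESSR566 and ESSR652


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total_tested = sum(tested.values())        # 80 markers assayed
total_amplified = sum(amplified.values())  # 18 markers amplifying

# Per-species and overall amplification (transferability) rates.
per_species = {sp: pct(amplified[sp], tested[sp]) for sp in tested}
overall_rate = pct(total_amplified, total_tested)

# Share of polymorphic markers among amplifying and among all markers.
poly_among_amplified = pct(polymorphic, total_amplified)
poly_among_tested = pct(polymorphic, total_tested)

print(per_species, overall_rate, poly_among_amplified, poly_among_tested)
```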

De novo identification of microsatellite sequences in the B. aethiopum genome and assessment of potential SSR markers

In order to enable a more precise evaluation of genetic diversity in B. aethiopum, we developed specific B. aethiopum markers from de novo sequencing data. A total of 23,281,354 raw reads with an average length of 250 bp have been generated from one MiSeq run. Raw sequence reads have been trimmed resulting in 21,636,172 cleaned-up reads, yielding 493,636 high-quality reads after filtering (Q > 30) from which 216,475 contigs have been assembled.

From the contigs, the QDD software identifies a total of 1618 microsatellite loci (Additional file 1), of which 1327 (82.01%) are perfect (i.e. repeat size 4 bp or smaller and repeat number 10–20). Among the perfect microsatellite loci, 83.86% are composed of di-nucleotidic repeat units, 13.06% of tri-nucleotidic units, 2.39% of tetra-nucleotidic repeats and 0.67% of repeats with five nucleotides and over. From these, we selected SSR markers composed of di- (AG) or tri- nucleotide repeats, using the following criteria for specific amplification of easily scorable bands: primer lengths ranging from 18 to 22 bp, annealing temperatures 55–60 °C, and predicted amplicon sizes 90–200 bp.
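To illustrate what a search for perfect microsatellites involves, here is a minimal regular-expression sketch in Python. It is not the QDD algorithm: the function name `find_perfect_ssrs`, its default thresholds (motifs of 2 to 4 bp, at least 10 uninterrupted repeats) and the toy sequence are assumptions chosen to loosely mirror the criteria described above.

```python
import re


def find_perfect_ssrs(seq, min_unit=2, max_unit=4, min_repeats=10):
    """Return (start, motif, repeat_count) for uninterrupted tandem repeats.

    Hypothetical helper: a simplified stand-in for dedicated SSR
    detection tools, not a reimplementation of any of them.
    """
    # Lazy quantifier: the shortest motif that explains the repeat wins,
    # so (AG)12 is reported as motif AG rather than AGAG.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # ignore homopolymer runs such as AAAA...
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Toy contig (made up for illustration): (AG)12 and (CAT)10 with short flanks.
toy_contig = "TTC" + "AG" * 12 + "CCGA" + "CAT" * 10 + "GG"
ssr_hits = find_perfect_ssrs(toy_contig)
print(ssr_hits)
```

A real pipeline would additionally check flanking sequences for primer design, which this sketch does not attempt.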

The characteristics of the 57 selected primer pairs and the results of the test amplifications are presented in Table 2. Successful amplification of B. aethiopum DNA is obtained for 54 (94.7%) primer pairs and of these, 34 (63.0% of amplifying pairs) show no polymorphism. The remaining 20 primer pairs enable the amplification of polymorphic products; however, nine of them yield complex, ambiguous amplification profiles that prevent their use for reliable detection of genetic variation. As a result, 11 putative B. aethiopum SSR markers (representing 20.4% of primer pairs associated with successful amplification and 55.0% of those detecting polymorphic products in our study) are both polymorphic and unambiguously mono-locus in our amplification test panel and may therefore be used for further analyses.

Table 2 List of selected primer pairs targeting putative B. aethiopum microsatellite loci and assessment of their polymorphism detection ability

Microsatellite-based characterization of genetic variation of B. aethiopum in Benin

The newly identified set of 11 B. aethiopum-specific SSR markers has been used for the characterization of genetic diversity in our full panel of 180 individual samples from nine locations distributed across Benin (Fig. 1).

Fig. 1 Sampling locations of the Beninese B. aethiopum used in this study

Map generated from publicly available resources of the Institut Géographique National du Bénin (IGN; https://geobenin.bj/) and the “Major Rivers of the World” dataset from the World Bank Data Catalog (https://datacatalog.worldbank.org/dataset/major-rivers-world; Creative Commons Attribution 4.0 International license), using the ArcGIS software by ESRI (www.esri.com).

As shown in Table 3, among our sample set the number of alleles per microsatellite locus ranges from 2 for marker Mbo41 to 6 for markers Mbo34, Mbo35, and Mbo50, with an average value of 4.27, whereas expected heterozygosity (He) values range from 0.031 (marker Mbo56) to 0.571 (marker Mbo35). Using these markers, the analysis of genetic diversity (Table 4) shows that the number of polymorphic markers per sampling site ranges from 8 (sites of Togbin and Malanville) to 10 (Savè, Agoua, Pendjari, Pingou and Trois Rivières), with a mean value of 9 ± 0.865. With the exception of Savè, Hounviatouin and Malanville, 1 to 3 private alleles of the targeted microsatellite loci are observed in each sampling location. Regarding the genetic parameters, the number of effective alleles (Ne) ranges from 1.447 to 2.069 with an average of 1.761. He values range from 0.263 (Hounviatouin) to 0.451 (Savè) with an average value of 0.354, whereas the observed heterozygosity (Ho) varies from 0.234 (Togbin) to 0.405 (Pingou) with an average value of 0.335. Negative values of the fixation index (F) are obtained for Pingou, Malanville and Trois Rivières, whereas positive F values, indicating a deficit of heterozygosity, are observed in all other sites investigated.

Table 3 Characteristics of 11 newly identified polymorphic microsatellites markers used for genetic diversity analysis of B. aethiopum
Table 4 Mean diversity parameters for each of the nine B. aethiopum sampling sites
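For readers less familiar with the diversity parameters reported in Tables 3 and 4, the following Python sketch computes them for a single locus from diploid genotypes, using the textbook definitions He = 1 − Σp², Ne = 1 / Σp² and F = 1 − Ho / He. The function name `locus_stats` and the genotype data are hypothetical, and the software used for the published values may apply slightly different (for instance, sample-size corrected) estimators.

```python
from collections import Counter


def locus_stats(genotypes):
    """Diversity statistics for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [count / (2 * n) for count in allele_counts.values()]

    homozygosity = sum(p * p for p in freqs)  # expected homozygosity, sum of p^2
    he = 1.0 - homozygosity                   # expected heterozygosity
    ne = 1.0 / homozygosity                   # number of effective alleles
    ho = sum(1 for a, b in genotypes if a != b) / n  # observed heterozygosity
    f = 1.0 - ho / he if he > 0 else float("nan")    # fixation index
    return {"Na": len(allele_counts), "Ne": ne, "Ho": ho, "He": he, "F": f}


# Made-up genotypes (allele sizes in bp) for 10 individuals at one site.
example = [(150, 150)] * 5 + [(150, 154)] * 2 + [(154, 154)] * 3
stats = locus_stats(example)
print(stats)
```

In this toy example the fixation index is positive because fewer heterozygotes are observed than expected, which is the situation described for six of the nine sites.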

Population structure of B. aethiopum in Benin

Nei’s genetic distance among locations (Table 5) ranges from 0.073, as observed between Togbin and Hounviatouin (Guineo-Congolian region), to 0.577 between Togbin (Guineo-Congolian region) and Trois Rivières (Sudanian region). Overall, genetic distances between B. aethiopum sampling locations are lowest within the same region, with the lowest genetic distances among the sites of Pendjari, Pingou, and Trois Rivières, which are all located in the Northern part of the country. One interesting exception is the Central (Sudano-Guinean) region of Benin, where we find that the most genetically distant location from Savè is the one from the Agoua forest reserve (0.339). Surprisingly, Savè displays its highest genetic identity values when compared to the other two collection sites located within protected areas, namely Pendjari (0.870) and Trois Rivières (0.882), which are both located in the Sudanian region. This is an unexpected finding considering the geographic distances involved.

Table 5 Pairwise location matrix of Nei’s genetic distance and genetic identity values
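Genetic identity (I) and genetic distance (D) in a table such as Table 5 are two views of the same quantity. Assuming Nei's standard genetic distance, D = -ln(I), the sketch below computes I from per-locus allele frequencies and converts identity to distance. The function names and the frequency data are illustrative assumptions; only the two identity values in the final lines come from the text.

```python
import math


def nei_identity(pop_x, pop_y):
    """Nei's genetic identity between two populations.

    pop_x, pop_y: lists with one {allele: frequency} dict per locus,
    in the same locus order for both populations.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    # Averaging over loci would cancel out in the ratio, so sums suffice.
    return jxy / math.sqrt(jx * jy)


def nei_distance(identity):
    """Convert genetic identity to Nei's standard genetic distance."""
    return -math.log(identity)


# Hypothetical allele frequencies at two loci in two sites.
site_a = [{"150": 0.6, "154": 0.4}, {"200": 1.0}]
site_b = [{"150": 0.5, "154": 0.5}, {"200": 0.8, "204": 0.2}]
toy_identity = nei_identity(site_a, site_b)

# Identity values reported in the text for Savè vs. Pendjari and Trois Rivières.
d_pendjari = nei_distance(0.870)
d_trois_rivieres = nei_distance(0.882)
print(round(toy_identity, 3), round(d_pendjari, 3), round(d_trois_rivieres, 3))
```

Under this assumption, the reported identities of 0.870 and 0.882 correspond to distances of roughly 0.14 and 0.13, respectively.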

A similar structure of genetic distances emerges from the analysis of pairwise location genetic differentiation (Fst) (Table 6), suggesting genetic differentiation according to geographic distances between collection sites, with the notable exception of the lower genetic differentiation between samples from Savè and those from either one of the forest reserves in the Northern region, namely Pendjari and Trois Rivières.

Table 6 Pairwise sampling locations Fst value

In order to assess the strength of the relationship between genetic and geographic distances, we plotted them as a linear regression and performed the Mantel permutation test. As shown in Fig. 2, the positive correlation between both variables is weak, but significant (R² = 0.1139, P = 0.040).

Fig. 2 Correlation between pairwise Fst vs. pairwise geographical distance
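The logic of a Mantel permutation test can be sketched in a few lines of pure Python. The version below correlates the upper triangles of two distance matrices and obtains a one-sided P value by jointly permuting the rows and columns of one matrix. The function names, the number of permutations and the toy matrices are assumptions for illustration; this is not the implementation used for Fig. 2.

```python
import math
import random


def _upper(matrix, order):
    """Upper-triangle values of a square matrix after relabelling by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=0):
    """Return (r, one-sided P) for the association between two distance matrices."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo_vec = _upper(geographic, identity)
    r_obs = _pearson(_upper(genetic, identity), geo_vec)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute site labels of the genetic matrix
        if _pearson(_upper(genetic, order), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example: four hypothetical sites on a line at 0, 1, 3 and 7 (arbitrary units),
# with genetic distance exactly proportional to geographic distance.
positions = [0.0, 1.0, 3.0, 7.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.01 * d for d in row] for row in geo]
r, p = mantel(gen, geo)
print(r, p)
```

For reference, an R² of 0.1139 as reported above corresponds to a correlation coefficient of about 0.34.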

The results of the non-hierarchical AMOVA (Table 7) show that within-site variation underlies the major part (53%) of total variance, whereas among-sites and among-regions variations explain genetic variance to a similar extent (23 and 24%, respectively). Accordingly, the average Number of migrants between collection sites (Nm = 1.019) is low, indicating very limited gene flow.

Table 7 AMOVA results
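The number of migrants is commonly derived from Fst through the island-model approximation Nm = (1/Fst − 1)/4. The source does not state which formula was used, so the following Python sketch is an assumption: it simply shows what overall Fst value the reported Nm of 1.019 would imply under that approximation.

```python
def nm_from_fst(fst):
    """Number of migrants per generation under the island-model approximation."""
    return (1.0 / fst - 1.0) / 4.0


def fst_from_nm(nm):
    """Inverse relationship: Fst implied by a given Nm."""
    return 1.0 / (4.0 * nm + 1.0)


implied_fst = fst_from_nm(1.019)  # Nm value reported in the text
print(round(implied_fst, 3))
```

Under this assumption, Nm = 1.019 corresponds to an Fst close to 0.2.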

Hierarchical analyses performed with K = 2 and K = 3, respectively, yield an identical proportion of genetic variation at the within-individual level (62% of total; Table 7). Analysis using K = 3 allows for a balanced representation of variation between the among-regions and among-sites scales (16% of total variance for each), whereas among-regions variation is not as well accounted for under K = 2 (7% of total variance, vs. 24% for among-sites variation).

The Principal Coordinates Analysis (PCoA) of 180 B. aethiopum samples (Fig. 3a) shows that the first axis (accounting for 24% of total variation out of a sum of 33.90% for axes 1 and 2) roughly separates individual samples in two main groups, a result that is in agreement with the analysis of genetic distances. The sampling locations-based PCoA (Fig. 3b) confirms the genetic separation along the first axis (accounting for 44.08% of total variation over a total of 61.06% for the sum of axes 1 and 2) between sites from the Guineo-Congolian (Southern) region, plus the sites of Agoua and Biguina (Center) vs. sites from the Sudanian (Northern) region, plus the site of Savè (Center). Although the distinction is not as clearly marked, the second axis (accounting for 16.98% of total variation) further distinguishes two subgroups within the first group, corresponding to sites belonging to the Southern region and to those from the Central one, respectively.

Fig. 3 Principal Coordinates Analysis (PCoA). a PCoA analysis of individual B. aethiopum samples. b Sampling locations-based PCoA

Likewise, the Bayesian analysis of our data indicates an optimal value of K = 2 for the clustering of the samples into two groups (Fig. 4a and b): one group that includes samples from Togbin and Hounviatouin in the Southern part of the country, as well as most samples from Biguina and Agoua at the Western (Togolese) border of the Center region; and one group composed of the majority of samples collected in Savè (Eastern part of the Center region) and from the Northern locations of Pendjari, Pingou, Malanville, and Trois Rivières. Since there is a possibility that the ΔK method used for estimating K leads to over- or under-estimated values [50], clustering with higher values of K has also been tested. As is apparent in Fig. 4b, for values of K = 4 and above standard deviations increase considerably; we therefore present results for both K = 2 and K = 3 (Fig. 4c; see also Additional Figure 4 for the summary of the complete analyses with K = 1 to K = 10). As previously observed with the location-based PCoA, under K = 3 further clustering emerges within the first group, involving samples from Togbin and Hounviatouin (South) and those from Biguina and Agoua (Center), respectively.

Fig. 4 Bayesian cluster analysis. a Determination of the optimal value of K from Structure Harvester. b Evanno plot. c Bar plot representations of Bayesian STRUCTURE analysis of Beninese B. aethiopum samples with K = 2 (top) or with K = 3 (bottom) generated with CLUMPAK. Sampling sites are displayed along the horizontal axis
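The ΔK statistic behind an Evanno plot can be sketched as follows. This Python example uses one common reading of the method, in which the absolute second-order difference of the mean log-likelihood L(K) is divided by the standard deviation of L(K) across replicate runs. The function name `delta_k` and the log-likelihood values are entirely hypothetical and are not taken from the analysis summarized in Fig. 4.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute delta-K for every K that has both K-1 and K+1 available.

    lnp_by_k: {K: [log-likelihood of each replicate run at that K]}.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta-K is undefined at the ends of the K range
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Made-up replicate log-likelihoods for K = 1 to 4 (three runs each).
toy_runs = {
    1: [-5000, -5010, -4990],
    2: [-4500, -4502, -4498],
    3: [-4450, -4470, -4430],
    4: [-4440, -4400, -4480],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the denominator is the run-to-run standard deviation, values of K with unstable replicate runs are penalized, which is consistent with the reasoning applied to K = 4 and above in the text.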

The Unweighted pair-group method with arithmetic mean (UPGMA) tree constructed from our data (Fig. 5) distinguishes two main groups matching the ones defined through the Bayesian analysis with K = 2, and which are supported by bootstrap values above 50. Within each of these groups, subgroups corresponding to those observed with K = 3 clustering and that globally match geo-climatic regions (Savè excepted) can further be defined. However, in this case most bootstrap values attached to these secondary branches are not significant.

Fig. 5 UPGMA dendrogram based on genetic distances between Beninese B. aethiopum sampling locations. Bootstrap values supporting each branch are indicated on the nodes
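As an illustration of how a UPGMA tree is built from a pairwise distance matrix, the following pure-Python sketch repeatedly merges the two closest clusters and averages distances weighted by cluster size. The function name, the nested-tuple tree format and the four-site matrix are hypothetical; bootstrap support, as shown in Fig. 5, is not computed here.

```python
def upgma(names, matrix):
    """Build a UPGMA tree as nested tuples (left, right, node_height)."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {
        (i, j): matrix[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        updated = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            # Size-weighted average: the "arithmetic mean" in UPGMA.
            updated[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(updated)
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical distances between four sites forming two obvious pairs.
sites = ["A", "B", "C", "D"]
toy_matrix = [
    [0.0, 0.1, 0.6, 0.6],
    [0.1, 0.0, 0.6, 0.6],
    [0.6, 0.6, 0.0, 0.2],
    [0.6, 0.6, 0.2, 0.0],
]
tree = upgma(sites, toy_matrix)
print(tree)
```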

Discussion

In flowering plants, the efficiency of cross-species transfer of SSR markers is highly variable among taxa, especially when important differences in genome complexity exist between the marker source and the target [51]. Nevertheless, this method has been used successfully for accelerating the analysis of genetic diversity in many plant species, including palms [11, 52,53,54]. In the present study, we find that the transferability rate of microsatellite markers developed in other palm genera to Borassus aethiopum, i.e. their ability to successfully amplify genomic DNA from the latter species, is very low. Indeed, among the 80 primer pairs designed on either Elaeis guineensis, Phoenix dactylifera or Cocos nucifera, we observe that only 18 (22.5%) produce amplicons from B. aethiopum. This percentage is very low when compared to both the inter-species and inter-genera transferability rates that have been found in similar studies targeting other palm species: from 17 to 93% in a panel of 32 palm species [49], 75% from E. oleifera to E. guineensis [54], 86% between the wooly jelly palm (Butia eriospatha Mart.) and related species Butia catarinensis [55] and up to 100% in the licuri palm (Syagrus coronata Mart.) [56]. When considering other plant families, our transferability rate is also markedly lower than both the average rate of 50% found by Peakall et al. [57] within the Glycine genus and among Legumes genera, and the overall rate of 35.2% calculated by Rossetto [58] for within-family transferability among Gymnosperms and Angiosperms. The low transferability rate in our study might be explained in part by the fact that we used markers originating from genomic sequences. Indeed, as pointed out by Fan et al. [1], such markers have a lower transferability rate when compared to Expressed Sequence Tags (ESTs)-derived microsatellites due to the higher inter-species sequence variability within non-coding vs. coding sequences.

Similarly, it is plausible that differences in genome size and complexity among palm species and genera account for our difficulty in identifying palm SSR markers that successfully amplify in B. aethiopum. As a matter of fact, the size of the B. aethiopum genome, as determined by flow cytometry (1C = 7.73 Gb; Jaume Pellicer, unpublished data), is 3.2 to 11.5 times larger than those of the microsatellite source species used in the present study: the P. dactylifera genome is estimated to be 671 Mb [39], whereas the E. guineensis genome is 1.8–1.9 Gb [37, 59] and the C. nucifera genome is 2.42 Gb [42]. It is possible that these differences in genome sizes among related diploid plant species stem from differences in transposable element (TE) content, which in turn might have induced structural alterations throughout the genome through indels, copy number variations and recombination [60, 61]. An illustration of such a mechanism working at the intra-genus level has been provided by the cultivated rice species Oryza sativa L. and its wild relative Oryza australiensis [60]. Ultimately, TE-induced structural variations may have a negative effect on the cross-species amplification ability of some of the SSR primers. Indeed, in a recent study Xiao et al. [49] showed that over 70% of the conserved microsatellite loci between E. guineensis and P. dactylifera are located within genic regions of the genome with low TE content, and which are therefore less likely to be subject to TE-dependent structural variations. More generally, gaining a better understanding of genome structures within the Borassus genus could also help reconcile our results with previously published reports of successful transfer of SSR markers developed from other palm sources to Borassus flabellifer (see references cited in Table 8, Methods section). Indeed, since the genome size of B. flabellifer (7.58 Gb; Jaume Pellicer, unpublished data) is only marginally smaller than that of B. aethiopum, significant differences in genome composition may underlie the lack of SSR transferability between both species.
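The genome size comparison can be checked with simple arithmetic. In the Python sketch below, the sizes in Mb are those quoted in the text (with both bounds of the E. guineensis estimate); the dictionary layout and variable names are illustrative assumptions.

```python
# Genome sizes in Mb, as quoted in the text.
B_AETHIOPUM_MB = 7730
B_FLABELLIFER_MB = 7580
source_species_mb = {
    "P. dactylifera": 671,
    "E. guineensis (lower estimate)": 1800,
    "E. guineensis (upper estimate)": 1900,
    "C. nucifera": 2420,
}

# Fold difference between B. aethiopum and each marker source species.
fold = {sp: B_AETHIOPUM_MB / size for sp, size in source_species_mb.items()}
smallest_fold = round(min(fold.values()), 1)
largest_fold = round(max(fold.values()), 1)

# Relative difference between the two Borassus species, in percent.
borassus_diff_pct = round(100.0 * (B_AETHIOPUM_MB - B_FLABELLIFER_MB) / B_FLABELLIFER_MB, 1)

print(smallest_fold, largest_fold, borassus_diff_pct)
```

The 3.2-fold and 11.5-fold bounds correspond to C. nucifera and P. dactylifera, respectively, whereas the two Borassus genomes differ in size by only about 2%.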

Table 8 Characteristics of the palm SSR markers tested for transferability to B. aethiopum

In any case, from the low number of successfully transferred microsatellite markers we could only identify one displaying unambiguous polymorphism in our B. aethiopum test panel, which is insufficient for an analysis of genetic diversity. Still, the fact that so little microsatellite polymorphism (2 out of 18 amplifying primer pairs: 11.1%) could be detected in this subset of 20 palms sampled across different locations throughout Benin is somewhat surprising and its reasons remain to be elucidated. In addition to possibly being a symptom of habitat fragmentation, this low diversity might also result from the extremely long juvenile phase that has been attributed to this palm species. Indeed, floral maturity has been reported to occur 30 to 50 years after germination [68]. The mode of seed and pollen dispersal, which has so far not been studied extensively in B. aethiopum, might also play a role. Indeed, in species relying on pollen-mediated gene flow, the distance that pollen travels determines how often crosses occur between populations [69, 70].

Regarding the development of novel SSR markers, our results are similar to other studies based on the use of high-throughput sequencing techniques in species where very little information is available [22, 71]. We tested 57 primer pairs targeting candidate microsatellite loci, from which we retained 11 polymorphic markers that were used to assess the genetic structure of B. aethiopum sampled from different sites in Benin. We find low genetic diversity, with an average He value (0.354) that is substantially below those reported for B. flabellifer (0.417) [45] and for other non-timber forest products such as Khaya senegalensis (0.53) [72] and Phyllanthus sp. (0.607 and 0.582 for Phyllanthus emblica and Phyllanthus indofischeri, respectively) [73]. The positive F value that we observed in the majority (6 out of 9) of locations in the present study indicates an overall deficiency of heterozygotes across sites. This deviation from the Hardy-Weinberg equilibrium (HWE) might reflect low gene flow through pollen and seed dissemination, leading to crosses between related individuals, as supported by the low average number of migrants between sites. Accordingly, our data reveal limited genetic distances among collection sites, with values that are lower than those reported for other palm species. Indeed for B. flabellifer, genetic distances ranged from 0.716 to 0.957 [74] and among natural E. guineensis accessions an average of 0.769 was observed [75]. Both our Fst values and AMOVA analysis point to intra-site differentiation as being the main source of genetic variation.

As illustrated by the global agreement between our PCoA and Bayesian analyses, Beninese B. aethiopum samples cluster into two main groups that are mostly dependent on geo-climatic regions and geographic distances between collection sites, although the correlation between genetic and geographic distances is weak, albeit significant. There might be further genetic separation between Southern B. aethiopum samples and those from the Central sites of Agoua and Biguina, resulting in the splitting of one group into two subgroups. However, with our current dataset it is not possible to achieve this level of discrimination in our analyses. Additional sampling campaigns from intermediate locations in the Central and Northern regions will be necessary in order to make progress on the subject.

Among the nine locations studied in Benin, samples from Savè appear to be the most diversified (He = 0.451) and constitute the exception to the general distribution according to geographical distances. This site located in the Sudano-Guinean transition zone of Benin is currently the most active for the production of B. aethiopum hypocotyls, and it acts as a supplier for the whole national territory ([76]; V.K. Salako, personal communication), suggesting that it might be the largest population of B. aethiopum in the country. Moreover, individuals sampled in Savè appear to be genetically distinct from those sampled in other locations of the Central region and closer to those originating from the Northern region, despite the considerable geographical distances involved in the latter case. A part of the explanation for the genetic distance observed between the Western (i.e. Biguina and Agoua) and the Eastern (Savè) collection sites within the Central region may reside in their physical separation by the Ouémé river, which further forms a natural corridor between Savè and the sites of Trois Rivières and Malanville in the North-East (see Fig. 1) [77]. We postulate that seed dispersal by humans and/or animals along this corridor might have played a major role in the observed pattern of genetic diversity and explain the singularity observed in Savè. As a matter of fact, members of the Bariba ethnic group, who live in the Eastern part of the country up to Malanville, share strong historical ties with the Shabè people from Savè, and exchanges between both groups are frequent [78]. The same corridor is also used annually for transhumance by the Fulani people [79], for whom B. aethiopum is an important plant: the role of their mobility in the dispersal of the plant, similar to what has been proposed for Caesalpinia bonduc [80], is therefore plausible. Regarding the impact of animal migrations, Salako et al. [31, 32] detected the presence of B. aethiopum seeds in elephant dung and hypothesized that elephants may have played an important role in seed dissemination for this species through fruit consumption and long-distance herd migrations. In support of this assumption, Savè is part of a continuous forest corridor connecting with the Northern region that was likely used by elephants in their migrations. Up until 1982, the seasonal occurrence of the animal was reported in the Wari-Maro forest of Central Benin [81].

The specific microsatellite markers developed in this study from the partial genomic sequencing of B. aethiopum appear to be efficient to assess the genetic diversity and population structure of this species. Additionally, and provided that genome divergence is not too extensive to allow marker transferability, our SSR markers may also be used in a palm species that belongs to the same genus and that is reported to share parts of its distribution area, namely Borassus akeassii B.O.G., which has long been confused with B. aethiopum due to its similar morphology [82]. High-throughput sequencing techniques are an effective way of developing new microsatellite markers in plant species without significant molecular data. The increasing technical performance and financial affordability of these technologies make it feasible to overcome the difficulties arising in case studies such as ours, where marker transfer proved to be limited or ineffective.

Conclusions

To our knowledge, the data presented in the present article constitute the first sizeable molecular resource available for Borassus aethiopum, which we have made available to the scientific community at large in order to facilitate the implementation of an increasing number of studies on this palm species. Using 11 newly identified SSR markers, we have also performed the first analysis of the genetic diversity of B. aethiopum in an African country, which we see as a first step towards the elaboration of an evidence-based strategy for sustainable resource management and preservation in Benin. Our results support the hypothesis that pollen and seed dispersal mainly occur within sites, leading to crosses among related individuals. The exception to this general rule in the region of Savè (Center) seems to indicate long-range transfer of genes as a result of animal and/or human movements towards and from forest reserves of the North. Further research into the characteristics of these migrations and their impact on gene flow among B. aethiopum populations is required in order to confirm this assumption. As a complement to the present work, the acquisition of agro-morphological data is currently under way, in a bid to elucidate the reproductive development and breeding system of the species. As a longer-term perspective, we also plan to extend our analysis of B. aethiopum diversity to the West African sub-region, and leverage the data acquired to improve knowledge of other species within the Borassus genus, and of palm diversity as a whole.

Methods

Plant material sampling and DNA extraction

Samples of Borassus aethiopum were collected from wild populations in nine distinct sites (three located in protected forest areas, six in farmlands) that were distant from each other by at least 50 km and which spanned the three main climatic regions encountered in Benin (Fig. 1). According to White [83], Benin covers three contrasted climatic regions which are the Sudanian region in the North, the Sudano-Guinean region in the Center and the Guineo-Congolian region in the South. Along a South-North gradient, the rainfall regime switches from bimodal to unimodal, the climate becomes globally drier [29] and the density of B. aethiopum distribution increases [31]. At each location, young leaves from 10 male and 10 female adult trees separated by at least 100 m were collected and stored in plastic bags containing silica gel until further processing. The complete list of samples and their characteristics is available in Additional file 2.

Genomic DNA was extracted from 250 mg of leaves ground to powder under liquid nitrogen using the Chemagic DNA Plant Kit (Perkin Elmer, Germany), according to the manufacturer’s instructions on a KingFisher Flex™ (Thermo Fisher Scientific, USA) automated DNA purification workstation. Final DNA concentration was assessed fluorometrically with the GENios Plus reader (TECAN) using bis-benzimide H 33258 (Sigma-Aldrich) as a fluorochrome.

Transferability of palm microsatellite markers: selection and amplification

A total of 80 SSR markers from previous studies were selected for assessment of their transferability to B. aethiopum: 44 developed for Phoenix dactylifera [67]; 25 developed for Elaeis guineensis [44, 62]; and 11 developed for Cocos nucifera [65]. The respective sequences and origins of these primer sets are displayed in Table 8.

Transferability of the 80 palm SSR markers was assessed on a representative subset of 20 B. aethiopum individuals sampled at the different locations, plus four positive controls from each source species for these markers (i.e. P. dactylifera, C. nucifera, and E. guineensis). Microsatellite amplification was performed with a modification of the M13-tailed Primers protocol [63] adapted to the use of fluorescent labelling [64]. The PCR reaction was performed on 20 ng of leaf DNA in a volume of 20 μL with the following final concentrations or amounts: 1X PCR buffer, 200 μM dNTP, 2 mM MgCl2, 0.4 pmol M13-tailed forward primer, 4 pmol M13 primer (5′-CACGACGTTGTAAAACGAC-3′) fluorescently labeled at the 5′ end with FAM, HEX or TAMRA, 4 pmol reverse primer, and 0.5 U of KAPA Taq polymerase (Sigma-Aldrich). The following program was used: 3 min of initial denaturation at 95 °C, followed by 35 cycles of 30 s at 95 °C, 30 s at 50 °C and 1 min at 72 °C, and a final extension at 72 °C for 5 min. The resulting amplification products were then diluted to 1/10th, mixed with 0.5 μL of an internal size standard (GeneScan 500 ROX, Thermo Fisher Scientific), and denatured for 5 min at 94 °C prior to separation through capillary electrophoresis on an Applied Biosystems 3500 Genetic Analyzer (Thermo Fisher Scientific). Visualization of amplification products was performed using the GeneMapper software version 3.7 (Applied Biosystems).

De novo identification of microsatellite loci in the B. aethiopum genome, marker selection and diversity analysis

One B. aethiopum leaf sample (originating from the Togbin site) was randomly selected and used for genomic DNA purification according to the protocol of Mariac et al. [84]. The DNA was then used for the construction of an Illumina paired-end library, as described in Mariac et al. [85], before high-throughput sequencing on a MiSeq v3 platform (Illumina; average read size 250 bp). Demultiplexing of the raw data output was performed using the Maillol script (https://github.com/maillol/demultadapt), with a 0-mismatch threshold. Adapters were eliminated using Cutadapt version 1.10 [86] (http://code.google.com/p/cutadapt/) with the following parameters: overlap length = 7, minimum length = 35 and quality = 20. High-quality reads (Q > 30) were filtered using the following script: https://github.com/SouthGreenPlatform/arcad-hts/blob/master/scripts/arcad_hts_2_Filter_Fastq_On_Mean_Quality.pl and the resulting filtered reads were deposited into GenBank under BioProject ID PRJNA576413. Paired-end reads were then merged using FLASH version 1.2.11 (https://github.com/SouthGreenPlatform/arcad-hts/blob/master/scripts/arcad_hts_3_synchronized_paired_fastq.pl). Finally, microsatellite motif detection and specific primer design were carried out after elimination of redundant sequences using the QDD software version 3.1.2 [87] with default settings (detailed in Additional file 3).
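The mean-quality filtering step can be illustrated with a minimal Python sketch. This is not the Perl script cited above and may differ from it in its details. It assumes Phred+33 quality encoding, four-line FASTQ records and a strict "greater than" comparison against the threshold; the function names `mean_phred` and `filter_fastq_records` are hypothetical.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 encoding assumed)."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def filter_fastq_records(lines, min_mean_quality=30):
    """Yield 4-line FASTQ records whose mean quality is above the threshold.

    `lines` is any sequence of FASTQ lines (header, sequence, '+', qualities).
    The strict '>' comparison is an assumption based on the 'Q > 30' wording.
    """
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if qual and mean_phred(qual) > min_mean_quality:
            yield (header, seq, plus, qual)


# Illustrative reads (not real data): 'I' encodes Q40, '+' encodes Q10.
example = ["@read1", "ACGT", "+", "IIII",
           "@read2", "ACGT", "+", "++++"]
kept = list(filter_fastq_records(example))
print([record[0] for record in kept])
```

In this toy example only the first read passes the filter, since its mean quality is 40 whereas that of the second read is 10.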

Using selected primer pairs, test amplifications were performed with two randomly selected B. aethiopum DNA samples, then primers showing successful amplification were further tested for polymorphism detection among seven randomly selected DNA samples. The M13-tailed Primers protocol described previously was used, with the following program: 3 min of initial denaturation at 95 °C, followed by 35 cycles of 30 s at 95 °C, 30 s at 55 °C and 1 min at 72 °C, and a final extension at 72 °C for 5 min. Separation and visualization of amplification products were performed as described previously. Finally, the primer pairs enabling successful and unambiguous amplification of polymorphic bands were used for the analysis of genetic diversity among the complete set of 180 B. aethiopum individuals under the same PCR conditions.

Data analysis

Amplification products were scored using the GeneMapper software version 3.7 (Applied Biosystems) and only unambiguous amplification products were considered for data analysis. Genetic diversity parameters were calculated for each locus and each sampling location using the GenAlEx software version 6.502 [88]. Expected heterozygosity (He) was calculated using the formula:

$$ \mathrm{He}=1-\sum_{\mathrm{i}}{\mathrm{p}}_{\mathrm{i}}^2 $$

where pi is the frequency of each allele. The fixation index (F) was calculated as:

$$ \mathrm{F}=1-\frac{\mathrm{Ho}}{\mathrm{He}} $$

where Ho is observed heterozygosity and He is expected heterozygosity [89].
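As an illustration of the two formulas above, the following Python sketch computes He, Ho and F for a single locus from diploid genotypes. It is a minimal example written for this purpose and not the GenAlEx implementation; the function names and the allele sizes in the example are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """He = 1 - sum(p_i^2), with p_i the frequency of allele i at the locus."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    total = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def observed_heterozygosity(genotypes):
    """Ho = proportion of individuals carrying two different alleles."""
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return heterozygotes / len(genotypes)


def fixation_index(genotypes):
    """F = 1 - Ho / He; undefined for a monomorphic locus (He = 0)."""
    he = expected_heterozygosity(genotypes)
    if he == 0:
        raise ValueError("F is undefined when He is zero (monomorphic locus)")
    return 1.0 - observed_heterozygosity(genotypes) / he


# Hypothetical allele sizes (bp) for four individuals at one SSR locus.
locus = [(150, 150), (150, 154), (154, 154), (150, 150)]
print(expected_heterozygosity(locus))   # allele frequencies 5/8 and 3/8
print(observed_heterozygosity(locus))   # one heterozygote out of four
print(fixation_index(locus))
```

With these genotypes He = 0.46875 and Ho = 0.25, so F is positive (about 0.47), which corresponds to a deficit of heterozygotes relative to the expected value.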

F-statistics analysis assessing genetic differentiation (Fst), genetic identity, number of migrants (Nm) [90] and non-hierarchical analysis of molecular variance (AMOVA) for estimating genetic differentiation within and among locations were performed with the same software. Allelic richness was calculated using the SPAGeDi software version 1.5 (http://ebe.ulb.ac.be/ebe/SPAGeDi.html) [91]. Following K determination (see below), successive hierarchical AMOVA analyses were carried out with K = 2 and K = 3. The Mantel permutation test was used for assessing the correlation between genetic and geographic distances between sampling sites [92, 93]. Two Principal Coordinates Analyses (PCoA) enabling the visualization of genetic variation distribution across individuals and sampling sites, respectively, were performed using GenAlEx.
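The principle of the Mantel permutation test can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the procedure implemented in GenAlEx. The one-tailed p-value, the default of 999 permutations and the function names are assumptions made for the example.

```python
import random
from statistics import mean


def _upper_triangle(matrix, order):
    """Pairwise values above the diagonal after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (r, p) for two symmetric distance matrices of the same size.

    Site labels of the genetic matrix are shuffled (rows and columns together)
    and the p-value is the share of permutations with r >= observed r.
    """
    identity = list(range(len(genetic)))
    geo = _upper_triangle(geographic, identity)
    r_obs = _pearson(_upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if _pearson(_upper_triangle(genetic, order), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical distances between four sites located along a line.
geo_km = [[0, 1, 3, 7], [1, 0, 2, 6], [3, 2, 0, 4], [7, 6, 4, 0]]
r, p = mantel_test(geo_km, geo_km)
print(r, p)
```

Shuffling site labels rather than individual matrix cells preserves the dependence among pairwise distances that share a site, which is why the test relies on permutations instead of a standard correlation test.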

The STRUCTURE software version 2.3.4 [94] was used for the determination of the most probable number of clusters for population structure (K value). Using the admixture model, eight simulations were performed for each inferred K value, each run consisting of a burn-in period of 300,000 iterations followed by 50,000 Markov chain Monte Carlo (MCMC) replicates. The output from this analysis was then used as input in the Structure HARVESTER online program version 0.6.94 (http://taylor0.biology.ucla.edu/structureHarvester/) to determine the optimal value of K using the ΔK method of Evanno et al. [95] and allowing for different estimates of K in accordance with Janes et al. [50]. Based on the resulting values of K, a clustering analysis of the studied sampling sites was performed and graphical output was generated using CLUMPAK’s main pipeline (http://clumpak.tau.ac.il) [96]. In order to further assess genetic clustering, a UPGMA tree based on Fst values using 1000 bootstrap replications was constructed using the POPTREE2 software [97].
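For readers unfamiliar with the ΔK method, the Python sketch below shows how the statistic can be derived from the log-likelihoods of replicate STRUCTURE runs. It is an illustration only and not the Structure HARVESTER code. It assumes that the second-order difference is taken on the mean log-likelihood of each K, which is one common reading of the formula, and the log-likelihood values in the example are hypothetical.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    `log_likelihoods` maps each K to the list of Ln P(D) values obtained
    from replicate runs. delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd[L(K)],
    where L is the mean over replicates (assumption of this sketch).
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the ends of the tested range
        sd = stdev(log_likelihoods[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        result[k] = abs(means[k + 1] - 2 * means[k] + means[k - 1]) / sd
    return result


# Hypothetical Ln P(D) values, two replicates per K for brevity.
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-888.0, -892.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In this toy data set the likelihood rises sharply between K = 1 and K = 2 and then plateaus, so ΔK peaks at K = 2. Because ΔK cannot be computed for the smallest and largest K tested, it should be interpreted alongside other estimates of K.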