Background

Island floras are often characterized by high levels of endemism and groups of closely related but morphologically and ecologically divergent species that are mostly the result of single colonisation events followed by radiation e.g. [1, 2]. New Caledonia was cited as one of the 34 biodiversity hotspots recognized by Conservation International [3, 4]. Nearly 75% of the native flora is endemic [5], which is the fourth highest proportion for an island [6]. While the continental part of New Caledonia (mainly Grande Terre) was entirely submerged during the Eocene (until 37 mya), a thick layer of heavy-metal-rich oceanic mantle accumulated [7]. Today, around one-third of the main island, Grande Terre, is still overlaid with ultramafic substrates. Generally, Grande Terre is a substrate mosaic [8], which is cited as one reason for the high level of endemism found there e.g. [9]. The climate in New Caledonia ranges from tropical to subtropical, and the main island is split by a mountain range into a humid eastern and a dry western part with prevailing winds and rain coming from the south-east. Taking climatic and geological factors together, Grande Terre has a wide range of environmental diversity. The main vegetation types in New Caledonia are evergreen humid forests, maquis, dry forests, littoral vegetation, and savannah [10].

One plant group that has taken advantage of many available habitats on New Caledonia is Diospyros, which is the largest genus (> 500 species in its broad circumscription [11]) of Ebenaceae, a pantropical family of woody plants. In New Caledonia Diospyros species range from sea level up to ca. 1250 m (the highest point in New Caledonia is 1628 m), and species are found in all vegetation types except mangroves, with several species co-occurring in micro-sympatry (Table 1).

Table 1 Occurrence of Diospyros species in different habitats in New Caledonia

Diospyros colonised New Caledonia via long-distance dispersal at least four times [12]. In previous studies based on low-copy nuclear and/or multiple plastid markers [12, 13], it was possible to resolve phylogenetic relationships for the majority of Diospyros species, except for one group of endemics from New Caledonia. Of the 31 New Caledonian Diospyros species, 24 belong to this clade of closely related endemics. In previous analyses, this strongly supported group is related to species found on islands throughout the Indian and Pacific Oceans as far east as Hawai’i [12, 13]. However, due to extremely low levels of sequence divergence, it was not possible to tease apart relationships between these species (they formed a hard polytomy in most individual trees, and there was little informative variation that permitted clustering of pairs or groups of species). Most of these closely related species are morphologically and ecologically clearly differentiated (for examples see [13]), and several species are narrow endemics restricted to small areas.

Amplified fragment length polymorphism (AFLP; [14]) is a fingerprinting technique that has proven to be useful for revealing phylogenetic relationships among closely related taxa (e.g. Hypochaeris, [15]; Lactuca, [16]; Phylica, [17]; Trollius, [18]; Ranunculus alpestris, [19]; Puya, [20, 21]; Araucaria, [22]). In contrast to standard phylogenetic markers, AFLP variation is spread across the whole genome, spanning both coding and non-coding DNA regions, and may therefore be more representative of overall genetic patterns as well as being highly informative for phylogenetic analyses at low taxonomic levels [23, 24]. Compared to other fingerprinting techniques AFLP shows increased reproducibility and does not require any prior knowledge of the analysed genomes. However, there are some detrimental issues to consider when working with AFLP data; these include potential non-homology and non-independence of fragments, asymmetry in the probability of loss/gain of fragments, and problems in distinguishing heterozygote from homozygote bands e.g. [23, 25]. Despite these difficulties, several authors have used AFLPs to reveal phylogenetic relationships corroborated by analyses of other types of data, especially for species that have diverged recently or radiated within a short period of time e.g. [15, 17, 23, 26].

In this study we focus on this group of closely related species of Diospyros endemic to New Caledonia (Figure 1). Our aim was to clarify species boundaries as well as phylogenetic relationships between these New Caledonian Diospyros species. Integrated in a broader context, the outcome of our research should help us better understand the factors behind and mechanisms of speciation and radiation on islands.

Figure 1

Bayesian maximum clade credibility tree of New Caledonian Diospyros species based on plastid and nuclear DNA data (taken from Turner et al. [13]). Bold branches are supported (> 70% bootstrap and Bayesian posterior probability). Accessions in blue correspond to the white group found in Structure and PCO, green ones to the grey group (light blue/green accessions are included in the current data set; dark blue/green accessions failed in the current analysis, but their colour indicates the group to which they most probably belong). Accessions in black are not included in the present study.

Results

After excluding 186 replicates the final matrix used for analyses contained 192 individuals and 792 fragments. The AFLP profiles showed good reproducibility with a mean error-rate of 2.4% across all replicated samples. Because the focus of this study was on the phylogenetic relationships between species and species limits rather than intra-specific population genetics, we present and discuss mainly the results for inter-specific relationships. We present only unrooted trees here because of the low resolution of their backbone. We analysed the data using neighbour-joining (NJ) dendrograms and principal coordinate analysis (PCO) with different distance methods, and in both cases the Dice distance gave the highest resolution of relationships between species.

The NJ analysis resulted in a star-like dendrogram with a backbone of short branches lacking bootstrap support greater than 75%. All species except D. minimifolia, D. parviflora and D. vieillardii form single clusters in the NJ tree (Figure 2A). However, only eight (D. calciphila, D. cherrieri, D. inexplorata, D. impolita, D. pustulata, D. trisulca, D. umbrosa and D. yahouensis) of the 21 included species form clusters with bootstrap support higher than 80%. The Bayesian inference (BI) produced a similar result. All species except D. labillardierei, D. minimifolia, D. pancheri and D. parviflora form single clusters in the BI tree (Figure 2B). Apart from D. flavocarpa, D. revolutissima, D. tridentata and D. vieillardii, all clustered species have high (> 0.95) posterior probabilities.

Figure 2

Phylogenetic dendrograms inferred from the data collected in this study. Each species is shown in a different colour. Colours were selected randomly and do not indicate any grouping. A: Neighbour joining dendrogram based on Dice distances. Black dots indicate nodes with > 80% bootstrap support. B: Bayesian maximum clade credibility dendrogram. Black dots indicate nodes with > 0.95 Bayesian posterior probability. Picture credits: D. calciphila: H. Benoît, http://www.endemia.nc; D. cherrieri: C. Chambrey; D. erudita, D. pancheri, D. pustulata, D. umbrosa, D. vieillardii: D. & I. Létocart, http://www.endemia.nc; D. flavocarpa, D. minimifolia, D. revolutissima, D. sp. Pic N’ga: J. Munzinger; D. glans, D. parviflora: J.-L. Ruiz, http://www.endemia.nc; D. impolita: J. Barrault, http://www.endemia.nc; D. labillardierei: B. Turner; D. perplexa, D. yahouensis: V. Hequet; D. veillonii: R. Amice, http://www.endemia.nc.

PCO separated accessions into two main groups (hereafter named “white” and “grey”) that can be subdivided into six subgroups (Figure 3). Within the “white” group (defined in the Structure results below) subgroup one includes D. vieillardii (individuals indicated by squares in Figure 3), subgroup two D. calciphila (triangles) and subgroup three the rest of the individuals from this group (circles). In the “grey” group (more extensively described in the Structure results below) subgroup four includes D. flavocarpa, D. umbrosa and D. vieillardii (indicated by squares in Figure 3), subgroup five D. erudita and D. glans (triangles) and subgroup six the remaining individuals (circles). A PCO of populations (not shown) based on the pair-wise FST distances obtained from the AMOVA resulted in similar groups and subgroups of populations as those obtained from the individual-based PCO.

Structure analysis gave the highest value of ∆K for K = 2 plus a few other suboptimal K values (Figure 4A and B). However, the latter contained clusters with negligible membership (“empty” clusters). Both K = 3 and K = 6 resulted in three visible clusters, with one cluster being found only in significantly admixed samples (Additional file 1). Visualisation of K = 16 and K = 21 showed two clusters only, and both analyses are highly similar to each other (Additional file 1). It has been argued that the ad-hoc Evanno method [27] favours by default K = 2 over K = 1 when searching for the correct number of clusters [28]. However, PCO separated individuals included in our analyses into two groups as well, and therefore we consider K = 2 as representative for our sample set. For K = 2, the allele-frequency divergence between the two groups was 0.0074. One group (“grey”) includes the majority of accessions (Figure 4C). The other group (“white”) includes D. calciphila, D. labillardierei (population 13 and accession BT179), D. minimifolia (majority of individuals), D. pustulata, D. sp. Pic N’ga, D. tridentata (accessions BT206 and BT207), D. veillonii (accession BT224) and D. vieillardii (population 37 [except accession BT017], population 39 [except accession BT100] and population 41). Seven individuals appear to be admixed (less than 90% identity with one of the groups); most of those are D. vieillardii. Several species (D. labillardierei, D. minimifolia, D. tridentata, D. veillonii and D. vieillardii) and even some populations comprise individuals belonging to each of the two groups.

Figure 3

PCO of individual accessions based on Dice distances. Shading of the base-grid of the figure marks the two main groups inferred by Structure analysis – white and grey. Each species is shown in a different colour. Colours were selected randomly and do not indicate any grouping.

Figure 4

Results of the Structure analysis. A) Delta K values of the K values tested. B) Mean Ln likelihood of K values tested. C) Clustering of K = 2. The two groups are marked in different shades (white and grey).

In order to quantify the amount of genetic variation between species we performed a non-hierarchical AMOVA with species assigned as “populations”. This analysis showed as little as 30% of the variation to occur among the species. However, in the Structure, PCO, NJ and BI analyses several species seemed to be formed by genetically distinct populations assigned to different clusters and occupying distinct positions in the tree. To avoid mixing up cryptic variation within a group, we ran further AMOVAs with populations assigned as sample localities, despite the relatively low sample size per locality. Results of the non-hierarchical AMOVA in this case indicate a higher level of differentiation between populations, resulting in an FST of 0.38. There was no visible difference in gene diversity between stands of co-occurring species and isolated populations. Several hierarchical AMOVAs (except one based on the Structure results) were not significantly more informative than the non-hierarchical AMOVAs (Table 2). Grouping populations according to geography or ecology explains a surprisingly low amount of the variation (1.4 – 1.6%). Furthermore, allocating populations to the 21 included species assigns a relatively high percentage of variation at the between-species level (19.4%), but with a highly similar FST value to the non-hierarchical AMOVA results. When higher-level groupings paralleled the Structure results, we obtained the highest FST value (0.4), albeit the percentage of variation between the two clusters as defined by Structure was only 9.5%, lower than the percentage of differentiation shown between species. Removing seven admixed samples (less than 90% membership in either of the two groups based on the Structure results) from the AMOVA gave nearly the same results as the analysis including them (Table 2).

Table 2 Results of different AMOVAs conducted

The average gene diversity over loci within populations ranged from 0.03 in D. erudita (population 4) to 0.12 in D. parviflora (population 22). Contrary to predictions, the highest number of polymorphic sites, pair-wise differences and average gene diversity were not found in the admixed populations (according to the Structure results) but in D. parviflora (for details see Additional file 2).

Discussion

“Explosive” radiations featuring rapid opportunistic morphological and ecological diversification are phenomena previously reported for some islands (e.g. [29] and references therein). Extreme ancestral bottlenecks, together with on-going hybridization and incomplete lineage sorting, can prevent phylogenetic reconstruction in cases of island radiations if they have been recent and produced many species [30]. However, a good understanding of phylogenetic relationships within radiating groups is key for further evolutionary studies into mechanisms and whether change is adaptive, due to drift in small populations or other phenomena [29].

For the endemic New Caledonian Diospyros species, previous studies, based on multiple plastid [12] and low-copy nuclear [13] markers, showed 21 species to be closely related (Figure 1) and were not able to clearly resolve phylogenetic relationships among them. In the combined data set (plastid and nuclear markers; [13]) only seven of the 21 species included were found to form highly supported groups of accessions from single species. Individuals belonging to each of the remaining 14 species failed to cluster according to their taxonomic circumscription. Dating analysis based on plastid and low-copy nuclear markers showed that the common ancestor of this clade of endemic New Caledonian Diospyros species arrived in New Caledonia around nine million years ago [13]. Diospyros vieillardii has been shown to be sister to the rest of this endemic clade and separated from the rest of the species around 7.2 million years ago.

Results of the current study using genome-wide AFLP markers reveal that most species form unique groups paralleling recognised species. Around one-third (eight species, NJ dendrogram, Figure 2A) to one-half (11 species, Bayesian tree, Figure 2B) of the species are genetically distinct with high support (Figure 2). However, the overall AFLP results prove unable to clearly resolve the backbone of trees, similar to previous results obtained from analyses of DNA sequence data [13]. Intra-specific variation was greater (~80%) than that found at inter-specific level (~20%). This low ratio of among- versus within-species divergence in the context of considerable morphological and ecological divergence is indicative of a recent diversification [22]. Such a process can explain why we were able to get clear species boundaries for most species but were unable to clearly resolve phylogenetic relationships between them.

Two species that did not form well-defined clades (D. minimifolia and D. parviflora) were previously considered by White [31] to show variability in leaf morphology that may indicate that they are in fact a collection of several species. For D. minimifolia White [31] mentioned that the type population (close to population 15 of this study) has smaller leaves compared to other populations of this species. In our results this population clusters together with the majority of the D. minimifolia accessions; the population that is separated from the rest (population 16) is from Gaji. According to White [31], D. parviflora is a wide-spread species, showing considerable variability of leaf morphology even within populations, making it impossible to differentiate these into different species. Our results show all accessions of D. parviflora, except those from Plateau de Tango (population 24), to form a group. All included accessions from D. parviflora are from ultramafic localities.

To our surprise, the AFLP results do not show any significant grouping according to ecological (edaphic, climatic, elevational), geographical or morphological factors (Additional file 3). The two weakly differentiated groups revealed by Structure and PCO also do not correspond to any conspicuous phenotypic characteristics. The allele-frequency divergence between the two groups found by Structure is low, which explains why we did not observe the two groups in the Bayesian and NJ tree-building results. Taken together, these results indicate that positive selection has perhaps acted on few genomic regions [32] and has resulted in phenotypic diversification of New Caledonian Diospyros. Variation in copy number of specific genomic regions may be an additional aspect of molecular variation that, although invisible to AFLP markers, could form the basis of adaptation to different environmental conditions [33].

The individuals of D. vieillardii, D. umbrosa and D. flavocarpa form a minimally isolated group (squares in the grey group) in the PCO (Figure 3). Previous phylogenetic analyses (Figure 1) showed these three species to be sister to the rest of the taxa. Due to its morphological and ecological features D. sp. Pic N’ga from Île des Pins could be a hybrid between D. calciphila and D. vieillardii, but D. vieillardii is not currently known from this island. In PCO, individuals of this putative species are located between individuals of D. calciphila and D. vieillardii (Figure 3). The split between the two groups observed (Figures 3 and 4) could be relatively old, separating two lineages that developed in isolated regions. For instance, dry periods of the Pleistocene caused aridification in many areas, and some vegetation types persisted only in local refugia e.g. [34–36]. After climatic conditions became more favourable, the two groups probably expanded rapidly into newly suitable habitats where they overlapped; the time scale of these fluctuations (ca. 0.02 – 0.1 myr; [37]) was probably not enough to allow woody species with long generation time such as Diospyros to diverge and become permanently reproductively isolated [22]. There are a few admixed individuals in the Structure analysis (Figure 4), which implies that hybridization might have played a role in evolution of this group.

Accelerated rates of evolution at few genes as a result of positive selection could have resulted in the morphological and ecological diversification apparent today in this group of New Caledonian Diospyros species. Furthermore, in addition to retention of ancestral polymorphisms, frequent gene flow could have acted against genome-wide genetic differentiation between the species. Barriers to gene flow between these species may be highly porous, with only few genes responsible for ecological and morphological adaptations evolving on distinct trajectories under strong selection, which leaves the rest of their genomes open to gene flow [38]. Finding these few genes with AFLP is realistically improbable because they are a minuscule component in comparison to the rest of these genomes. In the case of a recent and rapid radiation in plants, it could be argued that the bulk of regions sampled by AFLP have not evolved quickly enough to accumulate substitutions that could indicate species relationships. Our results are similar to those found in various other island genera (e.g. Araucaria in New Caledonia, [22]; Ourisia in New Zealand, [39]).

Diospyros vieillardii, which is sister to the rest of the taxa belonging to this group of New Caledonian endemics [12, 13], is confined to ultramafic soils, which supports the hypothesis of this being an exaptation of the progenitor of this New Caledonian Diospyros clade to ultramafic soils when the whole island was still covered by heavy-metal-rich substrates; similar findings have been made in other plant groups in New Caledonia e.g. [9]. Later, erosion reduced the extent of this geological layer to one third of the island [7], and existing species began to move onto other substrates where they subsequently diverged, forming distinct species. Such observations have been made in various other New Caledonian groups (e.g. Araucaria, [22]; Spiraeanthemum, [35]; Codia, [40]). A few studies have examined the adaptive basis and processes involved in rapid radiations in New Caledonia e.g. [41] and Hawai’i (e.g. lobeliads, [42]; silverswords, [43]). Linking ecological parameters and/or phenotypic traits associated with speciation has to be done with caution because range alterations, subsequent evolution, and species extinctions might have erased initial signals found in only a few genes. Therefore, the associations observed today may be misleading, and the specific conditions/traits that were indeed linked to speciation, if any, may no longer be present [44].

Further work involving common garden experiments would provide insights into the effect of environmental conditions on morphological traits and therefore plasticity of genomes of the New Caledonian Diospyros species. Unfortunately, such experiments are time and cost intensive. It is difficult to obtain ripe fruits of all Diospyros species, and in addition it is difficult to germinate and grow them, which is a crucial aspect of conducting such experiments. Reciprocal transplantation of seedlings across environments is of course more easily conducted than common garden experiments, but it is still time consuming and costly; in addition, species adapted to one soil type often will not survive when transplanted to other soil types.

Conclusions

Although New Caledonian Diospyros are morphologically and ecologically diverse, they show little genetic divergence (based on DNA sequences and AFLP data). In the case of the endemic clade of New Caledonian Diospyros, AFLP data did not provide enough information to resolve phylogenetic relationships between the species, but they were sensitive enough for testing for the presence of genetic species boundaries. However, the AFLP results exhibit a good correlation with morphology-based species concepts. Further studies of this New Caledonian Diospyros group with deeper sampling of the genome using next generation sequencing methods are needed to get a clearer picture of the processes that formed this group.

Methods

Material

Material from New Caledonian Diospyros species was collected on the main island (Grande Terre) and on a smaller island, Île des Pins. When possible, we collected five individuals per population. Collecting population samples from tropical trees/shrubs is not always easy because the trees can be tall (and leaves therefore out of reach) and individuals are often far from each other. Collecting ten individuals in an area of ten square meters also does not make much sense for a study like this because these individuals are probably offspring from the same mother plant. As the focus of the present study is on the phylogenetic relationships between the species and not on population genetics within species, we consider the small sample sizes to be sufficient. For widespread species, we collected populations throughout their range. For distribution of sampling sites, see Figure 5. From samples where fertile material was available, a voucher was made with several duplicates sent to the herbaria at Noumea (NOU), University of Montpellier II (MPU) and the University of Vienna (WU). When sterile, one voucher per population was taken; this was compared to already existing collections in Noumea Herbarium (NOU) from the same location and referred to that species if similar. In total we included in the present study 231 individuals of New Caledonian Diospyros species, which correspond to 20 identified and one unidentified species (due to absence of diagnostic reproductive organs at the time of collection), giving 47 populations in total. Details of the 192 individuals (43 populations) for which we were able to get useable results are given in Table 3. Silica-gel-dried material was used for DNA extraction.

Figure 5

Map with sampling localities. Dots indicate sampling sites; the numbers associated with each dot refer to the list of sampling sites on this figure. Those numbers are used throughout the present study to characterize sampling sites.

Table 3 Table of accessions; showing all individuals used in this study

DNA extraction

For DNA extraction, a modified sorbitol/high-salt CTAB method [46] was used (for details see [13]).

AFLP

Preparation and amplification of fragments followed the protocol of Vos et al. [14] with some modifications. Restriction of genomic DNA with two restriction enzymes and ligation of double-stranded adaptors to the resulting restricted fragments were performed in one step in a thermal cycler (Veriti, AB, Life Technologies; 37°C for 2 h followed by a 30 min hold at 17°C). Reactions comprised 1.1 μL 10x T4 DNA ligase buffer (Promega), 1.1 μL 0.5 M NaCl, 0.55 μL BSA (1 mg/mL; New England BioLabs), 50 μM MseI adaptors (genXpress), 5 μM EcoRI adaptors (genXpress), 1 U MseI restriction endonuclease (New England BioLabs), 5 U EcoRI restriction endonuclease (New England BioLabs), 1 U T4 DNA ligase (Promega), and 0.5 μg DNA and were made up to a total volume of 11 μL with water. Ligated DNA fragments were diluted 10-fold with sterile water. Preselective amplification reactions contained 1.14 μL 10x RedTaq PCR reaction buffer (Sigma), 0.2 U RedTaq DNA polymerase (Sigma), 0.22 μL dNTPs (10 mM; AB, Life Technologies), 0.58 μL preselective primer pairs (EcoRI-A and MseI-CT, each 5 μM; Sigma), and 2 μL diluted restriction-ligation product, and were brought with water to a total volume of 10 μL. Amplification was carried out in the same machine used for restriction-ligation with the following profile: 2 min at 72°C, 20 cycles of 10 sec denaturing at 94°C, 30 sec annealing at 56°C, 2 min extension at 72°C, and a final extension step for 30 min at 60°C. The preselective PCR products were diluted 10-fold with sterile water. Reactions for selective amplification contained 0.5 μL 10x RedTaq PCR reaction buffer (Sigma), 0.1 U RedTaq DNA polymerase (Sigma), 0.11 μL dNTPs (10 mM; AB, Life Technologies), 0.27 μL MseI-primer (5 μM; Sigma), 0.27 μL EcoRI-primer (1 μM; Sigma), and 1 μL diluted preselective amplification product and were brought to a total volume of 5 μL with water.
They were carried out in a GeneAmp PCR System 9700 (AB, Life Technologies) with the following profile: 1 min at 94°C, 9 cycles of 1 sec at 94°C, 30 sec at 65-57°C (reducing the temperature at 1°C per cycle), 2 min at 72°C, 25 cycles of 1 sec at 94°C, 30 sec at 56°C, 2 min at 72°C and a final extension for 30 min at 60°C. The selective PCR products were purified using Sephadex G-50 Superfine (GE Healthcare Bio-Sciences) applied to a MultiScreen-HV 96-Well Plate (Millipore) in three steps of 200 μL each and settled at 750 g (1, 1 and 5 min, respectively). The same speed was used for centrifugation of the samples (5 μL of each selective PCR product), again for 5 min. Two microliters of the eluate were combined with 10 μL HiDi and 0.1 μL GeneScan 500 ROX (AB, Life Technologies) and denatured for 3 min at 95°C before running them on a capillary sequencer (3130xl Genetic Analyzer, AB, Life Technologies).

The selective primer pairs (6Fam-EcoRI-AGC/MseI-CTGA, Vic-EcoRI-ATG/MseI-CTCG and Ned-EcoRI-ATC/MseI-CTGA) were chosen because they generated clear bands in moderate numbers (thus decreasing the risk of fragments co-migrating by chance), with sufficient variability in preliminary tests. Although the genome size of the New Caledonian Diospyros species (1C-value: 1.5 – 2.3 pg; [13]) is smaller than the mean 1C-value of eudicots (2.7 pg, [47]), we found the AFLP profiles generated with MseI primers with four selective bases much clearer than those obtained from primers with just three selective bases.

Reproducibility was checked by repeating ca. 80% of the samples. This high number of repetitions was necessary because of initial difficulties with fragment sizing.

Scoring and phylogenetic analysis

Sizing and scoring of the data was performed with GeneMarker v2.2.0 (SoftGenetics). After pre-analysis using default settings, sizing profiles of all samples were checked and where necessary manually corrected. Most of these corrections concerned one of the following peaks of the size standard: 35 bp, 50 bp and 139 bp. These peaks were often not correctly recognized by the GeneMarker program. High-quality sizing-profiles (score > 90) were obtained for all samples. A panel of scorable fragments was established for each primer combination, and fragments between 65 – 510 bp were scored. The relative fluorescent unit (RFU) threshold was set at 40. Automatic scoring was conducted using Local Southern peak call, peak saturation, base line subtraction, spike removal, pull up correction, and a stutter peak filter of 5% (as described in [48]). The results were exported as a presence/absence matrix. The outcome of the automatic scoring was manually checked and corrected for errors. These errors mostly concerned peaks for which shape was atypical. In total 486 samples corresponding to 231 individuals were scored. Replicate samples were run for 186 individuals (between two and five replicates per individual). The differences between the replicates of each individual were counted and divided by the total number of phenotypic comparisons to get the error rate (calculated according to Bonin et al. [49]). After initial analysis (neighbour-joining, NJ) of the complete data set, replicates of samples and obviously failed samples were excluded from further analyses. As replicated samples of the corresponding individuals mostly clustered together, selection of samples from each individual for further analyses was random and not according to any pattern or protocol. For the final analyses we ended up with 192 individuals.
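
The error-rate calculation described above can be illustrated with a minimal Python sketch. It assumes that every pair of replicates of the same individual is compared band by band; the exact comparison scheme used in this study is not stated, and the function name and the toy profiles are hypothetical.

```python
from itertools import combinations


def aflp_error_rate(replicates):
    """Mismatch rate between replicate AFLP profiles.

    replicates: dict mapping an individual ID to a list of 0/1 band
    profiles (one list per replicate, all of equal length).
    Assumption: all replicate pairs of an individual are compared.
    """
    mismatches = 0
    comparisons = 0
    for profiles in replicates.values():
        for a, b in combinations(profiles, 2):
            if len(a) != len(b):
                raise ValueError("replicate profiles differ in length")
            # Count bands scored differently in the two replicates.
            mismatches += sum(x != y for x, y in zip(a, b))
            comparisons += len(a)
    if comparisons == 0:
        raise ValueError("no replicate pairs to compare")
    return mismatches / comparisons


# Hypothetical toy data: two individuals, four fragments each.
toy = {
    "ind_A": [[1, 0, 1, 1], [1, 0, 1, 0]],
    "ind_B": [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]],
}
print(aflp_error_rate(toy))  # 3 mismatches in 16 comparisons = 0.1875
```

With this definition, the 2.4% reported in the Results would correspond to roughly one mismatched band call per 42 replicate comparisons.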

All three primer-combinations were combined in a single matrix and analysed together. Different distance measures were tested for their power to resolve relationships with our data set. Distance matrices were calculated in PAUP* v4b10 ([50]; Nei-Li distance) and SplitsTree v4.12.6 ([51]; uncorrected P, Dice, corrected and uncorrected Hamming). Phylogenetic relationships based on the previously mentioned distance matrices were reconstructed using SplitsTree v4.12.6 [51] to create unrooted NJ dendrograms. To assess robustness of branches NJ-bootstrap (NJ-BS) analyses were performed using SplitsTree v4.12.6 [51] and PAUP* v4b10 [50]. Bayesian inference (BI) was conducted with BEAST v1.7.5 [52], with two runs of 20 million generations each, sampling every 1,000th generation and removal of the first 30% of trees as burn in.
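
For readers unfamiliar with the Dice distance, the following Python sketch shows how it can be computed from a presence/absence matrix. It uses the textbook definition (one minus twice the number of shared bands divided by the total number of bands in both profiles) and is not taken from the SplitsTree implementation; the function names and example profiles are hypothetical.

```python
def dice_distance(p, q):
    """Dice distance between two 0/1 band profiles of equal length."""
    if len(p) != len(q):
        raise ValueError("profiles differ in length")
    shared = sum(1 for x, y in zip(p, q) if x == 1 and y == 1)
    total = sum(p) + sum(q)
    if total == 0:
        # Assumption: two profiles without any bands are treated as identical.
        return 0.0
    return 1.0 - 2.0 * shared / total


def dice_matrix(profiles):
    """Symmetric matrix of pairwise Dice distances."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dice_distance(profiles[i], profiles[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Hypothetical profiles: 2 shared bands, 3 bands in each profile.
a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1]
print(dice_distance(a, b))  # 1 - 4/6, about 0.333
```

Because shared absences do not enter the formula, this measure only counts co-occurring bands as evidence of similarity, which suits dominant markers where the absence of a band can have several causes.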

To visualise the pattern of genetic clustering of individuals and populations, we plotted principal coordinate analyses (PCO) using the R-package scatterplot3d [53], based on an individual Dice distance matrix and on AMOVA-derived pair-wise FST distances calculated with Arlequin v3.5.1.2 [54], respectively. To investigate further significant groupings of the included individuals we used the program Structure v2.3.3 [55, 56] on the Bioportal computing cluster of the University of Oslo [57]. We ran Structure for K = 1–23 with 10 replicates each and a model based on admixture and independent allelic frequencies, without taking into account information regarding sampling localities. Each run had 3 million iterations with 10% additional burn in. The calculation of deltaK (∆K; [27]) and preparation of the input file for Clumpp were done with Harvester [58]. Production of a combined file from the ten replicates of the best K was performed using Clumpp v1.1.2 [59] with the full search algorithm. The graphical representation of Structure results was prepared with Distruct v1.1 [60].
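
As an illustration of the ∆K statistic [27], the sketch below computes it from replicate Ln P(D) values per K. It assumes the commonly used form, the absolute second-order difference of the mean log-likelihoods divided by the standard deviation across replicates at K; it is not the Harvester code, and the function name and numbers are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Delta K per K from replicate log-likelihoods.

    lnp: dict mapping K to a list of Ln P(D) values (one per replicate).
    Delta K is only defined where K-1 and K+1 were also run, so the
    smallest and largest K tested never receive a value.
    """
    result = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue
        sd = stdev(lnp[k])
        if sd == 0:
            continue  # undefined when replicates are identical
        second_diff = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        result[k] = second_diff / sd
    return result


# Hypothetical log-likelihoods, three replicates per K.
toy_lnp = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-890, -895, -885],
    4: [-885, -880, -890],
}
dk = delta_k(toy_lnp)
print(dk)
print(max(dk, key=dk.get))  # K with the highest Delta K
```

The sketch makes one property of the method explicit: no ∆K value exists for K = 1, so K = 1 can never be selected by this criterion. This is one reason why agreement with an independent analysis such as the PCO is a useful check on K = 2.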

Both non-hierarchical and hierarchical analyses of molecular variance (AMOVA) and calculations of population statistics were conducted using Arlequin v3.5.1.2 [54]. For hierarchical AMOVAs groups have been defined based on different possible clusterings (Additional file 4) according to Structure results, taxonomy, distribution patterns and ecological traits.
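
To make the non-hierarchical AMOVA more concrete, the following Python sketch partitions variation into among- and within-population components and returns the resulting FST analogue (often written ΦST). It follows the standard sums-of-squared-deviations formulation and assumes that the squared distance between two individuals is the number of differing bands. It is a simplified illustration, not the Arlequin implementation; it omits permutation-based significance testing, and all names and data are hypothetical.

```python
from itertools import combinations


def _ssd(members):
    """Sum of squared deviations: pairwise band differences / group size."""
    diffs = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(members, 2)
    )
    return diffs / len(members)


def amova_phi_st(profiles, populations):
    """Non-hierarchical AMOVA on 0/1 profiles; returns Phi_ST."""
    groups = {}
    for profile, pop in zip(profiles, populations):
        groups.setdefault(pop, []).append(profile)
    n_total, n_groups = len(profiles), len(groups)
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need >= 2 populations and replication within them")

    ssd_within = sum(_ssd(members) for members in groups.values())
    ssd_among = _ssd(profiles) - ssd_within
    ms_among = ssd_among / (n_groups - 1)
    ms_within = ssd_within / (n_total - n_groups)

    # Average population size corrected for unequal sampling.
    n0 = (n_total - sum(len(m) ** 2 for m in groups.values()) / n_total) / (
        n_groups - 1
    )
    var_among = (ms_among - ms_within) / n0
    var_total = var_among + ms_within
    if var_total == 0:
        raise ValueError("no variation in the data")
    return var_among / var_total


# Hypothetical data: two populations of two individuals each.
toy_profiles = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
toy_pops = ["A", "A", "B", "B"]
print(amova_phi_st(toy_profiles, toy_pops))  # 5/7, about 0.714
```

In this toy case 5/7 of the variance lies among populations. The FST of 0.38 reported in the Results would correspond to 38% of the variance among sampling localities under the same partitioning.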

Availability of supporting data

AFLP presence/absence matrix and phylogenetic analyses are deposited in treeBASE under study 14798 (http://purl.org/phylo/treebase/phylows/study/TB2:S14798).