Background

The Caribbean has been an important model system for studying biotic over-water dispersal from continents and island colonization [1]-[4], as well as vicariance [5],[6] as mechanisms for the origin of diversity, and within-island diversification as mediators of species richness and endemism [7],[8]. The geological evolution of the region has certainly had a strong influence on the diversification of species there, and a general understanding of the former is crucial to an understanding of the latter.

The larger islands of the Greater Antilles (i.e. Cuba, Hispaniola, Jamaica and Puerto Rico) were repeatedly submerged until the mid/late Eocene (~40 Ma) [3],[6]. A general terrene uplift is likely to have occurred during the mid-Eocene and the early Oligocene (~45-30 Ma), and some authors hypothesized the existence of a land corridor connecting northern South America to the Greater Antilles and subaerial Aves Ridge (GAARlandia, ~35-33 Ma) [6],[9], although this is still under debate [10]. Hispaniola and Puerto Rico were physically connected until the formation of the Mona Passage, becoming fully separated during the late Oligocene to early Miocene (~30-20 Ma) [11],[12]. Later, during the early to mid-Miocene, the aerial connection between eastern Cuba and northern Hispaniola was interrupted by the expansion of the Windward Passage (~17-14 Ma) [6],[13].

Northern and southern Hispaniola paleoislands collided in the mid Miocene (ca. 15–10 Ma) [6],[14],[15], triggering the initial uplift of south-western Hispaniolan mountains as well as the significant elevation of the Cordillera Central [16],[17]. Multiple marine incursions in the Cul-de-Sac/Enriquillo depression repeatedly separated northern and southern paleoislands until the Plio-Pleistocene (~2.5 Ma) [15],[18]. Cuba was fragmented into distinct land blocks comprising the current western, central and eastern parts of the island until the late Miocene, when the closure of the Havana-Matanzas Channel began some 8–6 Ma [6]. Jamaica, in contrast, was continuously submerged until ca. 12 Ma [19]. The western Jamaica land block was temporarily subaerial and connected to Central America during the early to mid-Eocene [6],[20], whereas eastern Jamaica (Blue Mountains Block) was apparently connected to GAARlandia through the southern peninsula of Hispaniola at ~35-33 Ma [4],[6]. Most Bahamian shallows and keys were repeatedly submerged during the Pliocene and Pleistocene (~4-0.5 Ma) [21].

The butterfly genus Calisto (Nymphalidae, Satyrinae, Satyrini) is the only satyrine group occurring in the Caribbean region [22],[23]. This genus exhibits a remarkable radiation and contributes significantly to the high butterfly endemism seen in the region [24],[25]. The genus Calisto comprises 44 described species, all geographically restricted to single islands [23],[26]-[29]: 11 in Cuba, 1 in Puerto Rico, 1 on Anegada Island, 1 in Jamaica, 2 in the Bahamas and the remaining 28 in Hispaniola. Molecular data have given insight into the cryptic condition of several taxa in Hispaniola [27], and have helped to determine the phylogenetic relationships of Cuban taxa [28].

Even though the monophyly of the genus appears to be clear [27], its position within the tribe Satyrini has not been resolved. Morphological studies classify Calisto within the subtribe Pronophilina [23], closely related to the Neotropical genus Eretris [30],[31]. However, this has not been corroborated at the molecular level [32]. Certain morphological similarities have even led some authors to propose African affinities with, for instance, the subtribe Ypthimina (Satyrini) [33] and the satyrine tribe Dirini [34],[35].

Regardless of the phylogenetic position of Calisto, a continental origin of the genus is the most plausible explanation, as no other extant satyrine butterflies with the potential of being a closely related group are found in the Greater Antilles; thus, its ancestors must have arrived in the Caribbean from the nearby American continent [31],[33],[36]. Once Calisto colonized the Greater Antilles, further differentiation by vicariance [31],[37], within-island diversification [28],[36] or adaptive radiation [27] might have shaped the evolution of these butterflies.

In this study, we aim to elucidate the phylogenetic affinities and to identify the main drivers of the diversification and distribution of Calisto by using a secondarily calibrated molecular phylogeny. We also aim to reconstruct the historical biogeography of Calisto and to evaluate possible changes in diversification rates throughout the evolution of the genus. Intra-island differentiation appears to be an important factor for the radiation of these butterflies, a phenomenon observed in other Caribbean animal lineages [2],[4],[9],[38]-[40]. However, even though rapid diversification driven by ecological evolution is a plausible explanation considering the diversity of Caribbean habitats, niche saturation and island size may have imposed diversification limits [38] which could have restricted the diversity and geographical distribution of Calisto.

Results

Systematics and divergence dates of Calisto

Our phylogenetic inferences using single gene datasets are congruent with the combined analyses, recovering the main clades within Calisto (Additional file 1). Moreover, the combined analyses were consistent regardless of the method used and the partitioning strategy (Figure 1). A summary of the dataset properties is presented in Table 1.

Figure 1

BI consensus phylogeny using the combined dataset partitioned by gene. Support values are represented by symbols to the left of each node: the upper symbol is the bootstrap (BS) support value from the ML analysis, the lower left symbol is the posterior probability (PP) from the Bayesian Inference (BI) analysis partitioned by gene, and the lower right symbol is the PP from the partition-by-bins analysis. Filled stars are strong support values of 0.95-1.00 and 90–100 for PP and BS respectively, stars are 0.85-0.94 and 75–89, filled circles are 0.75-0.84 and 65–74, and circles are 0.50-0.74 and 50–64. Dashes (−) are unresolved nodes in each analysis. Branch lengths represent expected substitutions/site estimated in the BI analysis.

Table 1 Partition strategies for phylogenetic analyses of the combined dataset

Calisto nubila split early in the evolution of the genus, becoming an old and separate entity. The lineage did not apparently diversify further within Puerto Rico, although C. anegadensis on Anegada Island might have been derived from it based on morphological similarities [26]. Three main monophyletic groups from Hispaniola are identified: the lyceius-, the confusa-hysius and the chrysaoros clades. Calisto zangis from Jamaica is likely to have had Hispaniolan ancestors as it belongs to the “lyceius clade”. The monophyletic group consisting of Cuban and Bahamian Calisto is closely related to Hispaniolan lineages such as C. arcas and the “chrysaoros clade”, although the relationship among them was not resolved with strong support (Figures 1 and 2). A revised checklist of the genus Calisto is presented in Table 2.

Figure 2

Dated phylogram and a consensus biogeographical history. The ultrametric tree is scaled in Ma. Symbols on each critical node/branch are depicted as the most likely scenarios: vicariance, dispersal or founder-event. Colours on each symbol represent the level of support. Horizontal bars on nodes represent 95% credibility intervals. The phylogeny in the bottom left is the Satyrini tree, with the Calisto clade showing in red. Extant distributions of Calisto, following the subdivision of the Greater Antilles, are represented by coloured squares. The main geological events through time are depicted on top of the figure following the time scale in Ma. Lineage Through Time (LTT) plot of extant Calisto diversity (log scale) vs. time (Ma) is shown above the phylogeny, whereas the LTT of the Cuban clade is below the tree and the LTT of the Hispaniolan lineages is in the bottom of the figure. LTT plots follow the time scale of the phylogeny in Ma. Confidence intervals for LTT are displayed as coloured ranges.

Table 2 Revised checklist of the genus Calisto (Lepidoptera: Nymphalidae: Satyrinae: Satyrini)

The genus Calisto was not recovered within any valid Satyrini subtribe. Instead, our BEAST reconstructions place it sister to all sampled subtribes except Euptychiina, with low support values (posterior probability around 0.60-0.65) (Additional file 1). The exclusion of the genus Euptychia (which apparently caused long branch attraction in a different dataset [32]) only increases the support for such a placement to moderate values (around 0.80-0.85). Using birth-death/Yule and normal/uniform as tree processes and calibration distributions respectively does not result in any significant difference in either tree topology or estimated ages (Additional file 1, Figure 2). The posterior distributions of node heights were approximately normal, and summarizing the trees by mean or median heights showed no significant difference. The crown age of Calisto is inferred at 31 Ma (±5 Ma) in all cases except when the Yule process and the normal calibration distribution are used together, in which case the estimate is 33 Ma (±7 Ma).

Historical biogeography reconstruction

There was no statistical difference in the global likelihood between the non-time-stratified analyses NS0 and NS1 (Table 3). Excluding unlikely area connections (NS1) resulted in a Puerto Rico-northern Hispaniola (PR-nH) distribution on the crown node of Calisto, whereas NS0 equally preferred PR along with both nH and sH (southern Hispaniola) (Table 4, Figure 3). Similarly, NS0 and NS1 were unable to discern between dispersal and vicariance for the origin of Cuban Calisto. The time-stratified analysis TS1 favoured vicariance over dispersal in all cases, and TS2 inferred a PR-sH origin of Calisto and vicariance for the origin of Cuban diversity. However, the TS2 analysis did not improve the global likelihood of the inference over TS1. Root optimizations significantly favoured a PR-sH distribution and vicariance as the cause of the Cuban clade split from its sister Hispaniolan lineages.

Table 3 Estimated parameters and global likelihoods on each of the biogeographical analyses
Table 4 Biogeographical reconstructions for the evolution of Calisto
Figure 3

Geological history of the Greater Antilles and Bahamas and the evolution of Calisto. a) The crown node of extant Calisto occurred in the late Oligocene, and the split of Puerto Rico and Hispaniola coincided with the divergence of both faunas. b) In the middle Miocene, Hispaniola and Cuba were physically separated, promoting the isolation of lineages on both islands. c) The creation of new niche space in Hispaniola and Cuba triggered the radiation of Calisto by the mid/late Miocene; Cuban land blocks were unified and Hispaniolan mountain ranges were rapidly uplifted during the late Miocene. d) Temporal isolation/connection of areas within each island during the glacial/interglacial cycles of the Pleistocene. e) Present-day Greater Antilles coded and coloured as in our biogeographical analyses. Maps were modified from [6]. Area connectivity and dispersal rates used in our biogeographical analyses are shown below each time period (a: 31–20 Ma, b: 20–10 Ma, c: 10–5 Ma, and d: 5 Ma to present). The upper-right part of each table (a-d) shows the area-adjacency values used in BioGeoBEARS, and the values in (e) were used in Lagrange C++. Dispersal probabilities, as used in the TS analyses, are displayed in the lower part of each table. LD is long-distance dispersal including one extra area. Values of 0.0001 were assigned to LD involving more than one water barrier and extra areas.

The estimation of the parameter j (founder-event speciation) significantly improved the DEC models. The global likelihood of TS1 was improved using BioGeoBEARS because, in contrast to Lagrange C++, we were able to constrain the area-connectivity through time slices. NS1-j preferred dispersal in critical nodes, i.e. the colonization of Jamaica, Cuba and the Bahamas, as well as a widespread origin of Calisto (PR-nH-sH) followed by vicariance. However, of all four models used in BioGeoBEARS, Akaike weights and the likelihood-ratio test (LRT) suggested that TS1-j had a higher probability of being the best model, followed by TS1. Dispersal to Jamaica and the Bahamas is fully recovered in both TS1 and TS1-j from BioGeoBEARS, whereas vicariance is favoured as an explanation of the origin of Cuban Calisto only in the TS1 analysis.
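As a minimal sketch of the likelihood-ratio test mentioned above, assuming that a DEC model is nested within its +j counterpart and differs from it by a single free parameter, the test statistic and its p-value can be computed as follows. The function name and the log-likelihood values are hypothetical and are not taken from Table 3.

```python
import math


def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models differing by one free parameter.

    Returns D = 2 * (lnL_alt - lnL_null) and its p-value under a chi-square
    distribution with 1 degree of freedom.
    """
    d = 2.0 * (lnl_alt - lnl_null)
    if d <= 0:
        # The richer model does not fit better, so there is no evidence for it.
        return d, 1.0
    # For 1 degree of freedom the chi-square survival function reduces to
    # erfc(sqrt(D / 2)), so no external statistics library is needed.
    return d, math.erfc(math.sqrt(d / 2.0))


# Hypothetical log-likelihoods, for illustration only.
d_stat, p_value = lrt_one_df(lnl_null=-80.0, lnl_alt=-75.0)
print(f"D = {d_stat:.1f}, p = {p_value:.4f}")
```

A small p-value would indicate that adding j improves the fit more than expected by chance alone.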

Diversification processes within Calisto

The ΔAIC_RC critical value for small phylogenies, as estimated in laser, is 4 [47]. The observed value for Calisto is significantly higher than this threshold (ΔAIC_RC = 13), favouring a rate-variable diversification model. However, there was no statistical difference between the rate-variable models Yule-3-rates (Y3r) and the logistic density-dependent model (DDL) (ΔAIC(Y3r – DDL) = 3.8). The diversification of the main Calisto tree excluding the Cuban lineage also fits a rate-variable process better (ΔAIC_RC = 11), but there was no strong preference among the DDL, Yule-2-rates (Y2r) and Yule-3-rates models (ΔAIC(DDL – Y3r) = 2.0; ΔAIC(DDL – Y2r) = 3.8) (Table 5).
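The rate-constancy test above can be sketched as follows, assuming that the statistic is the AIC of the best rate-constant model minus the AIC of the best rate-variable model, so that positive values favour rate variation. The AIC scores and model labels below are hypothetical placeholders, not the values reported in Table 5; only the critical value of 4 comes from the text.

```python
def delta_aic_rc(aic_rate_constant, aic_rate_variable):
    """Best rate-constant AIC minus best rate-variable AIC."""
    return min(aic_rate_constant.values()) - min(aic_rate_variable.values())


def favours_rate_variable(delta, critical_value=4.0):
    """True when the statistic exceeds the critical value for small trees."""
    return delta > critical_value


# Hypothetical AIC scores, for illustration only.
rate_constant = {"pure_birth": 30.0, "birth_death": 32.0}
rate_variable = {"DDL": 17.0, "Y2r": 20.8, "Y3r": 20.8}

delta = delta_aic_rc(rate_constant, rate_variable)
print(delta, favours_rate_variable(delta))
```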

Table 5 Diversification dynamics of Calisto as reconstructed by the R package laser

Of all rate-variable models in DDD, only those with a shift for the whole of Calisto under a diversity-dependent process are preferred, with Akaike weights higher than 0.1. One single shift in the K parameter (“clade-level carrying capacity”) at 14 Ma fits 2–3 times better than shifts in K along with speciation or extinction rates. The decoupling of parameters for the Cuban taxa alone from the main Calisto tree was not enough to explain the radiation of the genus. Cuban and Hispaniolan taxa analyzed separately did not have constant diversification rates; rates changed possibly due to increased speciation, diversity-dependent processes, or a combination of both (Akaike weights were unable to discern among models). Including the number of missing taxa in the models when possible did not affect the recovered estimates (Table 6).
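For reference, Akaike weights such as those used above can be derived from a set of AIC scores as in the following minimal sketch. It uses the standard formula, in which each weight is proportional to exp(−ΔAIC/2); the model labels and scores are hypothetical and do not reproduce Table 6.

```python
import math


def akaike_weights(aic_scores):
    """Convert AIC scores into Akaike weights that sum to 1."""
    best = min(aic_scores.values())
    # Relative likelihood of each model given its AIC difference to the best.
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aic_scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Hypothetical AIC scores, for illustration only.
weights = akaike_weights({"shift_in_K": 100.0, "shift_in_K_and_lambda": 102.0})
print(weights)
```

The ratio of two weights indicates how many times better one model fits than another, which is how a statement such as "fits 2–3 times better" can be read.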

Table 6 Diversification dynamics of Calisto as reconstructed by the R package DDD

Discussion

Colonization of the Greater Antilles by Calisto

The variability in our dataset (39% and 28% of all characters were variable and phylogenetically informative respectively; Table 1) is similar to previous inter-generic studies in Nymphalidae [32],[48],[49], but relatively higher than intra-generic studies [50],[51]. The genus Calisto is most likely a “relict” satyrine group that might have colonized the Greater Antilles during the uplift of GAARlandia (~35-33 Ma) [32]. Our dating estimates, indeed, confirm that it is an old and independent lineage, and its crown age (31 ± 5 Ma) provides evidence in support of the GAARlandia origin. Previous attempts to date the diversification of Calisto were based only on a pairwise substitution rate for mitochondrial evolution [27]. That study deduced younger ages (4–8 Ma) but did not carry out a divergence time analysis; rather, the authors only calculated pairwise genetic distances under the Kimura 2-parameter model without adequate model testing.

This is not the first time that GAARlandia has been invoked to explain butterfly geographic range expansion; it has also been proposed for the nymphaline subtribe Phyciodina [52], the satyrine subtribe Pronophilina [32], and certain lineages within the papilionid tribe Troidini [53]. The idea of indirect over-water dispersal by “hitchhiking” on hurricanes or flotsam rafts seems unlikely. Adult butterflies respond to incoming bad weather by taking refuge [36], whereas a high mortality of eggs, larvae and pupae is observed when they are exposed to marine water [54]. Calisto, when compared to most other butterflies, are rather sedentary, and hence the limited direct and indirect dispersal capabilities of Calisto make a dispersalist model less likely.

According to Iturralde-Vinent’s vicariance model [6], after GAARlandia, Hispaniola and Puerto Rico split around 20–30 Ma, whereas in our study, extant Calisto species in both islands have their most recent common ancestor at 27 ± 5 Ma. Furthermore, the Cuban clade branched off from a Hispaniola lineage at 21 ± 4 Ma, but did not apparently diversify into any extant Calisto until 14 ± 3 Ma, while the last aerial connection between blocks of Hispaniola and Cuba existed until 14–17 Ma. Therefore, the evolution of Calisto is better explained by the main predictions of the Caribbean paleogeographical model of colonization rather than the stochastic dispersalist scenario.

The inclusion of Jamaica in the vicariance model is less supported by the paleogeographical reconstructions, although a remote connection between the Blue Mountains block and GAARlandia has not been ruled out [4],[6]. The sole extant Jamaican Calisto split from its Hispaniolan sister taxa at 14 ± 4 Ma. At that time, large portions of Jamaica began to uplift and the entire island remained above water afterwards, and hence the colonization of Jamaica by rare long-distance dispersal events is the most likely explanation for the origin of the sole endemic species found there, Calisto zangis.

Historical biogeography of Calisto

Biogeographical reconstructions were significantly improved when we constrained dispersal probability and area-connectivity following the paleogeographic history of the Caribbean. Moreover, we found, for the first time, statistical support for long-distance dispersal in the colonization of the Bahamas and Jamaica by estimating a founder-event parameter using a more general DEC model. Vicariance was recovered as the main explanation for the first diversification event of Calisto, although we did not find a fully supported dispersal/vicariance origin for the Cuban clade. Whereas NS1 and TS1 in BioGeoBEARS and TS2 did significantly recover vicariance, other analyses favoured neither dispersal nor vicariance. This could be due primarily to, first, the assumptions made by the models and, second, the different approaches to node reconstruction. In the first case, vicariance is favoured when incorporating area connectivity through time (TS), but dispersal is recovered by adding the parameter j (founder-event or long-distance dispersal speciation). In the second case, Lagrange infers ancestral states by local optimization whereas BioGeoBEARS reports ancestral states under the most likely model. This difference is evidenced in NS1 (an analysis replicated using both software programs), where vicariance is only reconstructed under the global most-likely inference by BioGeoBEARS.

We believe, given the paleogeographic scenario and our dating estimates, which correlate with it, that the most plausible explanation for the colonization of Cuba is vicariance. Furthermore, the whole of extant Cuban diversity is monophyletic and sister to a Hispaniolan lineage, as predicted by the vicariant model. Dispersal into Cuba 10–25 Ma, on the other hand, would not have been “long-distance” because both land blocks were very close to each other, if not physically connected. Thus, if dispersal were actually the main process, we would expect several independent Cuban lineages of varied ages surviving to the present (see extinction rates in “Diversification of Calisto” section) (Figure 2).

Vicariance driving speciation within islands is significantly recovered for Hispaniolan fauna during two instances, at 10–13 Ma and 4–6 Ma. The first vicariant instance is independently evidenced in two lineages with simultaneous shifts in ancestral ranges, the lyceius and the confusa-hysius clades. The dating estimates are congruent with the major uplift of the Cordillera Central, which might have provided new ecological opportunities and created isolated populations [39],[40]. The presence of local adaptations is evidenced not only by the disjunct distributions of several sister-species pairs found on the northern/southern Hispaniola paleoislands respectively (e.g. C. tasajera and C. schwartzi are allopatrically adapted to mesophilic and forested montane habitats in Cordillera Central (nH) and Sierra de Bahoruco (sH) [46],[55],[56]), but also by ecological niche restrictions. For instance, sister-species pairs within both major clades feed, as larvae, exclusively on distinct bunch grasses, and have morphologically adapted to specific altitudinal ranges. Species inhabiting lower altitude and warmer areas are smaller than their sister montane species [27],[46], suggesting an adaptation for thermoregulatory efficiency [57].

The second instance of a vicariant process within Hispaniola occurred during the Pliocene, as evidenced in the lyceius and chrysaoros clades. Although an uplift of the Cordillera Central might have played a role in separating populations, the most likely explanation for the northern/southern paleoisland distributions might be related to the inundation of the Cul-de-Sac/Enriquillo depression, which acted as an effective barrier. Ecological niche shifts might be another plausible explanation for the lyceius clade members having differentiated during the Pliocene. As larvae, they feed on the bunchgrass Uniola virgata, which provides a unique niche and would have required significant adaptations [56].

The crown node ancestral distribution of Cuban and Bahamian Calisto is recovered as “eastern Cuba (eC)”. Its sister taxa are Hispaniolan lineages that occur in the northwestern Cordillera Central (Massif du Nord in Haiti) [22],[46], which is the closest region to eastern Cuba. Dispersal to central and western Cuba from “eC” appears to be the likeliest biogeographic scenario [28], although vicariance as the main process is only detected in NS1 from BioGeoBEARS. Dispersal dates are in line with the closure of the Havana-Matanzas Channel at 8–5 Ma, as well as with the accretion of Bahamian shallows and keys in the Pliocene/Pleistocene [6]. The two Bahamian lineages have distinct ancestral areas: C. sibylla has an older origin and its source area is “eC”, whereas C. apollinis dispersed more recently from “western Cuba (wC)” (Figure 3).

Calisto diversification on the Greater Antilles and the Bahamas

The species richness of Calisto across islands is largely unequal. Such a pattern has been previously reported as the consequence of island size and age, ecological limits and habitat diversity [8],[38],[58]. Munroe [59],[60] pointed out that extant Calisto diversity is distributed unequally among islands more likely due to speciation rather than to differential immigration, and that extinction was extremely low, especially in Hispaniola. The calculation of diversification rates and ancestral states in this study suggested that the extant geographical distribution of Calisto reflects the rapid diversification within Hispaniola and Cuba during two instances, at 25 and 14 Ma, while inter-island flow was negligible for the entire genus.

Calisto is the most species-rich butterfly genus in the West Indies because it was able to expand its ecological niche (e.g. feeding on distinct bunchgrasses, tolerance to montane temperate and tropical conditions), which raised the “ecological limits” on Calisto diversification. One sole change in the K parameter (“carrying-capacity for species diversity”) is enough to explain the evolution of the whole genus. The recovered date of this shift at 14 Ma is congruent with an increase in ecological opportunity in Hispaniola and Cuba and a time at which new environments were being created as a result of geological processes (e.g. uplift of Cordilleras, unification of Cuban land blocks) [39],[40]. The decoupling of clade “carrying capacity” and/or diversification rates of the Cuban lineage as the only explanation for the species richness of the genus is not supported. Nonetheless, the arrival of Calisto on an unoccupied island of Cuba certainly provided new, previously empty niches to be colonized. The most likely date for such a decoupling was 14 Ma, as recovered in the DR1 analysis. However, because such a date is confounded with the availability of new niches in Hispaniola, a model including one single shift in K for the whole genus was preferred.

Adaptive radiation and the origin of island endemism of West Indies insects remain statistically untested. Under a phylogenetic framework, indirect evidence of adaptive radiation could be inferred based on diversification rate shifts: i.e. a rapid increase followed by a gradual reduction of diversification rate under a diversity-dependent process [61]. Calisto butterflies might have undergone two increases in diversification before they rapidly reached a “carrying-capacity” limit. The first one occurred during the uplift of Cordillera Central at 25 Ma (SR1 analysis), triggering a growth in Calisto diversification rate until all available niches were gradually occupied, at which time, probably, the speciation rate declined linearly with diversity (K = 12). It is unlikely that the extinction rate rose, as it was near zero in all of our estimations. The second major radiation took place at 14 Ma (discussed above), but it is more plausible that the diversification rate increased due to a shift in K rather than by a sole increase in speciation rate. Furthermore, the “Inf.” values of K recovered for the Cuban clade in DR analyses might be an indication that the diversification rate has not yet reached its “ecological limit”. The Cuban clade, when analyzed independently, better fits a Yule-2-rates model, with a speciation rate at 10 Ma that is 5 times higher than when the lineage branched off at 21 Ma.
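To illustrate what a linear decline of speciation rate with diversity implies, the following minimal sketch uses one common pure-birth formulation, λ(N) = λ0 (1 − N/K), which is assumed here for illustration and may differ in detail from the parameterizations implemented in laser and DDD. The initial rate λ0 is a hypothetical value; only K = 12 is taken from the text.

```python
def ddl_speciation_rate(n_lineages, lambda0, k):
    """Speciation rate that declines linearly with diversity and reaches
    zero when the number of lineages equals the carrying capacity k."""
    return max(0.0, lambda0 * (1.0 - n_lineages / k))


# lambda0 = 0.5 is a hypothetical initial rate; K = 12 follows the text.
for n in (0, 6, 12):
    print(n, ddl_speciation_rate(n, lambda0=0.5, k=12))
```

Under this formulation, with an extinction rate near zero, net diversification slows progressively as the clade approaches K.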

An intriguing question is why the observed diversification dynamics of Calisto on Cuba and Hispaniola were not replicated on Jamaica and Puerto Rico, the third and fourth largest islands of the West Indies, respectively. Whereas Calisto are usually locally adapted to particular habitats within Cuba and Hispaniola, the single species on each of the other two islands are widespread. While some diverse Hispaniolan lineages feed as larvae on bunch grasses, the Puerto Rican C. nubila is adapted to feeding on widespread, wide-bladed grasses [56]. According to Turner, similar, relatively adaptable oviposition behaviour is exhibited by C. zangis of Jamaica [62]. Perhaps this indiscriminate behaviour explains why these two species were able to colonize their entire respective islands instead of forming separate disjunct populations as did their Hispaniolan congeners. Such wide distribution and the relatively good dispersal abilities of these relatively larger Calisto species (Sourakov, pers. obs.) may have increased gene flow and hence prevented divergence. Further research on the natural history, dietary preferences and behaviour of Calisto is necessary to corroborate our speculations.

Conclusions

The phylogenetic and biogeographical evidence presented in this study agrees with the Caribbean paleogeographical model of colonization (Figures 2 and 3). Vicariant models explaining the diversification of Calisto have already been proposed based on their extant geographical distribution [31],[33],[36],[60], although some authors have favoured the alternative dispersalist explanation [27],[63]. Here we observed that the evolution of Calisto involved both vicariant processes and long-distance dispersals. However, the most important source of diversity in this largest genus of West Indies butterflies was rapid intra-island radiation through key innovations (e.g. unusual larval hostplant, adaptation to montane, temperate and tropical conditions) and the availability of ecological niches triggered by environmental changes (e.g. accretion of mountain ranges, different island configuration and area-connectivity through time). Nonetheless, more rigorous tests and associations between ecological niche spectrum, phenotypic variability and selection within these butterflies are needed to give adequate weight to abiotic factors (geographic and climatic) and niche specializations in the observed burst followed by a slowdown in diversification rates.

Methods

Taxon sampling

We included 36 out of the 44 described Calisto species (Additional file 2). Species sampling took place across the entire geographical distribution of the genus in the Greater Antilles, except for Anegada Island, where only one species occurs. Our analyses also included DNA sequences previously reported from taxa across the tribe Satyrini and Calisto [27]-[29],[32] (Additional file 2). Species identifications were based on morphology and the DNA barcode region was used for further corroboration [64]. Voucher photographs are available at the Nymphalidae Systematics Group (NSG) Voucher Database (nymphalidae.utu.fi) and in BOLD (boldsystems.org).

Dataset acquisition

Genomic DNA was isolated from two butterfly legs using QIAGEN’s DNeasy kit. We used sequences of six standard molecular markers for nymphalid butterflies, one mitochondrial – COI (1487 bp) – and five nuclear genes – CAD (850 bp), EF-1α (1240 bp), GAPDH (691 bp), RpS5 (617 bp) and wingless (400 bp). Primer pair sequences and laboratory protocols are described in [65]. DNA Sanger sequencing was carried out by the company Macrogen and each gene sequence was edited and manually aligned using the program BioEdit v7.0.5 [66]. Datasets were generated in different input formats using the web application VoSeq v1.7.0 [67].

Phylogenetic analyses

We used single-gene and combined datasets. We partitioned our single-gene datasets by codon position and our combined dataset by gene sequences in all analyses. In addition we used character groupings of similar relative evolutionary rates as an alternative strategy for Bayesian Inference (BI) [68], after determining that the gene trees were not in conflict with each other. We used the software TIGER [69] to subdivide our combined dataset into 12 “bins” each containing a number of characters with similar relative rates: bin1 = 2739, bin2 to bin5 = 0, bin6 = 12, bin7 = 7, bin8 = 21, bin9 = 87, bin10 = 525, bin11 = 1269 and bin12 = 637. We combined the “bins” that contained fewer than 500 sites (bin2 to bin9) with the invariable bin1, resulting in four character groupings which were used for our alternative partitioning approach (Table 1).
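The merging of bins described above can be reproduced with the following minimal sketch, which takes the bin sizes reported in the text and pools every bin with fewer than 500 sites into the invariable bin1. The function name and the threshold argument are our own illustrative choices and are not part of TIGER.

```python
# Number of characters per TIGER bin, as reported in the text.
bin_sizes = {1: 2739, 2: 0, 3: 0, 4: 0, 5: 0, 6: 12,
             7: 7, 8: 21, 9: 87, 10: 525, 11: 1269, 12: 637}


def merge_small_bins(sizes, min_sites=500, sink=1):
    """Pool every bin with fewer than min_sites characters into the sink bin."""
    merged = {sink: sizes[sink]}
    for bin_id, n_sites in sizes.items():
        if bin_id == sink:
            continue
        if n_sites < min_sites:
            merged[sink] += n_sites
        else:
            merged[bin_id] = n_sites
    return merged


groupings = merge_small_bins(bin_sizes)
print(groupings)  # {1: 2866, 10: 525, 11: 1269, 12: 637}
```

The result contains four character groupings, matching the number used for the alternative partitioning approach.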

We performed Maximum Likelihood (ML) analyses with 1000 bootstrap pseudo-replicates using RAxML v7.3.1 [70] on the Bioportal server [71], selecting the thorough bootstrap algorithm and the mix option for the evolutionary model. The BI analyses were carried out using MrBayes v3.2.1 [72] on the Bioportal server. We performed 10 million generations with sampling every 1000 generations and four chains, one cold and three heated, for two independent runs. The parameters and models of evolution were unlinked across character partitions. We selected the mixed evolutionary model option in all BI analyses, whereas in the alternative partitioning strategy we selected the corresponding model for each “bin” as calculated in jModelTest 0.1.1 [41] based on the Bayesian Information Criterion (BIC). The convergence of the two runs in each BI analysis was ascertained by visual inspection of the stationary distribution of the log-likelihoods, discarding the first 25% of sampled trees, as well as by checking that the final average standard deviation of split frequencies was below 0.05 and that the potential scale reduction factor (PSRF) for each parameter was close to 1.
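As an illustration of the PSRF criterion, the following minimal sketch implements the standard Gelman-Rubin statistic for one parameter sampled in several independent runs of equal length. It is an assumption that this matches the general form of the diagnostic; the exact computation in MrBayes may differ in detail. The sample values are hypothetical.

```python
import math
from statistics import mean, variance


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for one parameter.

    chains: list of equally long lists of post-burn-in samples, one per run.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    w = mean([variance(c) for c in chains])   # mean within-run variance
    b = n * variance(chain_means)             # between-run variance
    var_hat = (n - 1) / n * w + b / n         # pooled variance estimate
    return math.sqrt(var_hat / w)


# Hypothetical samples: two runs that agree, and two runs that do not.
agreeing = psrf([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
disagreeing = psrf([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
print(agreeing, disagreeing)
```

Values far above 1 indicate that the runs are sampling different regions of parameter space, whereas values close to 1 are consistent with convergence.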

Time of diversification estimates

Because there is no fossil record reported for the genus, we reconstructed a broader phylogeny including most of the representatives of the Satyrini subtribes that are closely related to Calisto [32] (Additional file 2) and constrained it with secondary calibration points from a fossil-calibrated Nymphalidae phylogeny [73]. We selected only one terminal per Calisto species to maximize the gene coverage in the resulting dataset. We also ran an analysis excluding the genus Euptychia because long branch attraction affecting the position of Calisto has been reported [32]. The calibration points were chosen from well-supported monophyletic groups: the root of the tree at 49.1 ± 5 Ma, the crown age of the subtribe Satyrina at 24.7 ± 4 Ma and the crown age of Euptychiina excluding Euptychia, Paramacera and Cyllopsis at 35.1 ± 4 Ma.

The dating analyses were run in BEAST v1.7.4 [74] and executed on the Bioportal server. We partitioned our dataset by gene and, for each partition, set the corresponding substitution model as calculated in jModelTest (Table 1) and the uncorrelated log-normal relaxed clock model. We applied either the Birth-Death or the Calibrated Yule speciation process as the tree prior in separate analyses to investigate the impact of this choice on the final age estimates. In addition, the calibration points were modelled as either normal distributions (soft bounds) or uniform ranges (hard bounds). Finally, we set the prior on the mean rate of the molecular clock (ucld prior) to a uniform distribution between 0.0 and 10.0 and left the other priors at their defaults.

Each analysis was run independently four times for 50 million generations, sampling trees and parameters every 5000th generation. We discarded the first 2500 sampled trees from each run as burn-in. In Tracer v1.5 we verified convergence and good mixing of the MCMC and checked that the Effective Sample Size of each estimated parameter was higher than 200. Output .log and .tre files were combined in LogCombiner v1.7.4 after resampling a third of the post-burn-in trees from each run. Trees were summarized in TreeAnnotator v1.7.4 into a single maximum clade credibility tree with node information calculated as mean heights.
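The sample bookkeeping implied by these settings can be checked with a few lines of arithmetic. The sketch below is illustrative only: it ignores the initial state that BEAST logs at generation 0, and it assumes that “resampling a third” means keeping one of every three post-burn-in trees. The function name and its parameters are hypothetical.

```python
def combined_tree_counts(generations, sample_every, burnin_samples,
                         n_runs, thin_factor):
    """Count the samples that survive burn-in removal and thinning."""
    per_run = generations // sample_every        # samples logged per run
    post_burnin = per_run - burnin_samples       # samples left after burn-in
    kept_per_run = post_burnin // thin_factor    # keep every thin_factor-th tree
    return {
        "per_run": per_run,
        "burnin_fraction": burnin_samples / per_run,
        "post_burnin": post_burnin,
        "kept_per_run": kept_per_run,
        "combined": kept_per_run * n_runs,
    }


counts = combined_tree_counts(
    generations=50_000_000, sample_every=5000,
    burnin_samples=2500, n_runs=4, thin_factor=3,
)
```

Under these assumptions each run logs 10,000 samples, the burn-in corresponds to 25% of each run, and the combined file holds 10,000 trees.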

Historical biogeography reconstruction

We used our dated chronogram for Calisto as the input tree, excluding outgroups and C. pulchella, whose distribution has been altered by the introduction of sugar cane, on which it is currently a pest [33]. The following subdivision of areas was set: “PR” – Puerto Rico; “nH” – the northern Hispaniola paleoisland, including Cordillera Central/Massif du Nord, Sierra de Neiba/Chaîne des Matheux, and eastern Hispaniola; “sH” – the southern Hispaniola paleoisland, including Sierra de Bahoruco/Chaîne de la Selle and Massif de la Hotte in Tiburón Peninsula; “Ja” – Jamaica; “eC” – eastern Cuba, including the Nipe-Sagua-Baracoa and Sierra Maestra mountain ranges; “wC” – central and western Cuba, including the Guamuhaya and Guaniguanico mountain ranges; “Ba” – the Bahamas. Distributional ranges of Calisto were taken from several sources [22],[27]-[29],[33],[46].

We used the Dispersal-Extinction-Cladogenesis (DEC) model as implemented in Lagrange C++ [75],[76]. DEC is a realistic and flexible model for biogeographical reconstructions that estimates the likelihoods of ancestral geographical distributions and allows dispersal to be parameterized through time according to the geological history of a region. We conducted analyses using a non-time-stratified approach (NS) and a time-stratified approach (TS) with different dispersal rates across time slices. The non-time-stratified analysis NS0 was conducted under default settings. In the NS1 analysis, the maximum distributional range was constrained to three areas and we excluded distributions with unlikely area-connectivity (e.g. Puerto Rico and western Cuba). TS1 used four time slices, subdividing the phylogeny at 5 Ma, 10 Ma and 20 Ma. Dispersal rate matrices were constructed according to the paleogeographical configuration in each time slice. Dispersal probabilities were set to 0.75 when two areas were adjacent, to 0.5 when two areas were weakly separated by a geographical barrier (e.g. the Cul-de-Sac/Enriquillo depression), to 0.1 when two areas were separated by less than 200 km of water (e.g. northern Hispaniola and eastern Cuba), to 0.01 for long-distance dispersal involving one extra area and/or a water crossing of more than 200 km (e.g. Puerto Rico to southern Hispaniola), and to 0.0001 for other kinds of long-distance dispersal.
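The range constraints and dispersal categories described above can be sketched as follows. This is an illustration under stated assumptions, not the actual Lagrange C++ input: only the pair examples given in the text are filled in, they are placed in a single hypothetical time slice, the diagonal value of 1.0 is an assumption, and all function and variable names are hypothetical.

```python
from bisect import bisect_right
from itertools import combinations

AREAS = ["PR", "nH", "sH", "Ja", "eC", "wC", "Ba"]

# Dispersal probabilities per connectivity category, as listed in the text.
RATES = {
    "adjacent": 0.75,
    "weak_barrier": 0.5,
    "short_water": 0.1,     # less than 200 km of water
    "long_distance": 0.01,  # one extra area and/or >200 km of water
    "other": 0.0001,
}


def allowed_ranges(areas, max_size, excluded_pairs):
    """Enumerate ranges of up to max_size areas, skipping excluded pairs."""
    ranges = []
    for size in range(1, max_size + 1):
        for combo in combinations(areas, size):
            if any(set(pair) <= set(combo) for pair in excluded_pairs):
                continue
            ranges.append(combo)
    return ranges


def dispersal_matrix(areas, categories):
    """Symmetric matrix for one time slice; unlisted pairs default to 'other'."""
    matrix = {a: {b: (1.0 if a == b else RATES["other"]) for b in areas}
              for a in areas}
    for (a, b), category in categories.items():
        matrix[a][b] = matrix[b][a] = RATES[category]
    return matrix


def time_slice(age_ma, boundaries=(5, 10, 20)):
    """Index of the time slice (0 = youngest) that contains a node age."""
    return bisect_right(boundaries, age_ma)


# Hypothetical slice built only from the pair examples named in the text.
example_categories = {
    ("nH", "sH"): "weak_barrier",
    ("nH", "eC"): "short_water",
    ("PR", "sH"): "long_distance",
}
matrix = dispersal_matrix(AREAS, example_categories)
ranges = allowed_ranges(AREAS, 3, excluded_pairs=[("PR", "wC")])
```

With seven areas and at most three areas per range there are 63 candidate ranges; excluding every range that contains both Puerto Rico and western Cuba leaves 57. A full analysis would need one such matrix per time slice and the complete list of excluded ranges.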

We found the reconstruction at one node in the TS1 analysis to be unlikely (the Cuban-Bahamian subclade including C. sibylla and C. apollinis). This group had a crown age of 10 ± 2 Ma and an inferred ancestral range of eC-Ba in TS1. Paleogeographically, this is improbable because the Bahamas were submerged at least until the Pliocene (~5 Ma). We thus constrained this node to “eC” in TS2 because Lagrange, as currently implemented, does not allow the exclusion of unlikely area-connectivity by time slice.

Moreover, several sets of area distributions were independently constrained at the root of the Calisto tree to maximize the global likelihood of NS1 and TS2 and to compare the statistical support of likely ancestral ranges. We also used the R package BioGeoBEARS [77],[78], which implements the DEC model similarly to Lagrange C++ but with the possibility of increasing the number of free parameters. We allowed the founder-event speciation parameter j to be estimated in NS1-j and TS1-j to evaluate the importance of long-distance dispersal across islands. Another advantage of BioGeoBEARS is that area-connectivity is allowed to differ through time; hence we created an area-connectivity matrix for each time slice in TS1 and TS1-j (Figure 3).

Diversification of Calisto

We used the packages laser [79], ape [80] and DDD [61],[81] in R [82] to investigate the mode of diversification of extant Calisto taxa. Lineage Through Time (LTT) plots with confidence intervals representing a pure-birth null model were made using ape. We compared different models of cladogenesis allowing temporal shifts in diversification rates using the Akaike Information Criterion differential ΔAICRC = AICRC - AICRV, where AICRC is the AIC of the best rate-constant model and AICRV that of the best rate-variable model, as implemented in laser. We also computed ΔAICRC separately for the Calisto phylogeny excluding the Cuban species.
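The ΔAICRC statistic can be illustrated with a short sketch that uses the standard definition AIC = 2k - 2 lnL, where k is the number of free parameters and lnL the maximized log-likelihood. The model names and numbers in the usage lines are hypothetical placeholders, not results of this study, and the code does not reproduce the laser implementation.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 lnL."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic_rc(rate_constant_fits, rate_variable_fits):
    """AIC of the best rate-constant model minus that of the best rate-variable model.

    Each argument maps a model name to a (log_likelihood, n_params) pair.
    A positive value means the best rate-variable model has the lower AIC.
    """
    best_rc = min(aic(lnl, k) for lnl, k in rate_constant_fits.values())
    best_rv = min(aic(lnl, k) for lnl, k in rate_variable_fits.values())
    return best_rc - best_rv


# Hypothetical log-likelihoods and parameter counts, for illustration only.
rc_fits = {"pure_birth": (-20.0, 1), "birth_death": (-19.5, 2)}
rv_fits = {"yule_2_rate": (-15.0, 3), "diversity_dependent": (-17.0, 2)}
delta = delta_aic_rc(rc_fits, rv_fits)
```

Because lower AIC values indicate a better trade-off between fit and complexity, a positive ΔAICRC favours rate-variable diversification over the rate-constant alternative.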

We used the R package DDD to identify the diversification model that best explains the evolutionary history of Calisto. The analyses included three main classes of models: constant-rate diversification (CR), a shift in net diversification rate at some point in time (SR) and a decoupling of rates between the Cuban clade and the remaining taxa (DR). CR models incorporated either a constant birth-death process (CR0), a decrease in speciation rate following a diversity-dependent process without extinction (CR1) or a decrease in speciation rate following a diversity-dependent process with the extinction rate estimated (CR2). SR models comprised one shift in speciation rate (yule-2-rate model) (SR0), one shift in species carrying capacity K (SR1), one shift in K and extinction rate (SR2), or one shift in K and speciation rate (SR3). DR models described a single shift in K for the Cuban clade (DR1), one shift in K and extinction rate for the Cuban clade (DR2), one shift in K and speciation rate for the Cuban clade (DR3), and one shift in K, speciation and extinction rates for the Cuban clade (DR4). Moreover, we conducted CR and SR analyses separately for the main Calisto tree excluding the Cuban clade and for the Cuban clade alone. Diversification models were compared using Akaike weights.
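Akaike weights rescale the AIC values of a candidate set into relative support values that sum to 1: w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is the AIC of model i minus the lowest AIC in the set. The sketch below is a minimal illustration; the function name is hypothetical and the AIC values in the usage line are placeholders, not values from this study.

```python
from math import exp


def akaike_weights(aic_by_model):
    """Return the Akaike weight of each model from a dict of AIC values."""
    best = min(aic_by_model.values())
    # Relative likelihood of each model given its AIC difference to the best.
    relative = {name: exp(-0.5 * (value - best))
                for name, value in aic_by_model.items()}
    total = sum(relative.values())
    return {name: rel / total for name, rel in relative.items()}


# Placeholder AIC values for three of the model labels used in the text.
weights = akaike_weights({"CR0": 104.0, "SR1": 100.0, "DR1": 102.0})
```

The model with the lowest AIC receives the largest weight, and a weight can be read as the relative support for that model within the candidate set.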

Availability of supporting data

The data sets supporting the results of this article are available in the TreeBASE repository at http://purl.org/phylo/treebase/phylows/study/TB2:S16186?format=html [83].

Additional files