Introduction

Diversifying coevolution between plants and insects is thought to be a major factor leading to species diversity in tropical rainforests (Ehrlich and Raven 1964), but before the development of modern molecular phylogenetic methods it was difficult to reconstruct plant–insect coevolutionary history owing to a dearth of fossil records of plant–insect interactions (Futuyma 2000). Now, by using molecular clocks and advanced molecular phylogenetic techniques, we can easily estimate when adaptive radiation and diversification of a taxonomic group occurred (Rambaut and Bromham 1998). For example, Becerra (2003) calibrated the molecular phylogenetic timelines of Neotropical Bursera hosts and the Blepharida leaf beetles that feed on them, and showed that the lineages of these beetles and their plant hosts diverged synchronously, suggesting that they reciprocally diversified (codiversified) during the past 112 million years (Myr).

Myrmecophytic Macaranga trees (Euphorbiaceae) have hollow stems, which Crematogaster (subgenus Decacrema) ants use as their nest sites while tending the Coccus scale insects that feed on the tree sap. The trees also provide food resources for the ants, both food-bodies secreted by stipules and young leaves, and honeydew excreted by the scale insects (Fiala et al. 1989; Heckroth et al. 1998). The ants, in turn, protect the trees that they inhabit from vines and herbivores (Itioka et al. 2000; Itino and Itioka 2001). Phylogenetic comparison of Macaranga, Decacrema, and Coccus has shown that (1) the minimum age of the plant–ant mutualism is 20–16 Myr, and the two taxa have codiversified since then (Davies et al. 2001; Itino et al. 2001; Quek et al. 2004, 2007), and (2) the minimum age of the scale insects is 9–7 Myr; thus, the scales are relative latecomers in the evolutionary history of the tripartite symbiosis (Ueda et al. 2008, 2010).

In addition, Arhopala lycaenid butterflies (amphimuta subgroup) parasitize the protective plant–ant mutualism. Arhopala caterpillars eat Macaranga leaves, evading the ants’ attacks by providing nectar to the aggressive ants (Maschwitz et al. 1984). Okubo et al. (2009) surveyed the Macaranga–Arhopala interaction and showed that each lycaenid feeds on one or two closely related Macaranga species in a species-specific manner (Table 1). Of approximately 200 species of Arhopala, only 5 feed on Macaranga, and all 5 belong to the amphimuta subgroup (Eliot 1963, 1972). Most basal Arhopala species prefer to feed on Fagaceae, whereas the most recently diverged amphimuta subgroup feeds strictly on Macaranga (Maschwitz et al. 1984; Megens et al. 2005). The morphological and behavioral specialization of the amphimuta subgroup to the Macaranga–Decacrema protective mutualism suggests intimate coevolution between the lycaenids and the plants or ants, or both (Maschwitz et al. 1984; Megens et al. 2004a, b, 2005).

Table 1 Host plant associations of Arhopala species feeding on Macaranga

Two hypotheses, the codiversification model (Johnson and Stinchcombe 2007) and the latecomer model (Ueda et al. 2008), can potentially explain the community formation process between Arhopala and the Macaranga system. According to the codiversification model, the host–parasite interaction of the lycaenids with the Macaranga system began at the same time as the Macaranga–Decacrema symbiosis (20–16 Myr ago), and the lycaenids’ diversification occurred synchronously with the Macaranga–Decacrema codiversification as a result of continuous community interaction over macroevolutionary time. In contrast, in the latecomer model, the lycaenids became associated with the pre-existing Macaranga system, and adaptation to the host (plant or ant) then led to the lycaenids’ diversification. Megens et al. (2004a) inferred that the earliest radiation of the genus Arhopala occurred between 7 and 11 Myr ago, but noted that these estimates are likely to be underestimates because of the extreme compositional bias at third codon positions. To test these hypotheses, in this study we reconstructed the molecular phylogeny of Arhopala lycaenid butterflies feeding on Macaranga trees by using mitochondrial and nuclear genes to estimate the timeline of the adaptive radiation and diversification of the lycaenids. We then inferred the minimum age of the Arhopala–Macaranga interaction and the timeline of the subsequent diversification of the lycaenids.

Materials and methods

Sampling

More than 1,000 trees representing 16 Macaranga species were comprehensively surveyed for the presence of Arhopala species in three locations (Lambir Hills National Park and Kuching in Sarawak, Borneo, and Gunung Tebu in Terengganu, Peninsular Malaysia) from May 1999 to August 2007. Thirty-five lycaenids belonging to five Arhopala species (A. amphimuta, A. dajagaka, A. major, A. moolaiana, and A. zylda), all in the amphimuta subgroup and all feeding on Macaranga trees (Megens et al. 2004b, 2005), were collected from seven Macaranga species. Most of the lycaenid specimens analyzed here were previously studied by Okubo et al. (2009), who examined the ecological interaction between Arhopala and Macaranga. In addition, five sequences of the three Arhopala species studied by Megens et al. (2004b), who described the genus-level phylogeny of Arhopala, were used in the phylogenetic analyses (one A. amphimuta sequence and two sequences each of A. major and A. moolaiana). For outgroups, we sequenced (1) A. pseudocentaurus, a member of the centaurus group, a sister clade of the amphimuta subgroup (Megens et al. 2004b), and (2) A. kinabala, a member of the agesias subgroup, a clade distant from the amphimuta subgroup (Megens et al. 2004b). Host plant species, collection locations, and GenBank accession numbers of the samples are listed in Table S1 in the Supplementary Material.

DNA extraction, PCR, and sequencing

DNA was extracted from a single, ethanol-preserved leg of each lycaenid with a DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) following the manufacturer’s protocols. One mitochondrial gene, cytochrome oxidase I (COI), and two nuclear genes, wingless (WG) and elongation factor 1α (EF-1α), were amplified by polymerase chain reaction (PCR) using Takara Ex Taq polymerase (Takara Bio, Shiga, Japan). The PCR primers are given in Table S2. The PCR temperature profile used for COI was 30 cycles of 95°C for 30 s, 50°C for 30 s, and 72°C for 90 s, and that used for WG and EF-1α was 30 cycles of 95°C for 30 s, 55°C for 30 s, and 72°C for 30 s. After amplification, the PCR products were purified with ExoSap-IT (USB, Cleveland, OH, USA). Cycle sequencing reactions for both strands were performed with a BigDye Terminator v1.1 Cycle Sequencing Kit (ABI, Weiterstadt, Germany) on an ABI 3130 Genetic Analyzer.

Sequence alignments and character statistics

Mitochondrial COI and nuclear EF-1α and WG sequences were edited and aligned using SeqScape v. 2.5 (ABI, Weiterstadt, Germany). Of the 45 specimens used in this study, 38 represent unique haplo-genotypes. Base frequency homogeneity was tested separately for each dataset with a chi-square test in PAUP* 4.0b10 (Swofford 2002); parsimony-uninformative sites were excluded from the test. The chi-square test did not reject the hypothesis of homogeneous nucleotide frequencies for any pair of taxa (p > 0.69; Table S3). To test for conflicts in phylogenetic signal among the datasets, we conducted an incongruence length difference (ILD) test (Farris et al. 1994) in PAUP* 4.0b10, using heuristic searches with tree bisection and reconnection (TBR) and 100 random addition replicates for each. The ILD test revealed no significant conflict between any pair of datasets (COI vs. EF-1α, p = 0.75; COI vs. WG, p = 0.47; EF-1α vs. WG, p = 0.65). The degree of substitution saturation at the third codon position was assessed for each dataset by plotting the ratio of transitions (ti) to transversions (tv) against genetic distance in DAMBE with Xia’s method (Fig. S1; Xia and Xie 2001). The substitution model for the third codon position of each dataset followed the results of the model selection described in the next section. Substitution saturation at the third codon position was not detected in any dataset (p < 0.01; Fig. S1). Given these results, we used all datasets and all codon positions in the phylogenetic analyses.
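The base-frequency homogeneity test described above can be illustrated with a minimal sketch. It assumes a plain taxa-by-base contingency table and computes only the chi-square statistic and its degrees of freedom; the function names are hypothetical, and this is not the PAUP* implementation (which also handles the exclusion of uninformative sites and reports the p-value).

```python
from collections import Counter

BASES = "ACGT"

def base_counts(seq):
    """Count A, C, G, and T in one sequence; gaps and ambiguity codes are ignored."""
    counts = Counter(seq.upper())
    return [counts[b] for b in BASES]

def chi_square_homogeneity(seqs):
    """Return (chi-square statistic, degrees of freedom) for a taxa x bases table.

    Assumption: df = (taxa - 1) * (bases - 1), which presumes every base
    occurs at least once in the dataset.
    """
    table = [base_counts(s) for s in seqs]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:  # skip bases absent from the whole dataset
                stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(BASES) - 1)
    return stat, df
```

A statistic near zero, as for taxa with identical base composition, gives no evidence against homogeneity; converting the statistic to a p-value requires the chi-square distribution with the returned degrees of freedom.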

Phylogenetic analyses

Best-fit substitution models were selected for each codon position of each gene based on Bayesian information criterion 5 (BIC5) using KAKUSAN 3 software (Tanabe 2007). Maximum likelihood (ML) analysis was performed with TREEFINDER version October 2008 (Jobb et al. 2004) using the models selected by BIC5 (Table S4). Clade support was assessed with 1,000 bootstrap replicates in TREEFINDER. In addition, Bayesian posterior probabilities and maximum parsimony (MP) bootstrap support were obtained with MrBayes version 3.1.2 (Huelsenbeck and Ronquist 2001) and PAUP* 4.0b10 (Swofford 2002), respectively. The models selected by BIC5 were also used in the Bayesian analysis, with the default run settings in which two independent analyses are performed with four chains each (one cold and three heated). The Bayesian analysis was run for 5 million generations, sampling every 1,000 generations. We examined the log-likelihood of each sampling point against generation time to identify when the Markov chains reached a stationary distribution, and accordingly discarded the initial 1,000 trees as burn-in. The parsimony bootstrap support was assessed with 1,000 bootstrap replicates, using heuristic searches with TBR and ten random addition replicates for each.
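The bootstrap support values mentioned above rest on resampling alignment columns with replacement. The following is a minimal sketch of that resampling step only, assuming an alignment stored as a dictionary of equal-length sequences; the function names and the seed are hypothetical, and tree inference on each pseudo-replicate (done here by TREEFINDER and PAUP*) is not shown.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by sampling columns with replacement.

    The same column indices are applied to every taxon so that sites stay aligned.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}

def bootstrap_replicates(alignment, n_reps=1000, seed=0):
    """Return n_reps pseudo-replicate alignments (seed fixed for reproducibility)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_reps)]
```

Each pseudo-replicate has the same length as the original alignment; the support for a clade is then the fraction of replicate trees in which that clade is recovered.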

Age estimation

Because lycaenid butterflies lack a robust fossil record (Kandul et al. 2004), we estimated divergence times using the relative divergence rate of the COI gene and set a calibration point on the ML phylogeny of the lycaenids. The COI gene has been reported to have the lowest rate heterogeneity in arthropods (Gaunt and Miles 2002), and the COI substitution rate converges to about 1.5% per Myr within the insects (Quek et al. 2004). This rate has been widely used for age estimation of several taxa of Lycaenidae (e.g., Als et al. 2004; Lohman et al. 2008), and also of the Crematogaster ants and Coccus scale insects inhabiting Macaranga (Quek et al. 2004, 2007; Ueda et al. 2008, 2010). Thus, we used solely the COI divergence rate (uncorrected pairwise distance) to estimate the lycaenids’ divergence times. First, the heterogeneity of the COI substitution rate on the ML topology was tested with a likelihood ratio test (LRT) using the models selected by BIC5, as implemented in PAUP*. The LRT showed significant deviation from rate constancy (p < 0.05). Second, the range of divergence times of node A (Fig. 1) was estimated by non-parametric rate smoothing (NPRS; Sanderson 1997), implemented in TreeEdit v. 1.0 (Rambaut and Charleston 2002). For NPRS, mean uncorrected pairwise distances between sister taxa were calculated using MEGA v. 2.1 (Kumar et al. 2001), and three well-supported nodes of varying genetic divergence (nodes A, B, and C; 2.1–5.8%) were chosen as calibration points for estimating the age of node A. This range of divergences has been used for dating divergences in various arthropod groups (Table 4 in Quek et al. 2004). The age of node A was estimated to be 3.87–1.95 Myr, and this range was used as the calibration in the Bayesian analysis. Finally, the lycaenids’ divergence times were estimated with a Bayesian approach, as implemented in the mcmctree program of the PAML v. 4.2 package (Yang 2007). A Bayesian global-clock analysis was run using the HKY + Γ substitution model, sampling 1 million generations every two generations, after discarding the initial 100,000 steps as burn-in.
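As a rough illustration of how an uncorrected pairwise distance translates into an age under the COI clock, the sketch below computes a p-distance and divides it by the rate. It assumes a strict clock and treats 1.5% per Myr as pairwise sequence divergence; the function names are hypothetical. Because rate constancy was rejected in this study, the sketch is only a back-of-the-envelope check and does not reproduce the NPRS or mcmctree estimates.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected pairwise distance: the proportion of compared sites that differ.

    Sites where either sequence has a gap or an unknown base are skipped.
    """
    skip = set("-N?")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

def clock_age_myr(distance, rate_per_myr=0.015):
    """Divergence time in Myr under a strict clock (assumed pairwise rate)."""
    return distance / rate_per_myr
```

For example, a pairwise divergence of 5.8% corresponds to about 3.87 Myr at 1.5% per Myr, since 0.058 / 0.015 ≈ 3.87.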

Fig. 1

ML phylogeny of Arhopala lycaenids feeding on Macaranga estimated from 1,010 bp of mitochondrial DNA sequences (COI) and 988 bp of nuclear DNA sequences (WG 393 bp and EF-1α 595 bp). The numbers above the branches indicate the ML bootstrap support (left of slash) and the Bayesian posterior probabilities (right of slash); the MP bootstrap support is shown below each branch. When a node was not recovered in the MP bootstrap or Bayesian posterior probability analyses, an asterisk is substituted for the node support values. Branch lengths are proportional to time inferred by Bayesian inference using the COI molecular clock and gray bars indicate 95% confidence intervals (Table 2)

Results and discussion

Molecular phylogenies of the Arhopala lycaenids feeding on Macaranga were inferred from a total of 1,998 bp from the three genes (mitochondrial COI and nuclear WG and EF-1α) by ML, MP, and Bayesian analyses. Monophyly of the amphimuta subgroup and of each species was recovered in the ML, MP, and Bayesian topologies (Fig. 1), but the relationships among the species were not well supported; node B was poorly supported by MP bootstrapping, and nodes D and E were poorly supported by both MP bootstrapping and Bayesian posterior probability analyses (Fig. 1).

The divergence time of the lycaenids was estimated from the COI molecular clock rate within the insects (1.5% per Myr; Quek et al. 2004). The minimum estimated age of the amphimuta subgroup that feeds on Macaranga was 2.05 Myr, indicating a Pleistocene divergence (Table 2; Fig. 1, node C), whereas the minimum age of the Macaranga–Crematogaster association is 20–16 Myr (Quek et al. 2007) and that of the Macaranga–Coccus association is 9–7 Myr (Ueda et al. 2008, 2010). In addition, the divergence times of the two outgroups, A. pseudocentaurus and A. kinabala, were estimated to be 4.77 and 2.79 Myr ago, respectively (Table 2; Fig. 1, nodes A and B), which are also younger than the tripartite symbiosis (9–7 Myr). Thus, the lycaenids probably became involved in the Macaranga system after the origin of the tripartite symbiosis.

Table 2 Mean age and 95% confidence interval (CI) of nodes in the Arhopala phylogeny obtained by Bayesian inference, based on the COI molecular clock (1.5% divergence per million years)

The time-calibrated phylogeny also showed that the species divergence of the Arhopala on Macaranga occurred 2.05–1.68 Myr ago (Table 2; Fig. 1, nodes C–F), suggesting a rapid radiation of the amphimuta subgroup during the Pleistocene. Moreover, all Macaranga species and the mtDNA lineages of the Decacrema ants and Coccus scales were already in existence in the late Pliocene (Quek et al. 2007; Ueda et al. 2010), which also indicates that the species divergence of the lycaenids occurred after the origin of the tripartite symbiosis. Therefore, the lycaenids might have diversified to adapt to the pre-existing plant, ant, and scale insect symbiosis. Given the younger origin and later diversification of the lycaenids compared with the origin of the plant, ant, and scale symbiosis, the association between the Arhopala lycaenids and the Macaranga tripartite system is not consistent with the codiversification model of the coevolutionary history of the Macaranga–Decacrema mutualism, but it is consistent with the latecomer model.

To explain the diversification of the genus Arhopala, two hypotheses have been proposed. Corbet (1946) suggested that vicariance in the Pleistocene accelerated speciation of the genus Arhopala. The Pleistocene glacial cycles caused sea levels to fluctuate, with the result that land bridges between islands and the mainland formed and resubmerged repeatedly (Medway 1972; Morley 2000). In contrast, Megens et al. (2004b), who inferred the minimum age of the genus Arhopala to be 11–7 Myr, hypothesized that large-scale climatic changes in the mid-Miocene accelerated Arhopala speciation. The South East Asian forest experienced a gradually cooling climate and intensified monsoons around 8 Myr ago (Zachos et al. 2001). Our age estimate for when the lycaenids began feeding on Macaranga suggests that the divergence occurred during the Pleistocene (2.05–1.68 Myr ago; Table 2; Fig. 1). Thus, Pleistocene vicariance may have played an important role in the diversification of the amphimuta subgroup, in accordance with Corbet’s (1946) hypothesis. However, we did not investigate the geographical context of the lycaenid diversification because the lycaenid sampling locations were insufficiently widespread for phylogeographic inferences. Further sampling throughout the distribution area of myrmecophytic Macaranga (Sumatra, the Malay Peninsula, and Borneo; Fiala et al. 1999; Davies et al. 2001) would be necessary to clarify the postglacial biogeography of the lycaenids.

Cool, dry periods during the Pleistocene might also have helped the ancestor of the amphimuta subgroup to become established as a parasite on Macaranga by weakening the ants’ ability to defend the trees. In rare instances, A. kinabala is known to feed on several species of Macaranga when the ants’ defensive ability is weakened by natural physical disturbances (Itioka, unpublished data). In addition, the feeding habit of Arhopala may have shifted from generalist to specialist. The centaurus group, a sister clade of the amphimuta subgroup, is extremely polyphagous and feeds on several families of plants including Euphorbiaceae (Megens et al. 2005). If so, Decacrema ants, as well as Macaranga trees, may have played an important role in the origin and diversification of the Arhopala lycaenids. However, we have not investigated the specificity of the relationship between Arhopala and Decacrema. Phylogenetic comparison of Macaranga, Decacrema, and Arhopala might reveal whether the plants or the ants induced the lycaenid diversification and specialization.

The timeline inferred in this study, which indicates only the gross pattern of the adaptive radiation and diversification, is based on all known species of Arhopala lycaenids that feed on Macaranga. Further sampling throughout the Asian tropics is likely to reveal more lycaenid genotypes and result in some change in the estimated ages. However, our estimated minimum age of the lycaenids seems reasonable because it is consistent with the timing of the drastic climate changes caused by cyclical glaciation (Medway 1972) and the ages of migration events across the South China Sea in Decacrema ants and Coccus scale insects (Quek et al. 2007; Ueda et al. 2010). In addition, we should consider a possible deviation from the COI divergence rate of around 1.5% per Myr because rate constancy was rejected. Therefore, we also estimated the age range using known COI rates (1.3–2.3% per Myr; Brower and DeSalle 1998; Quek et al. 2004). Across this range of rates, the minimum age of the amphimuta subgroup ranged from 1.34 to 2.36 Myr, which is no older than the Pleistocene and much younger than the tripartite symbiosis. Thus, rate deviation is unlikely to introduce substantial error into our conclusion. It is also possible that older taxa feeding on Macaranga trees have gone extinct, but this hypothesis is, unfortunately, untestable.
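The sensitivity of the age estimate to the assumed COI rate can be checked with a short calculation. The sketch below assumes that an age scales inversely with the clock rate, so that an age obtained at 1.5% per Myr can be rescaled to another rate; the function name is hypothetical, and this simple rescaling may not be exactly how the reported range was derived.

```python
def rescale_age(age_myr, reference_rate, new_rate):
    """Rescale an age from one clock rate to another (age is proportional to 1/rate)."""
    return age_myr * reference_rate / new_rate

# Minimum age of the amphimuta subgroup at the reference rate of 1.5% per Myr.
reference_age = 2.05

# Ages under the slowest and fastest COI rates considered (in % per Myr).
ages = {rate: rescale_age(reference_age, 1.5, rate) for rate in (1.3, 2.3)}

for rate, age in sorted(ages.items()):
    print(f"{rate}% per Myr -> {age:.2f} Myr")
```

Under these assumptions the rescaled ages come out at roughly 1.34 and 2.37 Myr, in close agreement with the reported range of 1.34–2.36 Myr; the small difference at the upper end is presumably due to rounding of the 2.05 Myr estimate.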

Coevolution between insects and their host plants has been proposed to be an important factor promoting global biodiversity, but few studies have demonstrated the historical process within a community context, including multipartite plant–insect interactions. Here, we showed that the Arhopala lycaenids were latecomers to the evolutionary history of the Macaranga-based community. The Macaranga community also includes other insect taxa that have evolved specific interactions, such as leaf-galling gall midges (Cecidomyiidae) and ant-predatory mirid bugs (Itino and Itioka 2001). Further phylogenetic analyses of these multipartite associations may provide insight into evolutionary dynamics of plant–insect communities in the South East Asian tropics.