Background

Ecological speciation is a process in which polymorphism within populations (e.g. in resource use or habitat preference) ultimately induces the appearance of two sister species, each adapted to a different niche [14]. According to Rundle and Nosil [2], three principal components must be involved: i) a source of divergent selection, ii) a form of reproductive isolation, and iii) a genetic mechanism linking divergent selection to reproductive isolation. Among plant-feeding insects, several empirical studies support this scenario [1, 5, 6], which can also occur for insects with a parasitic lifestyle, in particular within the upper trophic levels. For such organisms, ecological differentiation between sister species can also be driven by the ecological differentiation of their hosts via a process called sequential or cascading speciation [79]. If pervasive enough, such processes should lead to the clustering of phylogenetically related specialists.

Additionally, transitions between generalists to specialists (and vice-versa) are also occurring and, so far, empirical data provide a mixed picture about the relative frequencies of evolution toward specialisation and generalization [1013]. However, transitions from generalist ancestors to specialized species are probably recurrent as (i) generalist species are unlikely to produce “jack-of-all trades-master of none” genotypes because of genetic or physiological trade-off [1416]; (ii) the subsequent acquisition of specialized genotypes may be a primary step towards speciation [1719]; and (iii) specialist species may be more prone to extinction [13, 20]. At a phylogenetic level, both kinds of transitions should lead to the mixing of both specialists and generalists within the same cluster.

Questions of (i) the host range (specialist versus generalist) of ancestral species of current specialists and (ii) the distribution of host ranges within a phylogeny were recently addressed by Hardy and Otto [21]. They illustrated them using two notions, respectively “the musical chairs hypothesis” (specialists originate from specialists through host switch) and the “oscillation hypothesis” (specialists originate from generalists, with some specialists widening their host range before the next speciation event). The extent to which one of these scenarios is more frequent has nevertheless still to be evaluated rigorously for the organisms with a parasitic lifestyle.

Parasitoids are organisms (mainly Hymenoptera and Diptera) whose pre-imaginal life depends on the successful exploitation of a single host [22, 23]. Behind this simple definition, a great diversity of life history strategies and physiological adaptations are observed. In particular, the ovipositor allows egg-laying by the female and is thus a key organ especially for species that are exploiting concealed or protected hosts [24, 25]. The features (in particular the length) of this organ and its ability to evolve could contribute to drive specialization and/or speciation. Focusing on the “Eupelmus urozonus species group” (Hymenoptera: Eupelmidae), we examine here whether the host range is subject to phylogenetic constraints and/or whether the ovipositor length is a significant driver of host use.

Within the subfamily Eupelminae (33 genera), the genus Eupelmus Dalman is the most diverse, with 91 available valid species names in the Palaearctic region [26]. Species of Eupelmus are primary or facultative secondary ectoparasitoids whose larvae develop as idiobionts on the immature stages (larvae, pupae and more rarely eggs) of many insects (beetles, flies, moths, wasps or cicadas) that are concealed or protected in plant tissues (stems, galls, fruits or seeds) [27]. Most Eupelmus are considered as generalist parasitoids [27, 28]. However, because of both the extreme sexual dimorphism characterizing the subfamily and the existence of species groups possibly hiding cryptic species, the systematics and the evolutionary ecology of these species remain poorly understood. This situation is well illustrated with the “E. urozonus species complex/group” which was repeatedly investigated [27, 2931] until its recent revision within the Palaearctic region by Al khatib et al. [32, 33], which identified 11 new species in this region. Semantically, the term “complex” used in Al khatib et al. [32, 33] is substituted here by the term “species group” (Al khatib et al. in preparation). As a consequence of this unsuspected biodiversity, most of the published host records for these species are unreliable because all of the common species with a comparatively short ovipositor (E. gemellus Al khatib, 2015, E. confusus Al khatib, 2015, and especially E. kiefferi De Stefani, 1898) were misidentified as E. urozonus Dalman, 1820, while the two common species with a comparatively long ovipositor (E. azureus Ratzeburg, 1844 and E. annulatus Nees, 1834) were both frequently mistreated under E. annulatus [29, 34].

In the present study, we first provide a reliable molecular phylogeny of the “E. urozonus species group” using a multi-locus approach. Then, for most of the species, we compile host records and data on ovipositor length. We finally carry out a comparative analysis to evaluate the role of phylogenetic constraints in the evolution of ovipositor length and host range as well as the role of the ovipositor’s length in determining the host range.

Methods

Sampling

A total of 31 species, with 91 individuals, sampled in the Palaearctic region were included in this study.

  • Eighteen of the 21 species within the “urozonus species group” that were recently revised using both morphological and molecular characters [32, 33]: E. acinellus Askew, 2009, E. annulatus, E. azureus, E. cerris Förster, 1860, E. confusus, E. fulvipes Förster, 1860, E. gemellus, E. janstai Delvare and Gibson, 2015, E. kiefferi, E. longicalvus Al khatib & Fusu, 2015, E. minozonus Delvare, 2015, E. opacus Delvare, 2015, E. pistaciae Al khatib, 2015, E. priotoni Delvare, 2015, E. purpuricollis Fusu & Al khatib, 2015, E. simizonus Al khatib, 2015, E. tibicinis Bouček, 1963 and E. urozonus.

  • Thirteen species were used as outgroup including (i) species belonging to the three subgenera of Eupelmus sensu Gibson (1995): Eupelmus [E. atropurpureus Dalman, 1820, E. matranus Erdős, 1947, E. microzonus Förster, 1860, E. pini Taylor, 1927 and E. vindex Erdős, 1955]; Macroneura Walker [E. falcatus (Nikol’skaya, 1952) and E. seculatus Kalina, 1981], and Episolindelia Girault [E. linearis Förster, 1860, E. testaceiventris (Motschulsky, 1863) and E. juniperinus thuriferae Askew, 2000]; and (ii) species belonging to other genera within Eupelminae, Reikosiella (Hirticauda) [R. aff. rostrata (Ruschka, 1921)] and Anastatus Motschulsky [Anastatus sidereus (Erdős, 1957) and Anastatus aff. temporalis Askew, 2005]. The species were identified by the authors using the available identification keys [29, 31, 3537].

Specimens were killed with ethyl acetate and preserved in 95 % ethanol at −20 °C until DNA extraction. After the DNA extraction, the voucher specimens were prepared as explained in Al khatib et al. (2014) for the morphological examination. The vouchers are deposited in the following institutions and private collections: AICF, Lucian Fusu collection, Al. I. Cuza University, Iasi, Romania; BMNH, Natural History Museum, London, UK; CBGP, Centre for Biology and Management of Populations, Montpellier, France; CNC, Canadian National Collection of Insects, Arachnids and Nematodes, Agriculture & Agri-food Canada, Ottawa, ON, Canada; FALPC, Fadel Al khatib personal collection, Faculty of Agricultural Engineering, University of Aleppo, Syria; GDPC, Gérard Delvare personal collection, Montpellier, France; MNHG, Museum of Natural History of Geneva, Switzerland; MNHN, National Museum of Natural History, Paris, France; NHRS, Naturhistoriska riksmuseet, Stockholm, Sweden. The depository’s acronyms of voucher specimens are included in (Additional file 1: Table S2; Additional file Dryad: doi:10.5061/dryad.115m1). Sampling information (host-plants, collection dates, and localities) is listed in Table 1.

Table 1 Sample information for the specimens included in the phylogenetic analysis

Marker choice

Seven markers displaying various rates of molecular evolution were used: two coding portions of mitochondrial genes (Cytochrome oxidase I, COI and Cytochrome b, Cytb), two coding regions of nuclear genes (the F2 copy of elongation factor 1-alpha, EF- and Wingless, Wg) and three (at least partially) non-coding regions of other nuclear genes (the mitotic checkpoint control protein, Bub3; the ribosomal protein L27a, RpL27a, and the ribosomal protein S4, RpS4). All these markers were previously used for phylogenetic analyses in arthropods. COI and Cytb have been used to resolve insect molecular phylogenies at shallower taxonomic levels [3841]. The Wg gene has provided a useful tool for the reconstruction of phylogenetic relationships at lower to intermediate taxonomic levels in different insect groups [32, 38, 4145]. EF- has proven to evolve at slow rates and provide phylogenetic information at deeper levels (i.e. family relationships) [39, 4651]. The Bub3 gene is more rarely used [52, 53] for inferring phylogenetic relationships at a similar taxonomic level as Wg. Finally, ribosomal proteins RpL27a and RpS4 have been used with success to infer the phylogeny of Hymenoptera associated with oak galls or figs [39, 5456].

DNA extraction, PCR amplification and sequencing

Genomic DNA was extracted from a single individual using the Qiagen DNeasy kit (Hilden, Germany) with some minor modifications with regard to the manufacturer’s protocol. Entire specimens were incubated at 56 °C for 15–17 h and DNA extraction was performed without destruction of the specimens, to allow subsequent examination of morphology (see § Sampling). Primer sequences are given in Additional file 1: Table S1.

For the two mitochondrial genes (COI and Cytb), the PCR mix was prepared in 20 μl as follows: 1 μl of DNA (1–55 ng/μl), 14.64 μl of Milli-Q water, 2 μl of 10x PCR buffer containing MgCl2 (1x), 1 μl of 10 μM primer cocktail (0.5 μM), 0.16 μl of dNTPs 25 mM each (0.2 mM) and 0.2 μl of 5 U/μl Taq DNA Polymerase (Qiagen, Hilden, Germany).

For the nuclear genes (Bub3, EF1-α, RpL27a, RpS4 and Wg), the PCR mix was realised in 25 μl as follows: 2 μl of DNA (1–55 ng/μl), 19.825 μl of Milli-Q water, 2.5 μl of 10x PCR buffer containing MgCl2 (1x), 0.175 μl of 100 μM primer cocktail (0.7 μM), 0.2 μl of dNTPs 25 mM each (0.2 mM) and 0.125 μl of 5 U/μl Taq DNA Polymerase (Qiagen, Hilden, Germany).

PCR conditions for Wg and COI were as described in [32]. Those for other genes were as follows: Cytb: 94 °C for 5 min, followed by 40 cycles of (i) 94 °C for 1 min, (ii) 50 °C for 1 min, and (iii) 72 °C for 90 s with a final extension at 72 °C for 10 min; nuclear markers: 94 °C for 4 min, followed by 40 cycles of (i) 94 °C for 30 s, (ii) 58 °C for EF-, 48 °C for Bub3, 57 °C for RpS4 and 55 °C for RpL27a, (iii) 72 °C for 5 min with final extension at 72 °C for 5 min.

In the absence of amplification or if the signal was too weak, we improved yields of PCRs by using 2x QIAGEN Multiplex PCR Master Mix (Qiagen, Hilden, Germany). In this case, PCRs were performed in a 25 μl reaction volume: 2 μl of DNA, 16.5 μl of Milli-Q water, 0.125 μl of 100 μM primer cocktail (0.5 μM) and 6.25 μl of 2x QIAGEN Multiplex PCR Master Mix (1x) and PCR conditions were as specified in the QIAGEN® Multiplex PCR kit: 95 °C for 15 min, followed by 40 cycles of (i) 95 °C for 30 s, (ii) 48 °C-58 °C for 90 s, (iii) 72 °C for 1 min, with final extension at 72 °C for 10 min.

All PCRs were performed on a GeneAmp 9700 thermocycler. PCR products were visualized using the QIAxcel Advanced System and QIAxcel DNA Fast Analysis Kit (Qiagen). PCR products were sent to GENOSCREEN (Lille, France) or to BECKMAN COULTER GENOMICS (Stansted, United Kingdom) for sequencing in both directions. All sequences were deposited in GenBank (Additional file 1: Table S2).

Sequence alignment and phylogenetic analysis

Alignment

Sequences were aligned using Muscle [57] with the default settings as implemented in SeaView v4.4.1 [58] and subsequently visually checked. To assess the impact of indels on the phylogenetic resolution, highly divergent blocks present in Bub3, RpS4 and RpL27a alignments were either included in or excluded from the analyses. These blocks were removed using Gblocks [59] with the default settings as implemented in SeaView. Alignments of COI, Cytb, EF- and Wg were translated to amino acids using Mega v5.1 [60] to detect potential frame-shift mutations and premature stop codons, which may indicate the presence of pseudogenes.

Gene by gene analysis

To detect (i) possible inconsistencies linked to contamination during laboratory procedures, (ii) poor-quality sequences, (iii) possible pseudogenes or other artefacts, and (iv) to evaluate the impact of the Gblock procedure on the individual phylogenetic resolution, genes were first analysed separately using a maximum likelihood approach (ML).

Concatenated datasets analysis

Phylogenetic analyses were performed on concatenated nucleotide sequences using both ML and Bayesian methods. Four partitioning schemes were compared: (i) two partitions: one for the two mitochondrial genes (COI and Cytb) and another for all nuclear markers (Wg, EF-, Bub3, RpS4 & RpL27a); (ii) six partitions: one for the two mitochondrial markers (COI and Cytb) and one for each nuclear marker (Wg, EF-, Bub3, RpS4 and RpL27a); (iii) seven partitions: one for the 1st and 2nd codon positions of the mtDNA, one for the 3rd codon positions of mtDNA, and one for each nuclear gene (Wg, EF-, Bub, RpS4 and RpL27a); (iv) nine partitions: same as above with Wg and EF- further partitioned by codon position (1st and 2nd codon positions versus 3rd positions).

Bayes factors (BF) [61, 62] were used to compare the four partitioning schemes. Harmonic means of the likelihood scores were used as estimators of the marginal likelihoods. Following [61] and [63], Bayes factors were calculated using the following formula: BF = 2 × (lnM1-lnM0) + (P1-P0) × ln (0.01) where lnMi and Pi are the harmonic-mean of the ln likelihoods and the number of free parameters of the model i, respectively. BF values were interpreted following [61] and [62], with BF values between 2 and 6, between 6 and 10 and higher than 10 indicating positive evidence, strong evidence, and very strong evidence favouring one model over the others respectively.

Evolution models and phylogenetic reconstruction

For the separated and concatenated datasets, the best-fitting model was identified using the Akaike information criterion (AIC) as implemented in jModelTest v0.1.1 [64].

For both gene-by-gene and concatenated analyses, maximum likelihood analyses and associated bootstrapping were performed using RAxML v8.0.9 [65]. The GTRCAT approximation of models was used for ML bootstrapping (1000 replicates). Bootstrap percentages (BP) ≥85 % were considered as strong support and BP < 65 % as weak.

Bayesian analyses were performed only on the concatenated dataset using a parallel version of MrBayes v3.2.2 [66]. Model parameters for each data partition were independently estimated by unlinking parameters across partitions. Parameter values for the model were initiated with default uniform priors, and branch lengths were approximated using default exponential priors. Bayesian inferences were estimated using two simultaneous, independent runs of Markov Chain Monte Carlo (MCMC), including three heated and one cold chains. The Metropolis-coupled MCMC algorithm [67] was used to improve the mixing of Markov chains. Analyses were run for 20 × 106 generations with parameter values sampled every 2000 generations. To ensure convergence, 40 × 106 generations were used for the most complex partitioning scheme (9 partitions) with parameter values sampled every 4000 generations. To increase and improve the swap frequencies of states between cold and heated chains, the heating temperature (T) was set to 0.01 for the most complex partitioning scheme cleaned with Gblocks and to 0.02 for all other datasets. Convergence was assessed using the standard deviation of split frequencies given by MrBayes and the Effective Sample Size (ESS), as estimated using Tracer v1.6.0 [68]. The first 25 % of the tree samples from the cold chain were discarded and considered as burn-in. Posterior probabilities (PP) ≥ 0.95 were considered as strong support and PP < 0.90 as weak.

Analyses were conducted using the CIPRES Science Gateway (www.phylo.org) [69].

Evolutionary properties of marker sequences

For each partition of the concatenated datasets (without Gblocks cleaning), base composition, substitution rates, and among sites rate variation (α) were estimated and compared. We also compared rate variation among partitions, considering the parameter m (rate multiplier).

Comparative analysis

Evolution of ovipositor length

The ovipositor of Hymenoptera is a complex organ that exhibits great interspecific variation (see for instance [23]). In species of Eupelmus, part of the ovipositor is easily visible at the extremity of the abdomen (the ovipositor sheaths), while the rest is concealed in the abdomen. The use of this visible part as a “proxy” of the total ovipositor length is a priori tempting in order to avoid damaging of specimens of newly described species known from very few individuals [32, 33]. In order to validate the use of this proxy, a total of 34 individuals of comparatively common species (e.g. E. azureus, E. confusus, E. gemellus, E. kiefferi, E. pistaciae, and E. urozonus) were dissected and, for each individual, we measured the length of the ovipositor stylet, the visible part of the ovipositor sheath and the metatibia (see dataset on Dryad: doi:10.5061/dryad.115m1). Measurements of the length of the ovipositor sheaths and hind tibia followed Al khatib et al. [32] (Additional file 2: Figure S18 A and C). The length of the ovipositor stylet (first and second valvulae) was measured from the articulation of the second valvula with the articulating bulb to the apex of the second valvula (Additional file 2: Figure S18 B). Using this dataset, we found evidence of linear relationships between the ovipositor sheath (response variable) and either the ovipositor stylet or the metatibia as predictors (data not shown). Moreover, no interaction was found between these two predictors and the host species (respectively F5df,20df = 1.23 with p = 0.34 and F5df,22df = 1.20 with p = 0.34). This suggests that the visible part of the ovipositor sheath can indeed be used as a reliable proxy of the entire ovipositor.

As a consequence, a first analysis was performed on the 19 species of the “E. urozonus species group” for which information about the ovipositor sheaths and the metatibia were available. This analysis includes a total of 121 individuals, with at least 2 individuals/species except for E. priotoni and E. simizonus (only one individual in each case). In most of the cases, we tried to select individuals from at least two geographical locations and/or, for generalist species, two host species (see dataset on Dryad: doi:10.5061/dryad.115m1). Both the absolute length of the ovipositor sheath (“AOS”) and the ratio (“ROS”) between the ovipositor sheaths and the metatibia were taken into account, the second one being potentially less sensitive to environmental-induced phenotypic plasticity (host and/or abiotic conditions). AOS/ROS medians were then calculated for each Eupelmus species and these medians were used for the subsequent analysis (see below).

Two tests were then performed: (i) a Mantel test of the correlation between pairwise genetic distances (“phylogenetic matrix”) and pairwise differences in AOS/ROS (“morphological matrix”). (Dis) similarities were estimated as |di-dj|/[(di + dj)/2] (di and dj being the AOS/ROS medians obtained for species i and j respectively); (ii) the detection of a phylogenetic signal based on categories of AOS/ROS. For this purpose, “long ovipositors” (AOS/ROS exceeding the third quartile) were distinguished from “short ovipositors” (AOS/ROS below this threshold). Briefly, the sum of state changes was calculated, leading to a D statistic that could be tested against two theoretical distributions: a phylogenetic randomness and a Brownian distribution, this latter being underlain by a continuous trait evolving along the phylogeny at a constant rate [70].

Influence of phylogeny and ovipositor length on host range

A second analysis was restricted to a subset of 13 species for which host range was also available. Most of the information about host range was obtained from Al khatib et al. [32] and from Gibson and Fusu (in prep). Jean Lecomte (comm. pers.) communicated the rearing of E. confusus from curculionid larvae. Taken as a whole, our host survey is probably not exhaustive but nevertheless encompassed a total of several thousands of individuals of the “E. urozonus species group” and, with regard to the host’s diversity, 95 insect species representing 22 families and 6 orders (see dataset on Dryad: doi:10.5061/dryad.115m1). Taken as a whole, these host insects were distributed on 18 plant families. Dissimilarities in host range were calculated—at three taxonomic levels (species, family and order) for the host insect and at one level (family) for the host plant—using the Bray-Curtis distance, each host taxon being treated qualitatively (at least one record versus none). This information was summarized and presented as “ecological matrices”. Correlations between “phylogenetic”, “morphological” and “ecological” matrices were tested using simple (2 matrices) or partial (3 matrices) Mantel tests, the relevance of these last tests having been repeatedly discussed (see for instance [71] and [72]).

Fig. 1
figure 1

Phylogram of relationships among species of the “Eupelmus urozonus species group” obtained from the concatenated dataset alignment (5000 bp and 9 partitions) without the Gblocks cleaning of divergent blocks. Uppercase letters refer to clades discussed in the text. Nodes with likelihood bootstrap (BP) values <65 have been collapsed. BP (≥65) and Bayesian posterior probabilities (≥0.90) are indicated at nodes. Each line represents a sequenced individual with information in the following order: molecular code, species and country

Moreover, three kinds of traits were investigated using D-statistics (see previous paragraph):

  1. (a)

    Host specificity (“specialists” which were reared from a single host species versus “generalists” that were reared from more than one host species). This specificity was evaluated at the order-family taxonomic level and at the species level. Because one may argue that our sampling underestimates specialists, we also performed this analysis under the assumption that all the rare species (E. janstai, E. longicalvus, E. minozonus, E. priotoni, E. purpuricollis, E. vindex) could be specialists.

  2. (b)

    Ability (“Yes” or “No”) to successfully parasitize some well-represented insect taxa at the ordinal level (Coleoptera, Diptera, Hymenoptera and Lepidoptera) or at the family level (Cynipidae within Hymenoptera and Cecidomyiidae within Diptera).

  3. (c)

    The ability (“Yes” or “No”) to exploit some main host plants (whatever the host insect), host plant being treated at the family level (Asteraceae, Fagaceae, Rosaceae, Salicaceae, etc.).

Software and packages

Manipulations of files and statistical tests were conducted using the software R (http://www.R-project.org - version 3.0.3 – 2014-03-06) with the following packages “ade4” (Euclidian transformation of matrices) [73], “ape” (phylogeny) [74], “caper” (comparative analysis), “ecodist” (Mantel tests) [75] and “vegan” (similarities between host ranges) [76].

Results

Alignments and single-marker analyses

Successful amplification and sequencing was completed for all gene regions used in this study. However, sequencing failures occurred for some markers for a few individuals. Genbank accessions of the sequences obtained for all analysed genes are given in Additional file 1: Table S2. The final matrix contained 91 specimens. No stop codons, frame shifts, insertions or deletions were observed in coding gene regions.

The numbers of aligned base pairs, variable sites and parsimony-informative sites for each gene are summarized in Table 2. As expected, mitochondrial genes showed more parsimony-informative sites compared to nuclear markers (472 out of 1085 bp). Among the nuclear markers, EF- exhibited the lowest number of variable and parsimony-informative sites (respectively 116 and 106 out of 517 bp). For RpL27a, removing the highly divergent alignment blocks significantly reduced the number of variable and parsimony-informative sites (from 54 to 38 % for variable sites and from 34 to 30 % for parsimony-informative sites). This loss consequently affected the resolution of the corresponding inferred topology (Additional file 2: Figure S16 and Figure S17). In contrast, the Gblocks procedure did not affect the number of variable and parsimony-informative sites for Bub3 and RpS4 and the resolution of the corresponding topologies (Additional file 2: Figures S12 – S15).

Table 2 Numbers and percentage of aligned base pairs, variable sites and parsimony-informative sites for the genes used in this study

Evolution models and partitions in the concatenated dataset

Alignment lengths of the concatenated datasets with or without the exclusion of highly divergent blocks were 3197 bp and 5000 bp respectively. For all partitions, the best-fitting substitution model was the general time reversible model (GTR) with among-sites rate variation (ASRV) modelled by a discrete gamma distribution (Γ) [77] for which we used four categories. For all Bayesian analyses, after discarding 25 % of the samples as burn-in, the ESS value of each parameter largely exceeded 200, which indicated that convergence of runs was reached. Sixteen combined trees were obtained (Additional file 2: Figures S1 – S8). For all combined datasets, Bayes factors showed that the most complex partitioning scenario (9 partitions) was preferred over the three less complex ones (Table 3).

Table 3 Partitioning strategy selecting using Bayes factors (Harmonic-Mean) in Bayesian analyses

Evolutionary properties of the markers

Model parameter estimates for each partition of the Bayesian analysis of the “9 partitions without Gblocks cleaning dataset” are depicted in Table 4.

Table 4 Evolutionary properties of the partitions used in the study

As expected, the mitochondrial partitions showed high base compositional bias (71.4 and 89.8 % of A/T for the first two positions and the third codon position respectively). Among the nuclear gene partitions, RpL27a, Bub3 and RpS4 were A/T-biased (77.9, 70 and 68.8 %) while the A/T percentage in the 3rd codon positions in Wg and EF- was only 32 and 45 % respectively.

With the exception of EF- 1st and 2nd codon positions (18.9 %), there was an overall higher rate of A-G and C-T transitions (from 60.8 % for RpL27a up to 91.6 % for mtDNA 3rd codon positions). More precisely, mtDNA (all codon positions), Bub3 and Wg 1st & 2nd codon positions were in excess of C-T transitions.

For protein-coding genes (mtDNA, EF- and Wg), the rate multiplier parameter (m) was higher for the 3rd codon positions. Thus, mtDNA 3rd codon positions evolved more than sixteen times faster than the fastest nuclear gene (RpL27a).

The shape parameter of the gamma distribution (α) was also higher for the 3rd codon position of the protein coding genes, indicating that these positions show lower rate heterogeneity among sites. Additionally, α was lower for Bub3 than for RpS4 and RpL27a, indicating that Bub3 had a greater rate of heterogeneity among sites.

Phylogenetic trees inferred from concatenated datasets

Impacts of alignment strategy and reconstruction methods

ML and Bayesian topologies obtained from the concatenated alignments without Gblocks cleaning were more resolved than those obtained with removal of poorly aligned blocks. Whatever the partitioning scheme and regardless of whether or not divergent blocks were included in the analyses, most internal nodes were nevertheless statistically supported (BP value ≥ 65, PP value ≥ 90). Moreover, the 18 species recently defined by Al khatib et al. [32] and E. vindex were recovered as a monophyletic group.

Overall, topologies showed three major clades (A, B, C) that emerge on highly supported basal nodes (Figs. 1 and 2 and Additional file 2: Figures S1–S8). Three topological conflicts were observed depending on whether or not the Gblocks cleaning step was performed: (i) Clade A was not supported in topologies inferred from the datasets cleaned using Gblocks (Fig. 2 and Additional file 2: Figures S5–S8); (ii) E. vindex was sister to the rest of clade C in the topologies inferred from data sets cleaned using Gblocks (Fig. 2 and Additional file 2: Figures S5–S8), while it was sister to E. confusus and E. pistaciae (clade B) without Gblocks cleaning (Fig. 1 and Additional file 2: Figures S1–S4); (iii) the relationships of E. matranus and E. pini were resolved when Gblocks was used (PP = 1 and 0.98 respectively) (Fig. 2 and Additional file 2: Figures S5–S8), but not resolved without Gblocks cleaning of data sets (Fig. 1 and Additional file 2: Figures S1–S4). Taken as a whole, we decided to favour the alignment without the Gblocks procedure for the comparative analysis in order to favour the resolution for the terminal nodes.

Fig. 2
figure 2

Phylogram of relationships among species of the “Eupelmus urozonus species group” obtained from the concatenated dataset alignment (3197 bp and 9 partitions) with Gblocks-default parameters. Uppercase letters refer to clades discussed in the text. Nodes with likelihood bootstrap (BP) values <65 have been collapsed. BP (≥65) and Bayesian posterior probabilities (≥0.90) are indicated at nodes. Each line represents a sequenced individual with information in the following order: molecular code, species, and country

Fig. 3
figure 3

Mapping of ovipositor size and host ranges (host insect and related plants) along the multi-locus phylogeny of the “Eupelmus urozonus species group”. The phylogenetic tree used is derived from the Fig. 1. For convenience, sizes of branches were modified but the topology remains unchanged. In Fig. 3a, boxplots are shown for the absolute (AOS in μm) and relative (ROS – no unit) lengths of the ovipositor for each Eupelmus species. In each case, the vertical dotted line separates “short” versus “long” ovipositors. In Fig. 3b, the host specificity is indicated at three levels (from up to down): order, family, and species. Each rectangle indicates a possible host and the black ones indicate that at least one Eupelmus specimen was obtained from this host. In Fig. 3c, the plant host is indicated at the family level

Molecular relationships within the “Eupelmus urozonus species group”

ML and Bayesian analyses performed on the most complex partitioning scheme without Gblocks cleaning produced similar topologies with only a few differences for poorly supported nodes (Additional file 2: Figure S1). We therefore mapped all node support values (BP & PP) on the ML topology (Fig. 1).

In all analyses, the “E. urozonus species group” was recovered as monophyletic (Fig. 1) with a strong support. The group was subdivided into three clades, “clades” being defined here as a statistically-supported basal divergence including several species:

  • Clade A included E. acinellus, E. annulatus, E. azureus, E. cerris, E. gemellus, E. longicalvus and E. simizonus, whose relative positions were not resolved to the exception of the sister species relationship between E. acinellus and E. gemellus (BP = 100, PP = 1).

  • Clade B included three species with E. vindex being sister to E. confusus plus E. pistaciae with strong support (BP = 92, PP = 1).

  • Clade C included the remaining species and namely E. fulvipes, E. janstai, E. kiefferi, E. minozonus, E. opacus, E. priotoni, E. purpuricollis, E. tibicinis and E. urozonus. Within clade C, two well-supported (in each case, BP = 100, PP = 1) subclades—“sub-clade” being defined as a more terminal divergence including at least 2 species—can be distinguished (i) E. opacus, E. priotoni, E. purpuricollis and E. janstai; (ii) E. minozonus and E. urozonus. These two subclades together with E. tibicinis, whose exact phylogenetic position remains unclear, form a well-supported monophyletic group (BP = 98, PP = 1).

Comparative analysis and host uses

There were significant interspecific differences for both the absolute (AOS—Kruskal-Wallis test: χ2 16df = 93.7; p < 10−3; E. priotoni and E. simizonus discarded because of lack of replicates) and relative (ROS—Kruskal-Wallis test: χ2 16df = 109.2; p < 10−3; E. priotoni and E. simizonus also discarded) ovipositor lengths (Fig. 3a). AOS ranged from 398 μm in E. minozonus to a maximum of 1179 μm in E. cerris while ROS ranged from a minimum of 0.58 in E. fulvipes to a maximum of 1.16 in E. janstai. Even if AOS and ROS medians were significantly correlated one with another (Kendall’s rank correlation: z = 2.73; p = 0.006), some discrepancies were observed as for E cerris which exhibits the highest AOS but an intermediate ROS (Fig. 3a).

Within the “Eupelmus urozonus species group”, there was no significant correlation between similarity in ovipositor length and phylogenetic distance (Mantel test for AOS: r = 0.09, p = 0.39 – Mantel test for ROS: r = 0.08, p = 0.44). When ovipositor length was treated as a binary variable with “long” ovipositors being those above the third quartile (4 or 5 cases among the 19 species), the observed D-statistics for AOS (0.13) and ROS (1.33) never departed from a random distribution (respectively p =0.13 and p =0.61) or a Brownian one (respectively p =0.48 and p =0.14). Consequently, it seems that no strong clustering existed on the length of the ovipositor sheaths. Remarkable differences in the length of the ovipositor sheaths were even observed between some sister species: E. acinellusE. gemellus in clade A and E. janstaiE. purpuricollis in clade B (Fig. 3a).

Taken as a whole, our results indicated that both Cynipidae and Cecidomyiidae constitute the main host species for West Palearctic "E. urozonus species group" (Fig. 3b). Yet, contrasted feeding regimes (specialists versus generalists) were observed (Fig. 3b). Only three (E. acinellus, E. pistaciae and E. tibicinis) of the 13 species are strict specialists, with a distribution (D = 2.38) not significantly departing from both a random (p = 0.79) or a Brownian distribution (p =0.11). At the family and order level (same distributions), three other species were specialists of Cynipidae—E. azureus (reported on 21 host species), E. cerris (2 hosts) and E. fulvipes (4 hosts)—and one (E. opacus) on Cecidomyiidae. At these levels, the relative distribution of specialists and generalists (D = 1.65) does not differs from a random (p =0.72) or Brownian distribution (p =0.10) and, as shown in Fig. 33b, about 50–60 % of the described species in each of the three clades were specialists. The absence of a phylogenetic signal still holds under the assumption that all rare species (E. janstai, E. longicalvus, E. minozonus, E. priotoni, E. purpuricollis, E. vindex) are specialists. Departures from a random distribution is never significant (host species’ level: D =1.04 with p =0.51 – host order’s level: D =1.52 with p =0.76) while a significant departure is observed from a Brownian distribution at the host order’s level (host species’ level: p =0.12 – host order’s level: p =0.031). Interestingly, contrasted host ranges were observed between sister species: E. gemellus (six host species distributed in 3 orders)—E. acinellus (one host species) within clade A and E. confusus (thirteen species distributed in four orders)—E. pistaciae (one host species) within clade B (Fig. 3b).

We investigated the ability of the “E. urozonus species group” to parasitize host species belonging to Coleoptera, Diptera, Hymenoptera and Lepidoptera (ordinal level) or Cecidomyiidae within Diptera and Cynipidae within Hymenoptera (familial level) (see Fig. 3b). However, in all these cases, we were not able to observe significant departures from a random or a Brownian distribution (See Additional file 3: Table S4).

Correlations between phylogenetic, morphometric (absolute or relative lengths of the ovipositor sheaths, AOS and ROS) and ecological (host ranges) matrices were also tested using simple or partial Mantel tests, at each of the three levels (species, family and order). Overall, the Mantel coefficients ranged between −0.07 and +0.14 and were never significantly different from zero (see Additional file 4: Table S3). At the host species level, such a result could be explained by the fact that only 24 % of the hosts (mostly Cynipidae) are shared by at least two species of the “E. urozonus species group”. As a consequence, this level of investigation may be too precise to detect any signal. However, such a limit cannot be taken into account at the two other taxonomic levels since about half of the host families and all host orders except Neuroptera are shared by at least two species of Eupelmus. Taken as a whole, these results confirm those obtained using D-statistics about the absence of significant phylogenetic constraints on the host range evolution. The relative ovipositor length also does not appear to be a significant driver of the host use.

When host plants rather than host insects are taken into account, 18 plant families were identified (see Fig. 3c), eight of which being used by only one Eupelmus species. However, four main families were used by at least four Eupelmus species: Asteraceae (4 species), Fagaceae (9 species), Rosaceae (5 species) and Salicaceae (4 species). For each of these families, no phylogenetic signal was detected using the D-statistics (See Additional file 3: Table S4). Additionally, no correlation was found between the related ecological matrix and the phylogenetic, and/or morphometric (AOS/ROS) matrices (see Additional file 5: Table S5).

Discussion

Phylogenetic relationships within the “E. urozonus species group”

Phylogenetic inter-specific relationships within the “E. urozonus species group” occurring in the Palaearctic region were recently investigated by Al khatib et al. [32] based on morphological characters and two genetic markers (mitochondrial COI and nuclear Wg). This study showed an unsuspected diversity but it (i) failed to resolve phylogenetic relationships at both deep and intermediate levels, (ii) highlighted some discrepancies among tree topologies at the shallowest nodes resulting from COI and Wg sequences, (iii) did not include morphologically divergent but potentially phylogenetically closely related species. By considering new species and adding more informative markers, the present study improved the knowledge on the evolutionary history of the “E. urozonus species group”.

Although the phylogenetic resolution was proven to be sensitive to inclusion or exclusion of divergent blocks by using Gblocks procedure from the sequence alignments, we obtained a reliable phylogeny which strongly supported the monophyly of our focus group of Eupelmus, including the 18 species treated in Al khatib et al. [32] and E. vindex, which is morphologically distinct from other members of the group in the shape of the syntergum and the anterior displacement of the ovipositor sheaths (Gibson & Fusu, in prep). Additionally, the included species of the “E. urozonus species group” were distributed in three strongly supported clades, referred here as A, B and C (Fig. 1).

The molecular monophyly of the Palaearctic “E. urozonus species group” reflected in our concatenated datasets can be also supported through morphology. Al khatib et al. (in prep.) recently compared and combined the results of phylogenetic inferences using the molecular data presented here with morphological data. The main conclusion of this complementary work seems to be the structuration of Eupelmus as a set of independent species groups (including our focus group). Their delineation and their morphological supports are therefore not detailed here.

Despite using several loci from both the nuclear and mitochondrial genomes, some of the focal taxa remain poorly resolved. We expect that newer methods that dramatically increase the number of loci will help to better resolve these relationships (see for instance [78]).

Ecological differentiation within the “E. urozonus species group”

The diversification of parasitic organisms has been explained by various processes linking ecological specialization and speciation. For parasitoids, phylogenetic information and reliable host ranges are necessary to describe the patterns (distribution of generalist and specialist species) and to understand the underlying processes (e.g. “musical chairs” versus “oscillation”). This motivated the present work. Although members of the genus Eupelmus are usually described as generalist ectoparasitoids [27, 28], our study nevertheless leads to a more complex pattern. Our results indeed showed the coexistence of “strict” specialists restricted to one specific host (i.e. E acinellus, E. pistaciae, E. tibicinis), intermediate specialists that can parasitize various species of Cynipidae (i.e. E. azureus, E. cerris and E. fulvipes) and generalists that are able to successfully develop on different insect orders (i.e. E. annulatus, E. confusus, E. gemellus and E. kiefferi).

This diversity in host use observed in the “E. urozonus species group” does not seem to be driven by phylogenetic history as generalists and specialists were recovered in each of the three clades. Moreover, some sister species exhibited fully contrasted ecologies (generalist species cited first): E. confususE. pistaciae and E. gemellusE. acinellus. In this last case, because the facultative hyperparasitism lifestyle is recorded for some species of Eupelmus, we strongly suspect that E. gemellus develops as a hyperparasitoid of E. acinellus on Mesophleps oxycedrella (Lepidoptera). If this is true, it would mean that none of these generalists (E. confusus and E. gemellus) share any hosts with its sister species. Even if it is not the case, such contrasting patterns of host use remain, to our knowledge, rare in parasitoid species.

Quite similar conclusions arose when host plants instead hosts insects were taken into account. There was indeed no correlation between host plant ranges, phylogenetic and/or morphometric constraints. Moreover, the use of the four main plant families (Asteraceae, Fagaceae, Rosaceae and Salicaceae) did not seem to be constrained by the phylogenetic history. The underlying rationale of this complementary analysis was that host plants could at least partly determine ecological specialization of Eupelmus species insofar as the parasitoid species could use, innately or through learning, plant-linked cues in order to locate favourable environments, be the cues emitted passively (olfactory or visual information) or actively (synomones) (see for instance [7981]). One criticism to this approach would, of course, be the level (plant family) at which our analysis was performed since it implies that only well-conserved cues could be detected.

A final facet of our investigation was the potential role of the ovipositor sheaths (as a proxy of the ovipositor length) as a driver of host use. The rationale was that (i) ovipositor structure could be constrained by the phylogenetic history of the species and, (ii) ovipositor length could determine accessibility to different hosts [82, 83]. None of these hypotheses was however verified, ovipositor length appearing to be a very labile trait within our focus group.

Another driver of host range evolution could be the complexity of gall communities exploited by the Eupelmus species. Indeed, in numerous cases, Eupelmus species are occurring with numerous parasitoid species belonging to different chalcid families (e.g. Torymidae, Eurytomidae or Pteromalidae) which seem to be more functionally adapted to their hosts (see for instance [34, 84] and [85]). Such recurrent interspecific competitions may represent a potential limit for the abundance of Eupelmus but may also, ultimately, offer evolutionary opportunities. In particular, such an ecological intimacy could promote some switches towards unusual but ecologically related host insects and/or transitions towards other developmental modes (hyperparasitism or even predation). Such kind of adaptations may be illustrated by E. tibicinis, a specialist predator of the eggs of the red cicada, Tibicina haematodes (Scopoli, 1763) (Hemiptera: Tibicinidae).

Conclusions

This paper provides comprehensive information about the ecological differentiation within the Palaearctic species of the “E. urozonus species group” and contributes to our understanding of ecological specialization in parasitoids. Although further investigations are required, the intimate mixing of generalist and specialist species along the phylogeny leans toward the “oscillation hypothesis” (sensu Hardy and Otto [21]). It also raises new questions at both the inter- and intra-specific levels. At the intra-specific level, more detailed population genetics studies would be useful to test the existence of “host races” within generalist species, which could be a way to, (i) explain the capacity of a single species to develop in different hosts and (ii) offer opportunities for the recurrent apparition of specialized lineages and ultimately species. At the interspecific level, the partitioning of the available resources within sympatric Eupelmus species and with other chalcid wasps remains unclear. This would probably require a better knowledge of potential and realised host ranges, interspecific interactions (e.g., competition and hyperparasitism) and investigations on the influence of host plants on the associated parasitoids (e.g., attraction/repellence; phenology and structure of galls). Finally, an agronomic output of such investigations would be a better knowledge of the actual potential of some Eupelmus species to regulate certain insect pests such as the olive fruit fly, Bactrocera oleae (Gmelin, 1790) [8689] or the chestnut gall wasp Dryocosmus kuriphilus Yasumatsu, 1951 [9092].

Availability of supporting data

The data sets supporting the results are available in Dryad (doi: 10.5061/dryad.115m1).

All sequences are available in Genbank (http://www.ncbi.nlm.nih.gov/genbank). Genbank accession numbers are given in Additional file 2: Table S2.