Functional insights from the GC-poor genomes of two aphid parasitoids, Aphidius ervi and Lysiphlebus fabarum

Dennis, Alice B.; Ballesteros, Gabriel I.; Robin, Stéphanie; Schrader, Lukas; Bast, Jens; Berghöfer, Jan; Beukeboom, Leo W.; Belghazi, Maya; Bretaudeau, Anthony; Buellesbach, Jan; Cash, Elizabeth; Colinet, Dominique; Dumas, Zoé; Errbii, Mohammed; Falabella, Patrizia; Gatti, Jean-Luc; Geuverink, Elzemiek; Gibson, Joshua D.; Hertaeg, Corinne; Hartmann, Stefanie; Jacquin-Joly, Emmanuelle; Lammers, Mark; Lavandero, Blas I.; Lindenbaum, Ina; Massardier-Galata, Lauriane; Meslin, Camille; Montagné, Nicolas; Pak, Nina; Poirié, Marylène; Salvia, Rosanna; Smith, Chris R.; Tagu, Denis; Tares, Sophie; Vogel, Heiko; Schwander, Tanja; Simon, Jean-Christophe; Figueroa, Christian C.; Vorburger, Christoph; Legeai, Fabrice; Gadau, Jürgen

doi:10.1186/s12864-020-6764-0

Research article
Open access
Published: 29 May 2020

BMC Genomics, volume 21, article number 376 (2020)

Abstract

Background

Parasitoid wasps have fascinating life cycles and play an important role in trophic networks, yet little is known about their genome content and function. Parasitoids that infect aphids are an important group with the potential for biological control. Their success depends on adapting to develop inside aphids and overcoming both host aphid defenses and their protective endosymbionts.

Results

We present the de novo genome assemblies, detailed annotation, and comparative analysis of two closely related parasitoid wasps that target pest aphids: Aphidius ervi and Lysiphlebus fabarum (Hymenoptera: Braconidae: Aphidiinae). The genomes are small (139 and 141 Mbp) and the most AT-rich reported thus far for any arthropod (GC content: 25.8 and 23.8%). This nucleotide bias is accompanied by skewed codon usage and is stronger in genes with adult-biased expression. AT-richness may be the consequence of reduced genome size, a near absence of DNA methylation, and energy efficiency. We identify missing desaturase genes, whose absence may underlie mimicry in the cuticular hydrocarbon profile of L. fabarum. We highlight key gene groups including those underlying venom composition, chemosensory perception, and sex determination, as well as potential losses in immune pathway genes.

Conclusions

These findings are of fundamental interest for insect evolution and biological control applications. They provide a strong foundation for further functional studies into coevolution between parasitoids and their hosts. Both genomes are available at https://bipaa.genouest.org.

Background

Parasites are ubiquitously present across all of life [1, 2]. Their negative impact on host fitness can impose strong selection on hosts to resist, tolerate, or escape potential parasites. Parasitoids are a special group of parasites whose successful reproduction is fatal to the host [3, 4]. The overwhelming majority of parasitoid insects are hymenopterans that parasitize other terrestrial arthropods, and they are estimated to comprise up to 75% of the species-rich insect order Hymenoptera [4,5,6,7]. Parasitoid wasps target virtually all insects and developmental stages (eggs, larvae, pupae, and adults), including other parasitoids [4, 8,9,10]. Parasitoid radiations appear to have coincided with those of their hosts [11], and there is ample evidence that host-parasitoid relationships impose strong reciprocal selection, promoting a dynamic process of antagonistic coevolution [12,13,14].

Parasitoids of aphids play an economically important role in biological pest control [15, 16], and aphid-parasitoid interactions are an excellent model to study antagonistic coevolution, specialization, and speciation [17, 18]. While parasitoids that target aphids have evolved convergently several times, their largest radiation is found in the braconid subfamily Aphidiinae, which contains at least 400 described species across 50 genera [9, 19]. As koinobiont parasitoids, their development progresses initially in still living, feeding, and developing hosts, and ends with the aphids’ death and the emergence of adult parasitoids. Parasitoids increase their success with a variety of strategies, including host choice [20, 21], altering larval development timing [22], injecting venom during stinging and oviposition, and developing special cells called teratocytes to circumvent host immune responses [23,24,25,26,27]. In response to strong selection imposed by parasitoids, aphids have evolved numerous defenses, including behavioral strategies [28], immune defenses [29], and symbioses with heritable endosymbiotic bacteria whose integrated phages can produce toxins to hinder parasitoid success [12, 30, 31].

The parasitoid wasps Lysiphlebus fabarum and Aphidius ervi (Braconidae: Aphidiinae) are closely related endoparasitoids of aphids (Fig. 1) [9, 11, 38]. In the wild, both species are found infecting a wide range of aphid species although their host ranges differ, with A. ervi more specialized on aphids in the Macrosiphini tribe and L. fabarum on the Aphidini tribe [39, 40]. Experimental evolution studies in both species have shown that wild-caught populations can counter-adapt to cope with aphids and the defenses of their endosymbionts, and that the coevolutionary relationships between parasitoids and the aphids’ symbionts likely fuel diversification of both parasitoids and their hosts [41,42,43]. While a number of parasitoid taxa are known to inject viruses and virus-like particles into their hosts, there is thus far no evidence that this occurs in parasitoids that target aphids; recent studies have identified two abundant RNA viruses in L. fabarum [44, 45], but whether this impacts their ability to parasitize is not yet clear.

Aphidius ervi and L. fabarum differ in several important life history traits, and are expected to have experienced different selective regimes as a result. Aphidius ervi has been successfully introduced as a biological control agent in Nearctic and Neotropic regions. Studies on both native and introduced populations of A. ervi have shown ongoing evolution with regard to host preferences, gene flow, and other life history components [46,47,48,49]. Aphidius ervi is known to reproduce only sexually, whereas L. fabarum is capable of both sexual and asexual reproduction. In fact, wild L. fabarum populations are more commonly composed of asexually reproducing (thelytokous) individuals [50], and this asexuality is not due to infection with endosymbionts like Wolbachia [51]. In asexual populations of L. fabarum, diploid females produce diploid female offspring via central fusion automixis [52]. While they are genetically differentiated, sexual and asexual populations appear to maintain gene flow; both reproductive modes and genome-wide heterozygosity are maintained in the species as a whole [50, 53, 54]. Aphidius ervi and L. fabarum have also experienced different selective regimes with regard to their cuticular hydrocarbon profiles and chemosensory perception. Lysiphlebus target aphid species that are ant-tended, and ants are known to prevent parasitoid attacks on “their” aphids [55]. To counter ant defenses, L. fabarum has evolved the ability to mimic the cuticular hydrocarbon profile of the aphid hosts [56, 57]. This enables the parasitoids to circumvent ant defenses and access this challenging ecological niche, from which they also benefit nutritionally; they are the only parasitoid species thus far documented to behaviorally encourage aphid honeydew production and consume this high-sugar reward [55, 58, 59].

We present here the genomes of A. ervi and L. fabarum, assembled de novo using a hybrid sequencing approach. The two genomes are strongly biased towards AT nucleotides. We have examined GC content in the context of host environment, nutrient limitation, and gene expression. By comparing these two genomes, we identify key functional specificities in genes underlying venom composition, oxidative phosphorylation (OXPHOS), cuticular hydrocarbon (CHC) composition, sex determination, development (Osiris), and chemosensory perception. In both species, we identify putative losses in key immune genes and an apparent lack of key DNA methylation machinery. These are functionally important traits associated with success infecting aphids and the evolution of related traits across all of Hymenoptera.

Results

Two de novo genome assemblies

The genome assemblies for A. ervi and L. fabarum were constructed using hybrid approaches that incorporated high-coverage short-read (Illumina) and long-read (PacBio) sequences, and were assembled with different strategies (Supplementary Tables 1 and 2). This produced two high-quality genome assemblies (N50 in A. ervi: 581 kb, in L. fabarum: 216 kb) with similar total lengths (A. ervi: 139Mbp, L. fabarum: 141Mbp) but different ranges of scaffold sizes (Table 1, Supplementary Table 3). We compared these lengths with predictions from a k-mer analysis with the K-mer Analysis Toolkit (KAT) (Supplementary Figure 1) [60], which estimated A. ervi at 142.83Mbp and L. fabarum at 99.26Mbp. The A. ervi assembly is close to its estimate, whereas the L. fabarum assembly is larger than the estimate from KAT; we suspect that this may be due to duplications in the assembly, and future work should address these duplications. Both assembly lengths are also within previous estimates of 110-180Mbp for braconids, including A. ervi [61, 62] and are on par with those predicted in other hymenopteran genomes (Table 2). Both genomes were screened for potential contamination (Supplementary Figures 2 and 3, Supplementary Table 6, Additional files 1 and 2) based on BLAST [63] matches to host aphids and results of the program blobtools [64], which jointly examines GC content and sequencing depth. In addition to identifying likely bacterial scaffolds (A. ervi: 35 scaffolds/106Kbp removed, no scaffolds removed from L. fabarum), blobtools revealed one outlier scaffold in L. fabarum with high coverage and low GC content (tig00001511, 10,205 bp, 11.1% GC). A BLASTn search against the NCBI nt database matched this to the mitochondrial genome of Aphidius gifuensis. In this and other parasitoids, the mitochondrial genome has been shown to be highly enriched with AT repeats, with GC contents that are nearly as low as the 11.1% found in this L. fabarum scaffold (13.5–17.5%) [65].
The assemblies are available in NCBI (PRJNA587428, SAMN13190903–4) and can be accessed via the BioInformatics Platform for Agroecosystem Arthropods (BIPAA, https://bipaa.genouest.org), which contains the full annotation reports, predicted genes, and can be searched via both keywords and BLAST.
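
For readers less familiar with the N50 statistic quoted for the two assemblies, the following is a minimal Python sketch of how it is commonly computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly length. The function name `n50` and the toy scaffold lengths are illustrative assumptions and are not taken from either assembly or from the pipelines used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half of the
        # assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from A. ervi or L. fabarum.
toy_scaffolds = [800, 600, 400, 200, 100, 100]
print(n50(toy_scaffolds))
```

Because N50 depends only on the distribution of scaffold lengths, two assemblies of similar total size, such as the two presented here, can still differ substantially in this value.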

Table 1 Assembly and draft annotation statistics

Table 2 Assembly summary statistics compared to other parasitoid genomes. All species are from the family Braconidae, except for N. vitripennis (Pteromalidae) and D. collaris (Ichneumonidae). Protein counts from the NCBI genome deposition

We constructed linkage groups for L. fabarum using phased SNPs from the haploid sons of a single female wasp from a sexually reproducing population. This placed the 297 largest scaffolds (> 50% of the nucleotides, Supplementary Table 7, Supplementary Figure 4, Additional file 3) onto the expected six chromosomes [52]. With this largely contiguous assembly, we identified stretches of syntenic sequence between the two genomes, with > 60 k links in alignments made by NUCmer [66] and > 350 large syntenic blocks that match the six L. fabarum chromosomes to 28 A. ervi scaffolds (Supplementary Figures 5 and 6).

The Maker2 annotation pipeline predicted coding genes (CDS) in both genomes separately, and these were functionally annotated against the NCBI nr database [67], gene ontology (GO) terms [68, 69], and predictions for known protein motifs, signal peptides, and transmembrane domains (Supplementary Table 5). In A. ervi there were 20,328 predicted genes comprising 24.7Mbp, whereas in L. fabarum there were 15,203 genes across 21.9Mbp (Table 1). Matches to the BUSCO (Benchmarking Universal Single-Copy Orthologs) genes assessed completeness against the Insecta database genes at both the nucleotide level (A. ervi: 94.8%, L. fabarum: 76.3%, Supplementary Table 4) and protein level in the predicted genes (A. ervi: 93.7%, L. fabarum: 95.9%). These protein level matches are close to those found in other assembled parasitoid genomes, which report between 96 and 99% total coverage of BUSCO genes [32,33,34,35,36,37]. In both species, there was also high transcriptomic support for the predicted genes (77.8% in A. ervi and 88.3% in L. fabarum).

A survey of transposable elements (TEs) identified a similar overall number of putative TEs in the two assemblies (A. ervi: 67,695 and L. fabarum: 60,306, Supplementary Table 8). Despite this similarity, the overall coverage by repeats is larger in the assembly of L. fabarum (41%, 58Mbp) than in A. ervi (22%, 31Mbp) and both assemblies differ in the TE classes that they contain (Supplementary Table 8, Supplementary Figures 7 and 8). This could be the product of their different assembly methods. However, direct estimates from unassembled short read data suggest even higher repeat content in L. fabarum (49.1% vs. 29.3% in A. ervi), largely explained by differences in simple repeats and low-complexity sequences (Supplementary Table 9).

To examine genes that may underlie novel functional adaptation, we identified sequences that are unique within the predicted genes in the A. ervi and L. fabarum genomes. We defined these orphan genes as predicted genes with transcriptomic support and with no identifiable homology based on searches against the NCBI nr, nt, and Swissprot databases. We identified 2568 (A. ervi, Additional file 4) and 968 (L. fabarum, Additional file 5) putative orphans.

GC content

The L. fabarum and A. ervi genomes are the most GC-poor of insect genomes sequenced to date (GC content: 25.8 and 23.8% for A. ervi and L. fabarum, respectively, Table 1, Supplementary Figure 9, Additional file 6). This nucleotide bias is accompanied by strong codon bias in the predicted genes, meaning that within the possible codons for each amino acid, the two genomes are almost universally skewed towards the codon(s) with the lowest GC content (measured as Relative Synonymous Codon Usage, RSCU, Fig. 2). We examined potential constraints in codon usage between our two species’ genomes and taxa associated with this parasitoid-host-endosymbiont system (Supplementary Table 10). We found no evidence of similarity in codon usage (scaled as RSCU) nor nitrogen content (scaled per amino acid) between parasitoids and host aphids, the primary endosymbiont Buchnera, or the secondary endosymbiont Hamiltonella (Supplementary Figures 10, 11 and 12).
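
The two measures used in this section, GC content and Relative Synonymous Codon Usage, can be illustrated with a short Python sketch. RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 indicate a preferred codon. The sketch assumes the standard genetic code and an in-frame coding sequence; the function names `gc_content` and `rscu` and the toy sequence are hypothetical and do not reproduce the analysis scripts used for Fig. 2.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def rscu(cds):
    """RSCU per codon for one in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are ignored in this sketch
            families[aa].append(codon)
    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Hypothetical toy sequence: three AAA and one AAG codon (both lysine).
toy_cds = "AAAAAAAAAAAG"
print(gc_content(toy_cds), rscu(toy_cds))
```

In the toy sequence the AT-ending lysine codon AAA receives an RSCU of 1.5 and AAG receives 0.5, which is the direction of skew described for the two parasitoid genomes.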

As selective pressure for translational efficiency, stability, and secondary structure should be higher in more highly expressed genes [70,71,72,73], we examined GC content in relation to expression level. We first explored constraints by looking at overall expression levels. In both species, the most highly expressed 10% of genes had significantly higher GC and higher nitrogen contents, although the higher number of nitrogen atoms in guanine and cytosine means that these two measures cannot be entirely disentangled (Additional file 7, Supplementary Figure 13). This is in line with observations across many taxa, and with the idea that GC-rich mRNA has increased expression via its stability and secondary structure [72, 73].
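
The link between GC content and nitrogen content follows from the composition of the bases themselves: adenine and guanine each contain five nitrogen atoms, cytosine three, and thymine or uracil two. A G:C pair therefore carries eight nitrogen atoms and an A:T pair seven. The Python sketch below counts nitrogen atoms per base in a single-stranded sequence. It is only an illustration of why the two measures covary; the function name `nitrogen_per_base` is hypothetical, and the nitrogen measure used in this study may have been computed differently (for example, per amino acid).

```python
# Nitrogen atoms in each nucleobase (A: C5H5N5, G: C5H5N5O,
# C: C4H5N3O, T: C5H6N2O2, U: C4H4N2O2).
N_ATOMS = {"A": 5, "C": 3, "G": 5, "T": 2, "U": 2}


def nitrogen_per_base(seq):
    """Mean number of nitrogen atoms per unambiguous base."""
    counted = [N_ATOMS[b] for b in seq.upper() if b in N_ATOMS]
    return sum(counted) / len(counted) if counted else 0.0


# Hypothetical toy sequences, not genes from either genome.
print(nitrogen_per_base("GCGCGC"))  # GC-only sequence
print(nitrogen_per_base("ATATAT"))  # AT-only sequence
```

A sequence made only of G and C averages 4 nitrogen atoms per base, whereas one made only of A and T averages 3.5, so any shift in GC content moves nitrogen content in the same direction.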

We next utilized available transcriptomic data from adult and larval L. fabarum to examine life-stage specific constraints. We found higher GC content in larvae-biased genes in L. fabarum (Fig. 3). This was true when we compared both the 10% most highly expressed genes in adults (32.6% GC) and larvae (33.2%, p = 1.2e-116, Fig. 3, Additional file 7), and this pattern holds even more strongly for genes that are differentially expressed between adults (upregulated in adults: 28.7% GC) and larvae (upregulated in larvae: 30.7% GC, p = 2.2e-80). Note that the most highly expressed genes overlap partially with those that are differentially expressed (Additional file 7). At the same time, nitrogen content did not differ in either comparison (Fig. 3).

Gene family expansions

To examine gene families that may have undergone expansions in association with functional divergence and specialization, we identified groups of orthologous genes that have increased and decreased in size in the two genomes, relative to one another. We identified these species-specific gene-family expansions using the Orthologous MAtrix (OMA) standalone package [74]. OMA predicted 8817 OMA groups (strict 1:1 orthologs) and 8578 Hierarchical Ortholog Groups (HOGs, Additional file 8). Putative gene-family expansions would be found in the predicted HOGs, because they are calculated to allow for > 1 member per species. Among these, there were more groups in which A. ervi possessed more genes than L. fabarum (865 groups with more genes in A. ervi, 223 with more in L. fabarum, Supplementary Figure 14, Additional file 8). To examine only the largest gene-family expansions, we looked further at the HOGs containing > 20 genes (10 HOGs, Supplementary Figure 15). Strikingly, the four largest expansions were more abundant in A. ervi and were all identified as F-box proteins/Leucine-rich-repeat proteins (LRR, total: 232 genes in A. ervi and 68 in L. fabarum, Supplementary Figure 15, Additional file 8). This signature of expansion does not appear to be due to fragmentation in the A. ervi assembly; the size of scaffolds containing LRRs is on average larger in A. ervi than in L. fabarum (Welch two-sample t-test, p = 0.001, Supplementary Figure 16). The six largest gene families that were expanded in L. fabarum, relative to A. ervi, were less consistently annotated. Interestingly, they contained two different histone proteins: Histone H2B and H2A (Supplementary Figure 15).
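
The Welch two-sample t-test used for the scaffold-size comparison does not assume equal variances in the two groups. As a minimal sketch of the underlying arithmetic, the Python function below returns the Welch t statistic and the Welch-Satterthwaite degrees of freedom for two samples. The function name `welch_t` and the toy values are hypothetical; they are not the scaffold sizes analysed here, and the sketch stops short of computing a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each sample mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical toy samples, not scaffold lengths from either assembly.
sample_a = [1, 2, 3, 4, 5]
sample_b = [2, 4, 6, 8, 10]
print(welch_t(sample_a, sample_b))
```

In practice a statistics library would normally be used to obtain the p-value as well; for example, SciPy's `scipy.stats.ttest_ind` performs the Welch test when called with `equal_var=False`.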

Venom proteins

We examined the venom of both species using evidence from proteomics, transcriptomics, and manual gene annotation. The venom gland of L. fabarum is morphologically different from that of A. ervi (Supplementary Figure 17). A total of 35 L. fabarum proteins were identified as putative venom proteins by 1D gel electrophoresis and mass spectrometry, combined with transcriptomic and the genomic data (Supplementary Figure 18, Additional file 9) [42]. These putative venom proteins were identified based on predicted secretion (for complete sequences) and the absence of a match to typical cellular proteins (e.g. actin, myosin). To match the analysis between the two taxa, previously generated A. ervi venom protein data [24] were analyzed using the same criteria as for L. fabarum. This identified 32 putative venom proteins in A. ervi (Additional file 9). More than 50% of the proteins are shared between species (Fig. 4a and Additional file 9), corresponding to more than 70% of the predicted putative functional categories (Fig. 4b and Additional file 9). Among the venom proteins shared between both parasitoids, a gamma glutamyl transpeptidase (GGT1) was the most abundant protein in the venom of both A. ervi [24] and L. fabarum (Additional file 9). As previously reported for A. ervi [24], a second GGT venom protein (GGT2) containing mutations in the active site was also found in the venom of L. fabarum (Supplementary Figures 19 and 20).

Phylogenetic analysis (Fig. 5) showed that the A. ervi and L. fabarum GGT venom proteins occur in a single clade in which the GGT1 venom proteins group separately from GGT2 venom proteins, thus suggesting that they originated from a duplication that occurred prior to the split from their most recent common ancestor. As previously shown for A. ervi, the GGT venom proteins of A. ervi and L. fabarum are found in one of the three clades described for GGT proteins of non-venomous hymenopterans (clade “A”, Fig. 5) [24]. Within this clade, venomous and non-venomous GGT proteins had a similar exon structure, except for exon 1 that corresponds to the signal peptide only being present in venomous GGT proteins (Supplementary Figure 19). Several LRR proteins were found in the venom of L. fabarum as well, although these results should be interpreted with caution since the sequences were incomplete and the presence of a signal peptide could not be confirmed (Additional file 9). Moreover, these putative venom proteins were only identified from transcriptomic data of the venom apparatus and we could not find any corresponding annotated gene in the genome. This supports the idea that gene-family expansions in putative F-box/LRR proteins identified in the analysis with OMA are not related to venom production.

Approximately 50% of the identified venom proteins were unique to either A. ervi or L. fabarum (Additional file 9). However, many of these proteins had no predicted function, making it difficult to hypothesize their possible role in parasitism success. Among those that could be identified was apolipophorin in the venom of L. fabarum, but not in A. ervi. Apolipophorin is an insect-specific apolipoprotein involved in lipid transport and innate immunity, and is not commonly found in venoms. Among parasitoid wasps, apolipophorin has been described in the venom of the ichneumonid Hyposoter didymator [75] and the encyrtid Diversinervus elegans [76], but its function is yet to be deciphered. Apolipophorin is also present in low abundance in honeybee venom where it could have antibacterial activity [77, 78]. In contrast, we could not find L. fabarum homologs for any of the three secreted cysteine-rich toxin-like peptides that are highly expressed in the A. ervi venom apparatus (Additional file 9).

Key gene families

We manually annotated 719 genes in A. ervi and 642 in L. fabarum (Table 3) using Apollo, hosted on the BIPAA website: bipaa.genouest.org [79,80,81].

Table 3 Summary of manual curations of select gene families in the two parasitoid genomes

Desaturases

Annotation of desaturase genes found that L. fabarum has three fewer desaturase genes than A. ervi (Table 3, Supplementary Table 12, Supplementary Figure 24). Examination of the cuticular hydrocarbon (CHC) profiles of L. fabarum and A. ervi identified several key differences. The CHC profile of L. fabarum is dominated by saturated hydrocarbons (alkanes), contains only trace alkenes, and is completely lacking dienes (Supplementary Figures 21 and 23). In contrast, A. ervi females produce a large amount of unsaturated hydrocarbons, with a substantial amount of alkenes and alkadienes in their CHC profiles (approximately 70% of the CHC profile consists of alkenes/alkadienes, Supplementary Figures 22 and 23).

Immune genes

We searched for immune genes in the two genomes based on a list of 373 immunity related genes, collected primarily from the Drosophila literature (Additional file 10). We found and annotated > 70% of these in both species (A. ervi: 270, L. fabarum: 264 genes). We compared these with the immune genes used to define the main Drosophila immune pathways (Toll, Imd, and JAK-STAT, Supplementary Table 13) and conserved in a number of insect species [82,83,84]. In the genomes of both wasps, some of the genes encoding proteins of the Imd and Toll pathways were absent (Supplementary Table 13, Supplementary Figure 25, Additional file 10). Only one GNBP (Gram Negative Binding Protein) involved in Gram positive bacteria and fungi recognition was found in A. ervi and L. fabarum, compared to the three known from Drosophila and two from Apis (Supplementary Table 13). PGRPs (Peptidoglycan Recognition Proteins) are involved in the response to Gram-positive bacteria [85], and we did not find any significant matches to these, although two short matches did not meet our selection criteria (BLAST e-values > 1e-5). Similarly, the only match to imd itself was very poor in A. ervi (e-value: 0.058, Additional file 10), and we could not find any match in L. fabarum. The components of the Toll and JAK-STAT pathways appear to be less affected than those of the Imd pathway, although in all cases the output effectors remained mainly unknown.
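
The e-value screen described for the immune gene searches can be sketched as a simple filter over tab-separated BLAST output. The Python example below assumes the default 12-column tabular layout of BLAST+, in which the e-value is the eleventh column, and assumes that hits were retained when their e-value was at or below 1e-5; both points are assumptions about the workflow rather than details confirmed in the text. The function name `significant_hits`, the scaffold names, and all values other than the 0.058 e-value are hypothetical.

```python
def significant_hits(blast_lines, max_evalue=1e-5):
    """Keep (query, subject, evalue) for hits at or below max_evalue."""
    kept = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])  # 11th column of the default tabular layout
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical hits: only the 0.058 e-value echoes a number from the text.
toy_hits = [
    "imd\tscaffold_a\t31.0\t58\t40\t0\t1\t58\t100\t157\t0.058\t30.4",
    "Toll\tscaffold_b\t78.5\t410\t88\t0\t1\t410\t20\t429\t1e-120\t512",
]
print(significant_hits(toy_hits))
```

Under this threshold the weak imd-like match would be discarded while the strong Toll-like match would be kept, which mirrors how a poor match can be reported yet still be treated as an absent gene.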

Osiris genes

The Osiris genes are an insect-specific gene family that underwent multiple tandem duplications early in insect evolution. These genes are essential for proper embryogenesis [86] and pupation [87, 88], and are also tied to immune and toxin-related responses (e.g. [87, 89]) and developmental polyphenism [90, 91].

We found 21 and 25 putative Osiris genes in the A. ervi and L. fabarum genomes, respectively (Supplementary Tables 14 and 15, Supplementary Figure 26). In insects with well assembled genomes, there is a consistent synteny of approximately 20 Osiris genes; this cluster usually occurs in a ~ 150kbp stretch and gene synteny is conserved in all known Hymenoptera genomes. The Osiris cluster is also largely devoid of non-Osiris genes in most of the Hymenoptera, but the assemblies of A. ervi and L. fabarum suggest that if the cluster is actually syntenic in these species, there are interspersed non-Osiris genes (black boxes in Supplementary Figures 27 and 28).

In support of their role in defense (especially metabolism of xenobiotics and immunity), these genes were much more highly expressed in larvae than in adults (Supplementary Table 15). We hypothesize that their upregulation in larvae is an adaptive response to living within a host. Because of the available transcriptomic data, we could only make this comparison in L. fabarum. Here, 19 of the 26 annotated Osiris genes were significantly upregulated in larvae over adults (Supplementary Table 15, Additional file 11). In both species, transcription in adults was very low, with fewer than 10 raw reads per cDNA library sequenced, and most often less than one read per library (Supplementary Tables 14 and 15).

OXPHOS

In most eukaryotes, mitochondria provide the majority of cellular energy (in the form of adenosine triphosphate, ATP) through the oxidative phosphorylation (OXPHOS) pathway. OXPHOS genes are an essential component of energy production, and their amino acid substitution rate in Hymenoptera is higher relative to any other insect order [92]. We identified 69 out of 71 core OXPHOS genes in both genomes, as well as five putative duplication events that are apparently not assembly errors (Supplementary Table 16, Additional file 12). The gene sets of A. ervi and L. fabarum contained the same genes and the same genes were duplicated in each, implying duplication events occurred prior to the split from their most recent common ancestor. One of these duplicated genes appears to be duplicated again in A. ervi, or the L. fabarum copy has been lost.

Chemosensory genes

Genes underlying chemosensory reception play important roles in parasitoid mate and host localization [93, 94]. Several classes of chemosensory genes were annotated separately (Table 3). With these manual annotations, further studies can now be made with respect to life history characters including reproductive mode, specialization on aphid hosts, and mimicry.

Chemosensory: soluble proteins (OBPs and CSPs)

Odorant-binding proteins (OBPs) and chemosensory proteins (CSPs) are possible carriers of chemical molecules to sensory neurons. Hymenoptera have a wide range of known OBP genes, with up to 90 in N. vitripennis [95]. However, the numbers of these genes appear to be similar across parasitic wasps, with 14 in both species studied here and 15 recently described in D. alloeum [33]. Similarly, CSP numbers are in the same range within parasitic wasps (11 and 13 copies here, Table 3). Interestingly, two CSP sequences (one in A. ervi and one in L. fabarum) did not have the conserved cysteine motif characteristic of this gene family. Further work should investigate if and how these genes function.

Chemosensory: odorant receptors (ORs)

Odorant receptors (ORs) are known to detect volatile molecules. In total, we annotated 228 putative ORs in A. ervi and 156 in L. fabarum (Table 3). This is comparable to the OR numbers annotated in other hymenopteran parasitoids, including: 79 in M. cingulum [96], 225 in N. vitripennis [97], and 187 in D. alloeum [33]. Interestingly, we annotated a larger set of ORs in A. ervi than in L. fabarum. One explanation is that A. ervi generally has more annotated genes than L. fabarum, and whatever broad pattern underlies the reduction in the gene repertoire of L. fabarum also affected OR genes. Another possibility is that the switch to asexual reproduction has also led to a reduction in the number of OR genes, because pheromones linked to mate finding, recognition and courtship behavior are no longer necessary in an asexually reproducing species.

Chemosensory: ionotropic chemosensory receptors (IRs)

Ionotropic receptors (IRs) are involved in both odorant and gustatory molecule reception. In total, we annotated 38 putative IRs in A. ervi and 37 in L. fabarum (Table 3). Three putative co-receptors (IR 8a, IR 25a and IR 76b) were annotated in both species, one of which (IR 76b) was duplicated in A. ervi. This brings the total for the IR functional group to 42 and 40 genes for A. ervi and L. fabarum, respectively. This is within the range of IRs known from other parasitoid wasps such as Aphidius gifuensis (23 IRs identified in antennal transcriptome, Braconidae) [98], D. alloeum (51 IRs, Braconidae) [33] and N. vitripennis (47 IRs, Pteromalidae) [97]. A phylogenetic analysis of these genes showed a deeply rooted expansion in the IR genes (Supplementary Figure 29). Thus, in contrast to the expansion usually observed in hymenopteran ORs compared to other insect orders, IRs have not undergone major expansions in parasitic wasps, which is generally the case for a majority of insects with the exception of Blattodea [99].

Sex determination

The core sex determination genes (transformer, doublesex) are conserved in both species (Supplementary Table 17, Additional file 13). Notably, A. ervi possesses a putative transformer duplication. The scaffold carrying the duplication (scaffold2824) is only fragmentary, but a transformer duplicate has also been detected in the transcriptome of a member of the A. colemani species complex, suggesting a conserved presence within the genus [11]. In A. ervi, transformer appears to have an internal repeat of the CAM-domain, as is seen in the genus Asobara [100]. In contrast, there is no evidence of duplication in sex determination genes in L. fabarum. This supports the idea that complementary sex determination (CSD) in sexually reproducing L. fabarum populations is based on up-stream cues that differ from those known in other CSD species [101], whereas the CSD locus known from other hymenopterans is a paralog of transformer [102].

In addition to the core sex determination genes, we identified homologs of several genes related to sex determination (Supplementary Table 18). We identified fruitless in both genomes, which is associated with sex-specific behavior in taxa including Drosophila [103]. Both genomes also have homologs of sex-lethal, which is the main determinant of sex in Drosophila [104] but not in other insects. Drosophila has two homologs of this gene, and the single version in Hymenoptera may have more in common with the non-sex-lethal copy, called sister-of-sex-lethal. We identified homologs of the gene CWC22, including a duplication in A. ervi; this duplication is interesting because a duplicated copy of CWC22 is the primary signal of sex determination in the house fly Musca domestica [105]. Lastly, there was a duplication of RBP1 in both genomes. The duplication of RBP1 is not restricted to these species, nor is the duplication of CWC22, which appears sporadically in Braconidae. Together, these annotations add to our growing knowledge of duplications of these genes and provide possibilities for further examinations of the role of duplications and specialization in association with sex determination.

DNA methylation genes

DNA methyltransferase genes are thought to be responsible for the generation and maintenance of DNA methylation. In general, DNA methyltransferase 3 (DNMT3) introduces de novo DNA methylation sites and DNA methyltransferase 1 (DNMT1) maintains and is essential for DNA methylation [106, 107]. A third gene, EEF1AKMT1 (formerly known as DNMT2), was once thought to act to methylate DNA but is now understood to methylate tRNA [107]. In both A. ervi and L. fabarum, we successfully identified homologs of DNMT3 and EEF1AKMT1. In contrast, DNMT1 was not detected in either species (Table 4, Supplementary Table 19).

Table 4 Summary of annotation of putative DNA methylation genes


This adds to growing evidence that these genes are not conserved across Braconidae, as DNMT1 appears to be absent in several other braconid taxa, including Asobara tabida, A. japonica, and F. arisanus [108, 109]. However, DNMT1 is present in some braconids, including M. demolitor and Cotesia vestalis, and outside of Braconidae these genes are otherwise strongly conserved across insects [109]. In contrast, DNMT3, present here, is more often lost in insects [107].

This absence of DNMT1 helps to explain previous estimates of very low DNA methylation in A. ervi (0.5%) [108]. We confirmed these low levels of methylation in A. ervi by mapping previously generated bisulfite sequencing data (Supplementary Figure 30) [108] to our genome assembly. We aligned > 80% of their data (total 94.5 Mbp, 625,765 reads). The sequence coverage of this mapped data was low: only 63,554 methylation-available cytosines were covered and only 1216 were represented by two or more mapped reads. Nonetheless, of these mapped cytosines, the vast majority (63,409) were never methylated, just 143 sites were always methylated, and two were variably methylated. Methylation was roughly equally distributed among the three cytosine classes (CG: 0.154%, CHG: 0.179%, and CHH: 0.201%). This methylation rate is less than the 0.5% estimated by Bewick et al. 2017 [108] and confirms a near absence of DNA methylation in A. ervi. Given the parallel absence of DNMT1 in L. fabarum, it seems likely that both species sequenced here have very low levels of DNA methylation, and that methylation is not a significant mechanism in these species.
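As a quick consistency check on the site counts above, the arithmetic can be sketched as follows. This is our own tally, not the analysis used in the study: the variable names are hypothetical, the site-level fraction is computed here only from the counts quoted in the text, and it does not attempt to reproduce the per-class percentages, whose denominators are not given in this section.

```python
# Site counts quoted in the text for the mapped bisulfite data.
covered_sites = 63_554        # methylation-available cytosines covered
never_methylated = 63_409
always_methylated = 143
variably_methylated = 2

# The three categories should account for every covered site.
categories_sum = never_methylated + always_methylated + variably_methylated

# Sites showing any methylation (always or variably methylated).
methylated_sites = always_methylated + variably_methylated

# Site-level percentage; an illustrative figure derived here, not a reported result.
percent_methylated = 100 * methylated_sites / covered_sites

print(f"categories sum to covered sites: {categories_sum == covered_sites}")
print(f"sites with any methylation: {methylated_sites}")
print(f"site-level percentage: {percent_methylated:.3f}%")
```

The three categories sum exactly to the 63,554 covered cytosines, and the resulting site-level percentage lies well below the earlier 0.5% estimate, in line with the conclusion drawn in the text.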

This stark reduction in DNA methylation is interesting, given that epigenetic mechanisms are likely important to insect defenses, including possible responses to host endosymbiont-dependent mechanisms [110,111,112]. As with the immune pathways discussed above, this could reflect a loss that is an adaptive response to developing within endosymbiont-protected hosts. It is also interesting that while one epigenetic mechanism seems to be absent in both A. ervi and L. fabarum, we see an increase in histone variants in L. fabarum (based on the OMA analysis of gene family expansion), and these histones could function in gene regulation. However, whether there is a functional or causal link between these two observations is yet to be tested.

Discussion

We have used two new, high-quality genome assemblies to investigate the basis of infectivity and specificity in two parasitoid wasps that infect aphids. Within this, we have found more predicted genes in A. ervi than in L. fabarum (Tables 1 and 2). Comparisons with other parasitoids suggest that the lower number of predicted genes in L. fabarum is more likely due to gene loss in this species than to gene gain in A. ervi. However, it is important to recognize that predictive annotation is imperfect and any missing genes should be specifically screened with more rigorous methods. Importantly, we found relatively high BUSCO scores in the predicted genes, suggesting that our gene prediction was largely successful and was not impacted by the low GC-content. One contribution to the overall difference in gene numbers could be the larger number of orphan genes that were identified in A. ervi. The evolutionary origin of these orphan genes is not known [113, 114], but their retention or evolution could be important to understanding specific functions or traits in this taxon.

The two genomes contained different patterns of predicted TE content. The spread of reported TE coverage in arthropods is quite large, even among Drosophila species (ca. 2.7–25%) [115], and variation in genome size has been broadly attributed to TE content [35, 116]. The variation we observe here suggests that predicted TE content may be evolutionarily quite labile, even between closely related species with the same genome size. However, this could also be a consequence of the assembly methods, and this should be further studied.