Background & Summary

Microplitis manilae Ahmead (Hymenoptera: Braconidae: Microgastrinae) is a solitary endoparasitoid wasp and is primarily distributed in the Asia-Pacific region1. It attacks several lepidopteran species, with Spodoptera species being its preferred target, including S. frugiperda, S. exigua and S. litura2, which of them are the world’s most significant agricultural pests3. M. manilae is thought to be an ideal biological control agent for Spodoptera spp.

So far, it has approximately 200 species have been recognized within Microplitis1, and some of them, i.e. M. croceipes, M. demolitor and M. mediator, have been widely used in biological pest control4,5,6. The virulence factors of Microplitis wasps that act to suppress or circumvent host immunity are primarily composed of polydnaviruses (PDV), venom, and teratocytes7,8. In recent years, the biology, ecology, and interaction with the host of Microplitis have been studied9,10. The study of the interactions between parasitoid wasps and their host insects, particularly the regulation of host immunity and development by parasitoid wasps, has great potential for increasing the use of parasitoid wasps in sustainable pest management in agriculture. To further understand the complex relationship between parasitoids and their hosts, high quality genome data would play an important role. The genome at the chromosome level may shed light on the evolution of parasites, the mechanisms of parasitism, the potential for developing new techniques for biological control and utilizing natural enemies as resources. However, only two fragmented genomes from the genus Microplitis (M. demolitor and M. mediator) are currently available in NCBI and a chromosome-level genome assembly for Microplitis spp. has not been published.

In this study, we used MGI short-read, ONT long-read and Hi-C sequencing technologies to assemble the M. manilae chromosome-level genome. The final genome size was 282.85 Mb with a scaffold N50 length of 25.23 Mb, and 268.17 Mb assembled genome sequences were successfully anchored on 11 chromosomes. In total, 15,689 protein-coding genes were identified, and 13,580 of them were functionally annotated.

Methods

Insect collection and rearing

The wasps Microplitis manilae were collected from maize fields in Dongfang City, Hainan Province, China (18.86°N, 108.72°E) in November 2020 and reared using their host Spodoptera frugiperda under laboratory conditions of 26 ± 1 °C, 65 ± 5% RH, and a 14 h light: 10 h dark photoperiod.

Sequencing

The extraction of DNA and RNA was performed on newly emerged male individuals that had been raised for five or more generations. Genomic DNA was obtained using the Blood & Cell Culture DNA Mini Kit (Qiagen, Hilden, Germany) for both long-read and short-read whole genome sequencing. RNA was isolated using the TRlzol reagent (Vazyme, Nanjing, China). The Hi-C library was generated using the restriction endonuclease DpnII. Long-read sequencing was carried out using the Nanopore PromethION platform (Oxford Nanopore Technologies, UK), with an insert size of approximately 20 kb. Short-read and transcriptome sequencing were performed using libraries with an insert size of 350 bp and sequenced on the MGISEQ. 2000 platform. The total data generated from the long-read sequencing was 76.31 Gb, while the total data generated from the short-read sequencing was 82.60 Gb (Table 1).

Table 1 Statistics of the DNA/RNA sequence data used for genome assembly.

Genome size estimation and assembly

The raw reads obtained from the MGISEQ. 2000 platform were subjected to quality control using fastp v0.21.011 to filter adapter sequences and low-quality reads. The remaining reads of MGI library were then used to estimate the genome size of M. manilae by GenomeScope v1.0.012 and analyze the 17-mer distribution with Jellyfish v2.3.013. The final genome size was estimated to be 297.29 Mb through K-mer analysis.

The draft genome is obtained by first assembling long reads and then polishing the results with short reads, which has been widely used in genome assembly research for different organisms recently14,15,16,17,18. NextDenovo v2.5.0 (https://github.com/Nextomics/NextDenovo) was used to assemble the initial assembly with ONT sequences. NextPolish v1.4.019 was then applied to polish the draft genome assembly using MGISEQ sequences. Juicer v1.6.220 was used to align Hi-C reads to the draft assembly and subject them to quality control. 3D-DNA21 was used to anchor primary contigs into chromosomes, then corrected the possible errors manually with Juicebox v1.11.0822. The final genome assembly of M. manilae was 282.85 Mb with a scaffold N50 of 25.23 Mb. The Hi-C analyses scaffolded 11 pseudomolecules (Fig. 1), anchoring 94.81% (268.17 Mb) of the genome assembly of M. manilae. The average GC content of M. manilae genome assembly was 31.26% (Table 2, Fig. 2).

Fig. 1
figure 1

Heat map of Hi-C assembly of Microplitis manilae. The scale bar represents the interaction frequency of Hi-C links.

Table 2 Summary statistics of the Microplitis manilae genome assembly.
Fig. 2
figure 2

Genome characteristics of Microplitis manilae. (1) Pseudo-chromosomes; (2) gene distribution; (3) GC content; (4) repeat distribution; (5) ncRNA distribution.

The genome completeness was evaluated with the BUSCO v4.1.4 pipeline23, searching against the insect_odb10 database24. The analysis identified 98.61% (single-copied genes: 97.88%, duplicated genes: 0.73%), 0.44%, and 0.95% of the 1,367 predicted genes in this genome as complete, fragmented, and missing sequences, respectively. These results suggested the assembled genome is highly complete.

Genome annotation

The genome of M. manilae was annotated for repetitive elements, non-coding RNAs (ncRNAs), and protein-coding genes (PCGs). The Extensive de novo TE Annotator (EDTA) pipeline25 was used to build TE libraries for repeat annotation initially. Non-LTR retrotransposons and any unclassified TEs missed by the TE annotators mentioned above were then identified by RepeatModeler v2.0.226. A comprehensive non-redundant TE library was generated combining with above results and Dfam3.227. RepeatMasker v4.1.2 (http://www.repeatmasker.org) was subsequently used to search for known and novel TEs. In the genomic sequences, a total of 152.37 Mb repetitive elements were identified, constituting 53.87% of the total. The most abundant repeating element was DNA transposons (13.54%), followed by long terminal repeats (LTR, 10.43%) and long interspersed nuclear elements (LINEs, 1.75%), while unclassified repeats made up 27.43% of the total (Table 3, Fig. 2). Infernal 1.1.228 was used to identify rRNAs, snRNAs, and miRNAs based on the alignment with the Rfam library29. tRNAscan-SE v2.0.630 was used to predict tRNAs. Finally, 1,894 noncoding RNAs were predicted, including 1,269 transfer RNAs (tRNAs), 194 ribosomal RNAs (rRNAs), 74 micro-RNAs (miRNAs), 63 small nuclear RNAs (snRNAs), and 294 others (Table S1, Fig. 2).

Table 3 Statistics of repetitive elements in the Microplitis manilae genome.

Three different strategies were applied for the annotation of PCGs: transcriptome-based prediction, de novo gene prediction, and homology-based prediction. In transcriptome-based prediction, the transcriptome was assembled from RNA-seq alignments by HISAT2 v2.2.131 and the candidate coding region was identified by PASA pipeline v2.4.1 (https://github.com/PASApipeline/PASApipeline). The repeat-masked genome was analyzed using AUGUSTUS v3.3.332 and SNAP v2006-07-2833 for de novo gene prediction. The protein sequences of hymenopteran species were downloaded from the NCBI Database as references for homology-based prediction. Exonerate v2.4.034 was utilized to align the reference proteins to the genome assembly and predict gene structures. Finally, a consensus gene set was created by integrating the genes predicted by the aforementioned three methods using EVidenceModeler v1.1.135. We predicted 15,689 protein-coding genes for the M. manilae genome by combining the evidences from the transcriptome, ab initio, and homology-based predictions. The average length of the predicted gene was 8,718 base pairs, while that of a protein-coding region was 1,575 bp. Exon and intron lengths on average were 319 and 1,814 bp, respectively. There were 4.9 exons on average per gene (Table 4).

Table 4 Statistics of gene structure annotation in the Microplitis manilae genome.

Gene functions were annotated using BLASTP v2.9.036 (-evalue 1e-5) to search against UniProtKB (Swiss-Prot + TrEMBL)37, and InterProScan 5.52-86.038 to search against the Pfam39, CDD40, Gene3D41, Smart42, and Superfamily43 databases. The eggnog-mapper v2.1.444 was used to predict conserved sequences and domains, GO terms, and KEGG pathways against the eggnog v5.0 database45. A total of 13,580 (86.56%) genes were functionally annotated against the UniProtKB database. In integrating with InterProScan and eggnog annotation results, 13,227 (84.31%) protein-coding genes with protein domains were identified, which were assigned 11,276 COG Functional Categories genes, 9,489 Reactome pathways, 7,819 MetaCyc, 7,722 GO terms, 7,324 KEGG KO terms, and 4,274 KEGG pathways, respectively.

Data Records

The MGI, ONT, RNA-seq and Hi-C sequencing data used for the genome assembly have been deposited in the NCBI Sequence Read Archive (SRA) database with accession numbers SRR2135882846, SRR2135882747, SRR2135882948 and SRR2135882649, respectively, under the BioProject accession number PRJNA872950. The chromosomal assembly has been deposited at GenBank with accession number JAPFQK00000000050. Genome annotation information has been deposited in the Figshare database51.

Technical Validation

Evaluating the quality of the genome assembly

The quality of M. manilae genome assembly was evaluated using two approaches. Firstly, sequencing data were mapped to the genome to verify the accuracy, yielding mapping rates of 99.52% for MGI, 94.40% for RNA-seq, and 98.52% for ONT data. Secondly, BUSCO analysis found 98.6% of the 1,367 single-copy orthologues (in the insects_odb10 database) to be complete (97.9% single-copied genes and 0.7% duplicated genes), 0.4% fragmented, and 1.0% missing.

Chromosome synteny

Chromosome synteny between M. manilae and Cotesia congregata was detected by MCScanX52 with default parameters. The genome assembly of C. congregata53 was retrieved from NCBI with accession number GCA_905319865.3. The visual diagram was generated using TBtools54. The synteny of the M. manilae assembly was compared to that of C. congregata, a closely related species of the subfamily Microgastrinae. The results showed a low level of synteny between M. manilae and C. congregata (Fig. 3). A number of fusion and fission events were detected between these two wasps. For instance, Chr11 and a part of Chr5 of M. manilae were syntenic to Chr4 of C. congregata, whereas Chr1 of M. manilae was syntenic to a portion of Chr2 and Chr3 of C. congregata. Low genome synteny was also identified between Nasonia vitripennis and Pteromalus puparum, both of which are members of the same family Pteromalidae55.

Fig. 3
figure 3

Chromosomal synteny between Microplitis manilae and Cotesia congregata genomes.

Gene annotation validation

OrthoFinder v2.5.456 was utilized to infer sequence orthology, based on protein annotation sequences of 11 additional hymenopteran organisms retrieved from NCBI, including Apis mellifera, Athalia rosae, Bombus terrestris, Chelonus insularis, Diachasma alloeum, Fopius arisanus, M. demolitor, Nasonia vitripennis, Orussus abietinus, Polistes dominula, and Venturia canescens (Table S2). A total of 132,122 genes were assigned to 12,544 gene families. Among them, 4,910 gene families were presented in all the species genomes, with 3,780 single-copy and 1,130 multicopy gene families. In the 15,689 predicted genes of M. manilae, 14,822 (94.47%) were grouped into 9,725 families. There were 1,295 genes in 241 families unique to M. manilae (Fig. 4, Table S3).

Fig. 4
figure 4

Distribution of genes in different Hymenoptera species. “1:1:1” represents shared single-copy genes, “N:N:N” as multicopy genes shared by all species, “others” as unclassified orthologs, “unassigned” as orthologs which cannot be assigned into any gene families (orthogroups).

All single-copy protein sequences were concatenated into one data matrix after being aligned with MAFFT v7.42757. The phylogenetic tree was constructed using IQ-TREE v2.0.558 with the best model (JTT + F + R7) estimated by ModelFinder59. Statistical support for the phylogenetic trees was evaluated by Ultrafast bootstrap60 analysis using 1000 replicates. The phylogenetic tree reconstructed by IQ-TREE had high bootstrap support values. The topology of the phylogeny was consistent with that of the previous study61. The MCMCTree package in PAML v4.9j62 was used to estimate divergence times. Based on a previous study, five calibration time points were used: root holometabolous: <300 million years ago (mya); Orussoidea + Apocrita: 211–289 mya; Apocrita: 203–276 mya; Aculeata: 160–224 mya; and Ichneumonoidea: 151–218 mya61. As expected, our analysis revealed that M. manilae was closely related to M. demolitor and these two species diverged approximately 7.6 mya (Fig. 5). CAFE v4.2.163 was used to estimate gene family expansions and contractions with a p value of 0.01. Finally, we found 615 and 635 gene families experienced expansions and contractions in M. manilae, respectively, and 395 (310 expanded and 85 contracted) of them were rapidly evolved (Fig. 5).

Fig. 5
figure 5

Phylogenetic and gene family evolution analyses of Microplitis manilae and 11 other Hymenoptera species. The bootstrap values of all nodes are supported at 100/100. Node values indicate the number of gene families showing expansion, contraction, and rapid evolution. The scale at the bottom of the figure represents the divergence time.