DNA barcoding and tree-based identification
After the COI barcode region was amplified from eight individuals of the four species, a neighbor-joining (NJ) tree based on Kimura 2-parameter (K2P) distances revealed distinct differences in COI sequences among the seven groups, and the intergeneric and intraspecific sequence divergences were estimated.
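The K2P distance corrects the observed proportions of transition (P) and transversion (Q) differences between two aligned sequences as d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The Python sketch below computes it for a pair of aligned COI sequences. Skipping sites with gaps or ambiguity codes (pairwise deletion) is an assumption of this sketch and not necessarily how gaps were treated in this study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites with a gap or ambiguity code in either sequence are skipped
    (pairwise deletion, an assumption for this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        compared += 1
        if a == b:
            continue
        # Transition: both bases are purines or both are pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # P: proportion of transition differences
    q = transversions / compared  # Q: proportion of transversion differences
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

For one transition over ten comparable sites (P = 0.1, Q = 0), the distance is -0.5 ln(0.8), about 0.112, slightly larger than the uncorrected p-distance of 0.1.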
The NJ tree based on COI sequences resolved a distinct lineage for each of these individuals, and the clustering pattern clarified the differences and similarities among their sequences (Fig. 1 and Additional file 1). Seven different wasps were clearly distinguished. Notably, V. analis fabricius 1 and V. analis fabricius 7 accounted for the sequence variation relative to the other six, otherwise uniform, individuals, suggesting that mutation or an ongoing evolutionary process may be occurring within this species (V. analis fabricius). DNA barcoding based on COI sequences could therefore be applied to identify wasps with similar or unknown characteristics. The results also indicated that these species were distinct and could be used for subsequent comparative studies.
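To illustrate how an NJ tree is built from pairwise distances such as K2P values, the following minimal sketch implements the standard neighbor-joining algorithm: it repeatedly joins the pair of nodes that minimizes the Q-criterion, estimates the two branch lengths, and returns a Newick string. It is not the software used in this study; the input format (a dict keyed by frozenset pairs) and the internal node names are hypothetical choices for the example.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Neighbor-joining tree as a Newick string.

    labels: list of taxon names.
    dist: dict mapping frozenset({a, b}) -> distance for every pair.
    Internal nodes get hypothetical names "_node1", "_node2", ...
    """
    nodes = list(labels)
    d = dict(dist)  # copy so the caller's matrix is not modified
    newick = {name: name for name in nodes}

    def D(a, b):
        return d[frozenset((a, b))]

    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other nodes.
        r = {a: sum(D(a, k) for k in nodes if k != a) for a in nodes}
        # Join the pair that minimizes the Q-criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda pair: (n - 2) * D(*pair) - r[pair[0]] - r[pair[1]])
        dij = D(i, j)
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))  # branch length to i
        lj = dij - li                                   # branch length to j
        counter += 1
        u = f"_node{counter}"
        newick[u] = f"({newick[i]}:{li:.4f},{newick[j]}:{lj:.4f})"
        # Distances from the new internal node to the remaining nodes.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (D(i, k) + D(j, k) - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({newick[a]},{newick[b]}:{D(a, b):.4f});"
```

For four taxa with AB = CD = 2 and all other pairwise distances equal to 4, the function returns ((A:1.0000,B:1.0000),(C:1.0000,D:1.0000):2.0000);, recovering the two expected clusters.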
RNA-Seq and de novo assembly of wasp transcriptome
The cDNA libraries constructed from the venom glands of 12 wasp individuals were sequenced on the Illumina platform. In total, 452,427,244 clean, high-quality reads were obtained after redundant transcripts were removed, and the filtering rates of the sequencing reads ranged from 87.75 to 91.70% (Additional file 2). The clean reads from the four wasp species were assembled into 127,629 contigs totaling 323,495,099 base pairs (bp) (Table 1). Contig lengths ranged from 301 to 28,994 bp, with an average length of 2534 bp and an N50 of 3163 bp (Table 1). The number of contigs also differed among the four species, ranging from 65,229 to 76,458; the highest number was found in V. mandarinia smith, which may indicate that more genomic information was captured for this species (Table 1).
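The assembly statistics reported in Table 1 (contig count, total length, maximum, minimum and mean contig length, and N50) can all be derived from a list of contig lengths. In this sketch, N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size; the function name and return format are illustrative.

```python
def assembly_stats(lengths):
    """Summary statistics for a list of contig lengths (bp)."""
    if not lengths:
        raise ValueError("no contigs")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, ordered[0]
    for length in ordered:
        running += length
        if 2 * running >= total:  # half of the assembly size reached
            n50 = length
            break
    return {
        "count": len(ordered),
        "total_bp": total,
        "max": ordered[0],
        "min": ordered[-1],
        "mean": total / len(ordered),
        "N50": n50,
    }
```

For contigs of 100, 200, 300, 400 and 500 bp, the cumulative sum passes 750 bp (half of 1500 bp) at the 400-bp contig, so N50 = 400 bp while the mean is 300 bp.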
Table 1 The statistics of the sequencing data after quality trimming

Coding sequence domain prediction
The open reading frames (ORFs) and coding sequences (CDSs) of the wasp transcripts were predicted with ORFfinder, using the assembled sequence information and reference structures. In all, 3,557,399 CDSs, including different types of ORFs, were predicted and clustered (Additional file 3).
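ORFfinder scans all six reading frames of a sequence for open reading frames. The simplified sketch below mimics that idea by reporting ATG-to-stop ORFs on both strands above a minimum length. The 75-nt minimum and the restriction to ATG start codons are assumptions for the example; the sketch does not reproduce ORFfinder's options or the subsequent clustering step.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=75):
    """ATG...stop ORFs as (strand, frame, start, end) tuples.

    Coordinates are 0-based and end-exclusive, relative to the strand
    that was scanned; the stop codon is included in the ORF length.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    end = pos + 3
                    if end - start >= min_len:
                        orfs.append((strand, frame, start, end))
                    start = None  # look for the next ORF in this frame
    return orfs
```

For example, find_orfs("ATGAAATAG", min_len=9) reports a single ORF on the forward strand in frame 0 spanning positions 0 to 9.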
Homology-based annotation of transcripts
The unigenes from the four wasp species were compared against the Flybase, KEGG, KOG, nr, Swiss-Prot and Tox-Prot databases using BLASTX (E-value < 10^-5), and 374 unigenes were annotated in all of these databases (Fig. 2a). For the individual species, V. velutina, V. analis fabricius, V. tropica ducalis and V. mandarinia smith had 304, 316, 315 and 332 unigenes annotated in all databases, respectively (Additional file 4). In the nr database, the annotated homologous sequences of all four species matched mainly Polistes dominula (more than 90%), Nasonia vitripennis and Vespa affinis (Fig. 2b). In the Swiss-Prot database, the matched species were mainly Homo sapiens, Drosophila melanogaster, Mus musculus and Rattus norvegicus (Fig. 2c). In the Tox-Prot database, they were mainly Latrodectus tredecimguttatus, Bungarus fasciatus, Bombus ignitus and Scolopendra subspinipes dehaani (Fig. 2d). These results indicate that the unigenes of the four wasp species (V. velutina, V. analis fabricius, V. tropica ducalis and V. mandarinia smith) yielded similar species-match profiles in the nr, Swiss-Prot and Tox-Prot databases.
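Counting the unigenes annotated in every database amounts to filtering each BLASTX result by the E-value threshold and intersecting the resulting sets of query IDs. This sketch assumes the results were saved in BLAST's tabular format (-outfmt 6), whose 11th column is the E-value; the function names are hypothetical, and annotations produced by other tools (for example, KEGG or KOG mappers) would need their own parsers.

```python
import csv
from io import StringIO

EVALUE_CUTOFF = 1e-5  # threshold stated in the text (E-value < 10^-5)


def hits_passing(tabular_text, cutoff=EVALUE_CUTOFF):
    """Query IDs with at least one hit below the cutoff in -outfmt 6 text."""
    queries = set()
    for row in csv.reader(StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if float(row[10]) < cutoff:  # 11th column is the E-value
            queries.add(row[0])
    return queries


def annotated_in_all(results_by_db):
    """Unigenes with a passing hit in every database (dict: db name -> text)."""
    sets = [hits_passing(text) for text in results_by_db.values()]
    return set.intersection(*sets) if sets else set()
```

In use, each value of results_by_db would be the contents of one BLASTX output file, for example read with open(path).read().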
We further classified the venom toxins of the four wasp species by BLASTX searches against the Tox-Prot database (Fig. 3). The V. velutina and V. analis fabricius groups had similar toxin profiles, mainly composed of Factor V activator RVV-V alpha, Scoloptoxin SSD076, Venom serine protease Bi-VSP, Probable phospholipase A1 magnifin and Thrombin-like enzyme flavoxobin (Fig. 3a, b). Venom serine protease Bi-VSP, Acetylcholinesterase, Scoloptoxin SSD976, Probable phospholipase A1 magnifin and Alpha-latrocrustotoxin-Lt1a (Fragment) accounted for a high proportion of the toxins in the V. tropica ducalis group (Fig. 3c). In the V. mandarinia smith group, the main annotated proteins were Acetylcholinesterase, Scoloptoxin SSD976, Factor V activator RVV-V alpha, Probable phospholipase A1 magnifin and Venom serine protease Bi-VSP (Fig. 3d). These results indicate that the types and proportions of toxins in the venom glands differ among the four species.
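One simple way to obtain toxin proportions like those in Fig. 3 is to keep the best Tox-Prot hit for each unigene and count how often each toxin name occurs. The best-hit rule used here (lowest E-value, ties broken by higher bit score) and the input tuple format are assumptions; the text does not state how hits were assigned to toxin classes.

```python
from collections import Counter


def toxin_proportions(hits):
    """Share of unigenes whose best Tox-Prot hit is each toxin.

    hits: iterable of (unigene_id, toxin_name, evalue, bitscore).
    Best hit = lowest E-value, ties broken by higher bit score (assumed rule).
    """
    best = {}
    for unigene, toxin, evalue, bitscore in hits:
        key = (evalue, -bitscore)
        if unigene not in best or key < best[unigene][0]:
            best[unigene] = (key, toxin)
    counts = Counter(toxin for _, toxin in best.values())
    total = sum(counts.values())
    # most_common() orders toxins from most to least frequent.
    return {toxin: n / total for toxin, n in counts.most_common()}
```

Each unigene contributes exactly once, so the proportions sum to 1 within a species and can be compared directly across the four venom glands.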
GO enrichment analysis
The GO enrichment analysis of V. velutina unigenes identified 136 enriched terms: 69 in biological process (BP), 38 in molecular function (MF) and 29 in cellular component (CC) (Additional file 5). As shown in Fig. 4a, cilium organization and cilium assembly were significantly enriched in BP. Axoneme, ciliary part, ciliary plasm, plasma membrane bounded cell projection cytoplasm, centrosome and axoneme part were significantly enriched in CC, whereas metallopeptidase activity, metalloendopeptidase activity, endopeptidase activity, Rho GTPase binding and Rho guanyl-nucleotide exchange factor activity were significantly enriched in MF (Additional file 5).
The GO enrichment analysis of V. analis fabricius unigenes identified 136 enriched terms, comprising 74 in BP, 36 in MF and 26 in CC (Additional file 6). As shown in Fig. 4b, cilium organization and cilium assembly were significantly enriched in BP; ciliary part, axoneme, ciliary plasm and plasma membrane bounded cell projection cytoplasm were significantly enriched in CC; and metallopeptidase activity, metalloendopeptidase activity, Rho GTPase binding and Rho guanyl-nucleotide exchange factor activity were significantly enriched in MF (Additional file 6).
The GO enrichment analysis of V. tropica ducalis unigenes identified 136 enriched terms, classified as BP (70 terms), MF (39 terms) and CC (17 terms) (Additional file 7). As shown in Fig. 4c, cilium organization and cilium assembly were significantly enriched in BP; axoneme, ciliary part, ciliary plasm and plasma membrane bounded cell projection cytoplasm were significantly enriched in CC; and metallopeptidase activity and metalloendopeptidase activity were significantly enriched in MF (Additional file 7).
The GO enrichment analysis of V. mandarinia smith unigenes identified 166 enriched terms, classified as BP (88 terms), MF (43 terms) and CC (35 terms) (Additional file 8). As shown in Fig. 4d, dolichol-linked oligosaccharide biosynthetic process, oligosaccharide-lipid intermediate biosynthetic process and DNA integrity checkpoint were significantly enriched in BP. Axoneme, ciliary plasm, ciliary part and CCR4-NOT complex were significantly enriched in CC, whereas metallopeptidase activity, metalloendopeptidase activity, Rho guanyl-nucleotide exchange factor activity, Rho GTPase binding, guanyl-nucleotide exchange factor activity, "ATPase activity, coupled" and ATPase activity were significantly enriched in MF (Additional file 8).
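GO enrichment of this kind is commonly assessed with a one-sided hypergeometric test: given N background unigenes, K of which carry a GO term, it gives the probability of observing at least k annotated unigenes in a study set of n. The text does not name the test used, so this Python sketch, which relies only on math.comb, is an assumed illustration rather than the authors' method; the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def go_term_pvalue(study_hits, study_size, background_hits, background_size):
    """One-sided over-representation p-value for a single GO term.

    study_hits: unigenes in the study set annotated with the term
    study_size: unigenes in the study set
    background_hits: background unigenes annotated with the term
    background_size: all annotated background unigenes
    """
    return hypergeom_sf(study_hits, background_size, background_hits, study_size)
```

For large gene sets, scipy.stats.hypergeom.sf(k - 1, N, K, n) gives the same upper-tail probability.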
A Venn diagram showed that 1608 unigenes were shared by all four wasp species (V. velutina, V. analis fabricius, V. tropica ducalis and V. mandarinia smith), whereas 990, 981, 297 and 5141 unigenes were specific to V. velutina, V. analis fabricius, V. tropica ducalis and V. mandarinia smith, respectively (Fig. 5a). V. mandarinia smith had the most specific unigenes, suggesting that it may have more genomic information than the other three species.
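The shared and species-specific counts in a Venn diagram reduce to set operations. This sketch assumes that each species' unigenes have already been mapped to common identifiers (for example, orthologous clusters or shared annotations), because raw unigene IDs from separate assemblies are not directly comparable; that mapping step is not shown, and the function name is hypothetical.

```python
def venn_partition(groups):
    """Shared and specific identifiers across species.

    groups: dict mapping species name -> set of common identifiers.
    Returns (shared, specific), where specific maps each species to the
    identifiers found in that species only.
    """
    shared = set.intersection(*groups.values())
    specific = {}
    for name, ids in groups.items():
        others = set().union(*(s for other, s in groups.items() if other != name))
        specific[name] = ids - others
    return shared, specific
```

Note that "specific" here means absent from every other species; identifiers shared by only two or three species fall into the intermediate Venn regions and appear in neither output.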
We further performed GO enrichment analysis on the shared and species-specific unigenes of the four wasp species. The unigenes shared by all four species were enriched in 1089 GO terms (904 in BP, 74 in MF and 111 in CC) (Additional file 9). As shown in Fig. 5b, epithelial cell differentiation, epithelial cell development, wing disc development, ovarian follicle cell development and columnar/cuboidal epithelial cell development were significantly enriched in BP; apical part of cell, cell junction, neuron projection and cell cortex were significantly enriched in CC; and heme binding, tetrapyrrole binding, cofactor binding and iron ion binding were significantly enriched in MF (Additional file 9).
Among the species-specific unigenes, only those of V. velutina and V. mandarinia smith yielded enriched GO terms. The V. velutina-specific unigenes were enriched in 45 GO terms, including 33 in BP, 8 in MF and 4 in CC (Additional file 10). As shown in Fig. 5c, negative regulation of gene silencing by RNA, regulation of neuronal synaptic plasticity, regulation of gene silencing by miRNA, regulation of posttranscriptional gene silencing and production of miRNAs involved in gene silencing by miRNA were significantly enriched in BP, while plasma membrane protein complex, cell cortex, cytoplasmic region and synapse were significantly enriched in CC. The significantly enriched MF terms included structural constituent of muscle, O-methyltransferase activity, RNA methyltransferase activity and protein domain specific binding (Additional file 10).
The V. mandarinia smith-specific unigenes were enriched in 30 terms (27 in BP and 3 in MF) (Additional file 11). As shown in Fig. 5d, segmentation, regulation of lipid storage, blastoderm segmentation, regulation of lipid localization and compound eye morphogenesis were significantly enriched in BP, and RNA polymerase II-specific, DNA-binding transcription factor activity, and NAD+ kinase activity were significantly enriched in MF (Additional file 11).
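When hundreds of GO terms are tested at once, as for the 1089 terms enriched among the shared unigenes, p-values are usually adjusted for multiple testing before terms are called significant. The Benjamini-Hochberg procedure is one common choice; whether and how this study adjusted its p-values is not stated, so the following sketch is illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted
```

For p-values [0.01, 0.04, 0.03, 0.005], the adjusted values are approximately [0.02, 0.04, 0.04, 0.02]; terms would then be called significant at, for example, q < 0.05.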
SSR and SNP analysis
SSR analysis was performed on the transcriptomes of the four wasp species using the MIcroSAtellite identification tool (MISA). In total, 195,330, 193,994, 195,152 and 196,691 SSRs were detected in the V. velutina, V. analis fabricius, V. tropica ducalis and V. mandarinia smith transcriptomes, respectively, belonging to eight SSR types: c, c*, p1, p2, p3, p4, p5 and p6, where p1 to p6 denote perfect mono- to hexanucleotide repeats and c and c* denote compound SSRs (Fig. 6a). Among them, the c, p1, p2 and p3 types were the most common in the transcriptomes of the four wasp species.
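MISA reports perfect repeats by unit size (p1 to p6) and merges nearby repeats into compound SSRs (c, c*). The sketch below covers only the perfect-repeat part: it scans each unit size for primitive units (units that are not themselves repeats of a shorter unit) and counts consecutive copies. The minimum repeat numbers (10 for mono-, 6 for di-, and 5 for tri- to hexanucleotide units) are assumed MISA-style thresholds, and compound merging is omitted.

```python
# Assumed MISA-style thresholds: unit size -> minimum number of repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True if unit is not a repetition of a shorter unit ('ATAT' is not)."""
    return unit not in (unit + unit)[1:-1]


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Perfect SSRs as (type, unit, start, repeats); type p1..p6 follows unit size.

    Compound SSRs (MISA's c and c*) are not merged in this sketch.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            if set(unit) <= set("ACGT") and is_primitive(unit):
                reps = 1
                while seq[i + reps * k:i + (reps + 1) * k] == unit:
                    reps += 1
                if reps >= min_rep:
                    hits.append((f"p{k}", unit, i, reps))
                    i += reps * k  # skip past this repeat tract
                    continue
            i += 1
    return hits
```

Requiring primitive units prevents a poly-A run from also being reported as an (AA)n or (AAA)n repeat, so each tract is classified under its shortest unit.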
SNPs were analyzed separately for each species; such variants could result in synonymous or nonsynonymous changes, and even some frame shifts, in the encoded proteins. In total, 269,585 (198,185 transitions and 71,400 transversions), 254,863 (184,699 transitions and 70,164 transversions), 266,084 (190,664 transitions and 75,420 transversions) and 253,901 (187,321 transitions and 66,580 transversions) SNPs were identified in the V. velutina, V. analis fabricius, V. tropica ducalis and V. mandarinia smith transcriptomes, respectively. As shown in Fig. 6b, transitions (C>T, A>G, G>A and T>C) were significantly more frequent than transversions (A>T, G>T, T>A, C>G, A>C, C>A, T>G and G>C), and C>T was the most abundant substitution type in all four species. The distribution of SNP types also differed among the four species: V. tropica ducalis had relatively more C>T substitutions, whereas V. velutina had relatively more T>C substitutions.
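Transitions are purine-purine (A↔G) or pyrimidine-pyrimidine (C↔T) substitutions; the remaining eight ref>alt combinations are transversions. This sketch tallies both classes and the individual substitution types from (ref, alt) pairs. The input format is an assumption; real SNP calls would first be parsed from the variant caller's output.

```python
from collections import Counter

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def classify_snps(snps):
    """Count transitions, transversions and each ref>alt type.

    snps: iterable of (ref, alt) single-base alleles; non-ACGT or
    identical alleles are ignored.
    """
    by_type = Counter()
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        by_type[f"{ref}>{alt}"] += 1
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1  # any of the eight transversion types
    return ts, tv, by_type
```

Applied to the V. velutina counts above, the transition/transversion ratio is 198,185 / 71,400 ≈ 2.78.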