Introduction

Animal mitochondrial genomes are about 16 kb in size and contain 37 genes: 13 protein-coding genes, 22 transfer RNA (tRNA) genes and two ribosomal RNA (rRNA) genes [1, 2]. The genome is highly economized, with few sections of noncoding DNA, intergenic regions, or repetitive sequences [3, 4], except for an A+T-rich region, which contains essential regulatory elements for transcription and replication [5].

Gene arrangements are usually conserved within major lineages [2], but may be highly rearranged in certain groups [6–12]. Gene rearrangement events may serve as useful phylogenetic markers and models for evolutionary studies [13–16]. In apocritan Hymenoptera, frequent gene rearrangements have been observed from broad examinations of gene segments [10, 17] and whole genome sequences [18–22]. However, no informative arrangement pattern has been identified to date, for which there are two possible explanations: one is that diversified gene arrangements have arisen independently among different hymenopteran lineages, and the other is that limited sampling is concealing potentially synapomorphic rearrangements. The apocritan lineage shows other extraordinary features in the mitochondrial genome, such as high A+T content [23, 24], diversified gene rearrangement events, and the involvement of recombination in gene rearrangement [17].

Evaniidae is proposed to be one of the most basal lineages in Hymenoptera [25, 26]. Presently, no complete mitochondrial genome has been sequenced from members of this family or its presumed sister groups, the Aulacidae and Gasteruptiidae. Here, we present the complete mitochondrial genome of Evania appendigaster (Hymenoptera: Evaniidae) and give a thorough description of its genome features in comparison to other hymenopteran species.

Materials and methods

DNA extraction, PCR amplification and sequencing

Total genomic DNA was extracted using the DNeasy tissue kit (Qiagen, Hilden, Germany) from a leg of an E. appendigaster adult.

A range of universal insect mitochondrial primers [27, 28] and hymenopteran mitochondrial primers were used to amplify the regions cox1-cox2, cob-rrnL and rrnL-rrnS. Species-specific primers were designed based on the sequenced fragments and combined in various ways to bridge the gaps cox2-cob and rrnS-cox1. Six fragments of 575–8626 bp were amplified, covering the whole mitochondrial genome (Table 1). The PCR and sequencing procedures followed the methods in Wei et al. [23].

Table 1 Primers used in this study

Genome annotation and secondary structure prediction

tRNA genes were initially identified using the tRNAscan-SE search server [29] with default parameters. Sequences longer than 100 bp between the identified tRNA genes were used as queries in BLAST searches in GenBank for identification of protein-coding and rRNA genes. The exact initiation and termination codons were identified in ClustalX version 2.0 [30] using reference sequences from other insects, following the criteria in Wei et al. [23]. Finally, the tRNA search was carried out again for the large intergenic regions using a reduced cutoff score. Twenty-one of the 22 typical animal mitochondrial tRNA genes were found using the previous steps, except for trnS2, which was identified by alignment. A+T content and codon usage were calculated using MEGA version 4.0 [31].

All tRNA secondary structures were predicted using the tRNAscan-SE search server [29] except for trnS2, which was predicted manually. rRNA structures were predicted by comparison and algorithm-based methods as in Wei et al. [23].

Results and discussion

Genome structure and base composition

The complete mitochondrial genome of E. appendigaster is 17,817 bp (GenBank accession No. FJ593187), which is among the largest animal mitochondrial genomes yet sequenced [1]. All of the 37 typical animal mitochondrial genes were identified (Fig. 1; Table 2).

Fig. 1

Organization of Evania appendigaster mitochondrial genome. Gene abbreviations are as follows: cox1, cox2, and cox3 refer to the cytochrome oxidase subunits, cob refers to cytochrome b, nad1-nad6 refer to NADH dehydrogenase components, and rrnL and rrnS refer to ribosomal RNAs. Transfer RNA genes are denoted by one-letter symbols according to the IUPAC-IUB single-letter amino acid codes. L1, L2, S1 and S2 denote tRNA Leu(CUN), tRNA Leu(UUR), tRNA Ser(AGY) and tRNA Ser(UCN), respectively. AT indicates the A+T-rich region. Gene names with lines indicate that the genes are coded on the minority strand, while those without lines are on the majority strand.

Table 2 Annotation of Evania appendigaster mitochondrial genome

Neighboring genes overlap at nine locations, by 1–7 bp each and 31 bp in total. Excluding the A+T-rich region, intergenic spacers occur at 13 locations, ranging from 1 to 534 bp and totaling 943 bp (Table 2).
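The overlap and spacer totals reported here are simple bookkeeping over the gene coordinates in Table 2. The following is a minimal sketch of that bookkeeping, assuming a hypothetical list of (name, start, end) tuples with 1-based inclusive coordinates sorted by start position. The function names and the example coordinates are illustrative placeholders, not values from Table 2, and the sketch ignores the junction that closes the circular genome.

```python
def junction_gaps(genes):
    """Return (left, right, gap) for each pair of neighboring genes.

    genes: list of (name, start, end), 1-based inclusive, sorted by start.
    gap > 0 is an intergenic spacer, gap < 0 is an overlap,
    and gap == 0 means the two genes are directly adjacent.
    """
    gaps = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(genes, genes[1:]):
        gaps.append((name_a, name_b, start_b - end_a - 1))
    return gaps


def summarize(gaps):
    """Total overlapping and intergenic nucleotides and the number of locations."""
    overlaps = [-gap for _, _, gap in gaps if gap < 0]
    spacers = [gap for _, _, gap in gaps if gap > 0]
    return {
        "overlap_bp": sum(overlaps),
        "overlap_sites": len(overlaps),
        "spacer_bp": sum(spacers),
        "spacer_sites": len(spacers),
    }


# Hypothetical coordinates, for illustration only (not from Table 2).
example = [("geneA", 1, 70), ("geneB", 68, 140), ("geneC", 141, 200), ("geneD", 223, 300)]
summary = summarize(junction_gaps(example))
print(summary)
```

In this toy input, geneA and geneB overlap by 3 bp, geneB and geneC are directly adjacent, and geneC and geneD are separated by a 22 bp spacer.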

The A+T content of the E. appendigaster mitochondrial genome is lower than that of all other sequenced hymenopteran species, and there are more A and C than T and G in the majority strand (Table 3). A higher A+T content was found in parasitic wasps (Apocrita) compared with nonparasitic wasps (Symphyta) in partial mitochondrial genes [24] and whole genome sequences [18–20, 22, 32, 33].
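Base composition of this kind can be recomputed directly from a sequence. The sketch below is an illustration only: the A+T content in this study was calculated with MEGA, and the skew formulas used here, (A − T)/(A + T) and (G − C)/(G + C), are the commonly used definitions and are an assumption, since the text does not state them. The function name and the toy sequence are hypothetical.

```python
from collections import Counter


def base_composition(seq):
    """Return A+T content, AT skew and GC skew for one strand.

    Assumes the sequence contains at least one A or T and one G or C;
    characters other than A, C, G and T are ignored.
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t),  # positive when A outnumbers T
        "gc_skew": (g - c) / (g + c),  # negative when C outnumbers G
    }


# Toy sequence, not taken from the genome.
comp = base_composition("AAATTCCG")
print(comp)
```

A strand with more A than T and more C than G, as described for the majority strand, gives a positive AT skew and a negative GC skew under these definitions.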

Table 3 Base composition of hymenopteran mitochondrial genomes

Gene rearrangement

The gene arrangement of the E. appendigaster mitochondrial genome is similar to that of other apocritan species. Gene rearrangement events have been classified as translocation, local inversion (inverted in the local position), gene shuffling (local translocation) and remote inversion (translocated and inverted) [17]. Four tRNA genes are rearranged: trnW, trnC and trnS1 by remote inversion and trnM by gene shuffling (Fig. 1). Rearrangement of tRNA genes is common in the hymenopteran mitochondrial genome, especially of those in tRNA clusters, such as at the junctions of A+T-rich region-nad2, nad2-cox1, cox2-atp8 and nad3-nad5 [10, 17, 23]. However, the rearrangements in the E. appendigaster mitochondrial genome are novel. In vertebrates, gene shuffling is the dominant gene rearrangement event [34], while in Hymenoptera, equal numbers of gene shuffling, inversion and translocation events have been observed at the cox2-atp8 junction [10]. In the E. appendigaster mitochondrial genome, remote inversion was found to be the dominant gene rearrangement event.

Gene shuffling is usually explained by the tandem duplication-random loss (TDRL) model [17, 35]. Evidence for the TDRL model includes a derived pattern of gene order, pseudogenes and the position of intergenic spacers, the last two of which are the expected intermediate steps in changing mitochondrial gene order under this model. In the derived tRNA cluster between the A+T-rich region and nad2, all neighboring genes overlap or are directly adjacent except for trnQ and nad2, between which there is a 22 bp intergenic spacer (Table 2). Under the TDRL model, random deletion of the duplicated or original genes is unlikely to produce a pattern in which the remaining adjacent genes overlap. Thus, it is unlikely that trnC and trnS1 were rearranged by TDRL, whereas trnM may have been rearranged by tandem duplication of the trnI-trnQ-trnM cluster followed by deletion of trnI-trnQ and trnM at the two boundaries, in an intermediate state before the insertion of trnC and trnS1. This region is located to one side of the A+T-rich region, which is thought to contain two replication origins [36], so an illicit primer may be responsible for the duplication of the original tRNA cluster. The 22 bp intergenic spacer between trnQ and nad2 may be a remnant of the deleted second copy of trnM. Recombination may be involved in remote inversions and is the most plausible explanation for local inversions in apocritan mitochondrial genomes.
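The TDRL scenario proposed for trnM can be written out as an operation on a gene order. The following is a minimal sketch under simplifying assumptions: it tracks gene order only (no strand, no sequence, no pseudogene remnants), and the function name `tdrl` and its parameters are hypothetical. A segment is duplicated in tandem, and each gene is then kept in exactly one of the two copies.

```python
def tdrl(order, start, end, keep_first_copy):
    """Apply one tandem duplication-random loss event to a gene order.

    order: list of gene names.
    start, end: slice bounds of the segment that is duplicated in tandem.
    keep_first_copy: genes retained from the first copy; every other gene
        in the segment is retained from the second copy instead.
    """
    segment = order[start:end]
    first = [gene for gene in segment if gene in keep_first_copy]
    second = [gene for gene in segment if gene not in keep_first_copy]
    return order[:start] + first + second + order[end:]


# Ancestral cluster named in the text, followed by nad2.
ancestral = ["trnI", "trnQ", "trnM", "nad2"]

# Duplicate trnI-trnQ-trnM, then lose trnI-trnQ from the first copy
# and trnM from the second copy.
derived = tdrl(ancestral, 0, 3, keep_first_copy={"trnM"})
print(derived)  # ['trnM', 'trnI', 'trnQ', 'nad2']
```

In this representation the lost second copy of trnM would have sat between trnQ and nad2, which is where the 22 bp spacer is located.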

Protein-coding genes

The sizes of the protein-coding genes in the E. appendigaster mitochondrial genome are similar to those of their orthologs in other insects. The genes with the highest A+T content in the hymenopteran mitochondrial genome are usually nad6 or atp8. In E. appendigaster, the A+T content of atp8 is 69.1%, among the lowest, as a result of the lower A+T content in the 3′ sequence of atp8.

All protein-coding genes start with ATN codons (two with ATA, four with ATT, one with ATC, and five with ATG) except for nad1, which uses TTG as its start codon (Table 1). cox1 is usually found to use nonstandard start codons in insects, such as TCG, ACC, CGA, CTA, CCG and AAA [37, 38]. In E. appendigaster, cox1 uses the usual start codon ATG, 3 bp after the end of trnY, and the translated amino acid sequence aligns well with orthologs in other Hymenoptera. All examined species in Lepidoptera have been found to use R as the initial amino acid for cox1 [39], whereas in Hymenoptera all species use an ATN start codon [18, 19, 21–23, 32] except for Vanhornia eucnemidarum [20]. In E. appendigaster, three ATA codons lying in or 6 bp downstream from trnL1 are possible start codons for nad1. However, we propose TTG directly after trnL1 as the start codon for nad1, because this minimizes the intergenic spacer and avoids overlap between trnL1 and nad1 [37, 40]. We examined nad1 start codons in the 11 previously reported hymenopteran species, and the results revealed that either the intergenic spacers or the overlapping regions would be reduced in Perga condei [32], Vanhornia eucnemidarum [20] and three Nasonia species [21] if TTG is assigned as the start codon (Fig. 2). In Diadegma semiclausum [23], Polistes humilis [22] and three bee species [18, 33], no TTG codon is found near the initial region of nad1. In two vespid species, Abispa ephippium and Polistes humilis, trnL1 is rearranged and rrnL is left upstream of nad1. In A. ephippium, a TTG codon is present 3 bp downstream of the identified start codon ATA. Since there is no standard way to define the exact boundaries of rRNAs, the criteria of reducing the intergenic spacer and overlapping region could not be applied to assign the start codon. In conclusion, our results suggest that TTG is a possible start codon for nad1 in Hymenoptera [37, 40, 41].
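The criterion applied to nad1, choosing the candidate start codon that leaves the smallest spacer or overlap relative to the upstream gene, can be sketched as a scan over in-frame codons. This is an illustration only, not the procedure used in the study: the function name, its parameters, the window size and the toy sequence are all hypothetical, and the reading frame is assumed to be known in advance (for example from an alignment with orthologs).

```python
START_CODONS = {"ATA", "ATT", "ATC", "ATG", "TTG"}


def candidate_starts(seq, upstream_end, in_frame_pos, window=30):
    """List in-frame candidate start codons near the end of the upstream gene.

    seq: nucleotide string on the coding strand of the gene.
    upstream_end: 0-based index of the first base after the upstream gene.
    in_frame_pos: 0-based index of any codon start known to be in frame.
    Returns (position, codon, gap) tuples, where gap > 0 is an intergenic
    spacer and gap < 0 is an overlap with the upstream gene.
    """
    seq = seq.upper()
    lo = max(0, upstream_end - window)
    hi = min(len(seq) - 3, upstream_end + window)
    hits = []
    for pos in range(lo, hi + 1):
        if (pos - in_frame_pos) % 3 != 0:
            continue  # skip codons that are out of frame
        codon = seq[pos:pos + 3]
        if codon in START_CODONS:
            hits.append((pos, codon, pos - upstream_end))
    return hits


# Toy sequence: the first six bases stand in for the end of an upstream tRNA.
toy = "GGGCCC" + "TTG" + "CCC" + "ATA" + "GGGCCC"
hits = candidate_starts(toy, upstream_end=6, in_frame_pos=6)
best = min(hits, key=lambda hit: abs(hit[2]))
print(hits, best)
```

In the toy input, the TTG directly after the upstream gene gives a gap of 0, whereas the ATA further downstream would leave a 6 bp spacer, so the TTG is selected.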

Fig. 2

Determination of nad1 start codons in Evania appendigaster and other reported hymenopteran mitochondrial genomes. The box indicates the newly assigned start codons, and the shaded regions the previously assigned start codons. Sequences of tRNA are marked by solid lines, intergenic spacers by dotted lines and rrnL by dashed lines

Nine protein-coding genes use the termination codon TAA. Four protein-coding genes use incomplete stop codons: nad1 and nad2 use the truncated termination codon TA, and cox3 and nad4 use T, which is commonly reported in other invertebrates [18, 42]. The relative synonymous codon usage values show a biased use of A and T nucleotides in E. appendigaster (Table 4).
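Relative synonymous codon usage (RSCU) is conventionally the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below assumes that conventional definition, which the text does not spell out. The function name is hypothetical, the synonymous families must be supplied by the caller according to the appropriate genetic code, and the counts are invented for illustration and are not taken from Table 4.

```python
def rscu(codon_counts, synonymous_families):
    """Compute RSCU values for each codon in the given synonymous families.

    codon_counts: dict mapping codon -> observed count.
    synonymous_families: iterable of tuples, each holding the codons that
        encode one amino acid under the genetic code being used.
    """
    values = {}
    for family in synonymous_families:
        total = sum(codon_counts.get(codon, 0) for codon in family)
        if total == 0:
            continue  # amino acid not observed; RSCU is undefined
        expected = total / len(family)  # count under equal usage
        for codon in family:
            values[codon] = codon_counts.get(codon, 0) / expected
    return values


# Invented counts for the fourfold GCN (alanine) family.
counts = {"GCA": 30, "GCT": 50, "GCC": 12, "GCG": 8}
values = rscu(counts, [("GCA", "GCT", "GCC", "GCG")])
print(values)
```

Values above 1 mark codons used more often than expected under equal usage; in the invented example the A- and T-ending codons are the ones above 1.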

Table 4 Codon usage in Evania appendigaster mitochondrial genome

tRNA genes

The length of tRNAs ranges from 61 to 70 bp. All tRNA genes have a typical cloverleaf structure except for trnS2 (Fig. 3). trnS2 could not be identified and folded using conventional tRNA search methods such as tRNAscan-SE. We manually found the location of trnS2 by comparisons with those identified in other insects and then determined the exact boundaries according to the secondary structure folded by eye. The D-stem pairings in the DHU arm are absent in E. appendigaster trnS2, which has also been reported in other insects [6, 18, 37, 43] and the rest of Metazoa [44, 45]. Since this atypical trnS2 is common in Coleoptera, Sheffield et al. [37] built an updated covariance model for automated annotation, which also performs well in other insects.

Fig. 3

Predicted secondary structures for the 22 typical tRNA genes of Evania appendigaster mitochondrial genome. Base-pairing is indicated as follows: Watson–Crick pairs by lines, wobble GU pairs by dots and other noncanonical pairs by circles

A total of 28 unmatched base pairs exist in the E. appendigaster mitochondrial tRNA secondary structures, 19 of which are G–U pairs, eight U–U and one A–A. The number of mismatches is relatively high in the E. appendigaster mitochondrial tRNAs compared with other insects, and even within Metazoa [46]. Mismatches in regions where the tRNA genes overlap with adjacent downstream genes could be corrected by 3′-RNA editing [47–50]. Editing of the 5′ parts of tRNA acceptor stems has also been found in Acanthamoeba [51] and some fungi [52]. Of the 28 mismatches, only four, in trnQ, trnR and trnS1, are located in overlapping regions of the acceptor stem, indicating that other mechanisms might be involved in escaping the effects of Muller’s ratchet in the E. appendigaster mitochondrial genome [53].
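A tally of pair types such as this one can be produced by walking along the two strands of each predicted stem. The following is a minimal sketch, assuming each stem is given as two strings written 5′ to 3′; the function name and the toy stem are hypothetical. Note that the sketch reports G–U pairs as their own class, whereas the count in the text includes them among the 28 unmatched pairs.

```python
from collections import Counter

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def classify_stem(strand5, strand3):
    """Tally pair types in one stem.

    strand5, strand3: the two strands of the stem, both written 5' to 3'.
    The first base of strand5 pairs with the last base of strand3.
    DNA input is accepted; T is read as U.
    """
    s5 = strand5.upper().replace("T", "U")
    s3 = strand3.upper().replace("T", "U")[::-1]
    if len(s5) != len(s3):
        raise ValueError("stem strands must have equal length")
    tally = Counter()
    for a, b in zip(s5, s3):
        if (a, b) in WATSON_CRICK:
            tally["watson_crick"] += 1
        else:
            # Label G-U wobble pairs and mismatches by their sorted bases.
            tally["-".join(sorted((a, b)))] += 1
    return tally


# Toy stem, not one of the predicted E. appendigaster structures.
tally = classify_stem("GGUUA", "AAUUC")
print(dict(tally))
```

Summing such tallies over all stems of all 22 tRNAs would give genome-wide counts comparable to those reported above.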

trnS2 and trnK use abnormal anticodons TCT and TTT, respectively, which have been found to be correlated with gene rearrangement [23].

rRNA genes

rrnL has a length of 1274 bp, with an A+T content of 79.7%. rrnS has a length of 747 bp, with an A+T content of 76.0%. The gene sizes are normal, but the A+T contents are lower than their counterparts in other hymenopteran species.

Both rrnL and rrnS conform to the secondary structure models proposed for these genes from other insects [23, 39, 54–56]. Forty-nine helices are present in E. appendigaster rrnL, as in D. melanogaster [55] and A. mellifera [54], belonging to six domains (Fig. 4). H837 usually forms a long stem structure with a small terminal loop [23, 37, 54], but in E. appendigaster it forms a shorter stem and a larger loop, as in D. melanogaster [55]. The deduced structures of H2347 and H2520 are variable [54, 57, 58], but in E. appendigaster they are more similar to those from A. mellifera by Gillespie et al. (2006) than to those from other insects [57, 58].

Fig. 4

Predicted rrnL secondary structure in Evania appendigaster mitochondrial genome. Tertiary interactions and base triples are shown connected by continuous lines. A 5′ half of rrnL; B 3′ half of rrnL. Symbols for base-pairings are as in Fig. 3

The secondary structure of rrnS contains 29 helices present in D. virilis [56] and A. mellifera [54], belonging to three domains (Fig. 5). Helix H39 could not be predicted; instead, a circle was formed by H27, H47, H367 and H500 and the sequences between them. H47 is variable among different lepidopteran species, but the terminal portion of this stem is conserved [37]. In E. appendigaster, two loops are formed in H47, similar to D. virilis but different from two other hymenopteran species, D. semiclausum and A. mellifera, in which a larger loop is present. H673 is well conserved in moths, where one stem with a terminal bulge is present [39, 59]. In E. appendigaster, two stem-loop structures are present in H673, as in D. virilis [56] and D. semiclausum [23], but unlike A. mellifera [54], in which this structure is similar to that of moths. The structure of H1074 has been discussed in honey bee [54, 60, 61], and our predicted structure in E. appendigaster is consistent with that of Page (2000) and Gillespie et al. (2006).

Fig. 5

Predicted rrnS secondary structure in Evania appendigaster mitochondrial genome. Symbols are as in Fig. 3

Non-coding regions

One of the most interesting features in the E. appendigaster mitochondrial genome is the presence of five major non-coding regions of more than 20 bp: spacer 1 is 22 bp between trnQ and nad2, spacer 2 is 534 bp between trnK and trnD, spacer 3 is 244 bp between atp8 and atp6, spacer 4 is 94 bp between cob and nad1, and spacer 5 is 2325 bp between rrnS-trnW and trnC-trnM-trnI. Long intergenic spacers have been identified in several insect mitochondrial genomes [18, 20, 23, 40, 62, 63]. Although intergenic spacers appeared to be unique to individual species [37], conserved motifs have been found across all insects, and are proposed to be associated with mtTERM [37, 39, 64].

Spacer 1 shows limited conservation among the hymenopteran species that possess it. In Hymenoptera, the tRNAs directly upstream of nad2 are variable because of frequent rearrangements of the tRNA genes between the A+T-rich region and nad2 [23]; this spacer is therefore unlikely to have any function in translation or transcription. Instead, we suggest that it is a product of gene rearrangement, as in D. semiclausum [23]. Spacer 2 has an A+T content of 96.8% and is composed of seven tandem repeat units “GTAATTTTAT”, twelve “AATAATAATATT”, eight “AATAATAATATTAAT”, an initial sequence “TTATTAATAAACCTTAAATTAAAAATTAATTA”, and a terminal sequence “AATAATAATAT(TAA)8(TA)33AT”. Spacer 3 has an A+T content of 76.2% and contains no repeat sequence although it is 224 bp long. As far as we know, no intergenic nucleotides between atp6 and atp8 have been found in previously reported insect mitochondrial genomes; furthermore, it is a common feature of metazoan mitochondrial genomes that atp8 and atp6 overlap [65]. It has been proposed that the secondary structure of the transcribed mRNA may facilitate cleavage between the abutting proteins [38, 66, 67]. We could map secondary structures like those in other insects [38, 46] (Fig. 6), which indicates that the presence of spacer 3 in E. appendigaster would not affect the cleavage of atp6 and atp8. Spacer 4 was found in another six hymenopteran species (Fig. 7). This intergenic spacer region may correspond to the binding site of mtTERM, a transcription attenuation factor [64], as evidenced by a 7 bp motif (ATACTAA) conserved across Lepidoptera [39] and a 5 bp motif (TACTA) conserved across Coleoptera [37]. In Hymenoptera, we found a 6 bp conserved motif (THACWW), which shows high similarity to those in Lepidoptera and Coleoptera. In P. condei and D. semiclausum, although there is only a 2 bp intergenic spacer and a 7 bp overlapping region between trnS1 and nad1, respectively, we could still find conserved motifs in the nearby regions between trnS1 and nad1 in both species. This may indicate wrong annotations of this region in both genomes, or the existence of the motif within genes. Spacer 5 is proposed as the A+T-rich region because of its location between rrnS-trnW and trnC-trnM-trnI and its high A+T content (85.6%). It is one of the longest A+T-rich regions in the sequenced insect mitochondrial genomes [23, 68]. Twenty-three tandemly arranged units of “GTCATTATTTAATATAAAATA” are present in the middle of the A+T-rich region. This region, characterized by five elements [2, 5], is believed to function in the initiation of replication and control of transcription. However, these elements in the E. appendigaster mitochondrial genome are not arranged in the conserved pattern.
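The degenerate motif THACWW reported for Spacer 4 uses IUPAC ambiguity codes, in which H stands for A, C or T and W stands for A or T. A search for such a motif can be sketched by translating it into a regular expression. This is an illustration only and not the method used to identify the motif; the function names and the toy sequence are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular expression string."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_motif(seq, motif):
    """Return (position, matched sequence) for every occurrence of the motif.

    A lookahead is used so that overlapping occurrences are also reported.
    Positions are 0-based.
    """
    pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Toy sequence, not taken from any of the aligned spacers.
hits = find_motif("GGTTACTAGGTCACAT", "THACWW")
print(iupac_to_regex("THACWW"), hits)
```

Because the search runs on raw sequence rather than on annotated spacers, it would also find occurrences that fall inside neighboring genes, which is the situation described for P. condei and D. semiclausum.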

Fig. 6

mRNA loops for genes atp8-atp6 in Evania appendigaster mitochondrial genome. The box indicates start codon of atp6

Fig. 7

Alignment of the intergenic spacers between trnS1 and nad1 (Spacer 4) across Evania appendigaster and other reported hymenopteran mitochondrial genomes. The shaded region indicates the conserved motif; some of the unconserved intergenic nucleotides are replaced by dots