Introduction

The mitochondrion is a fundamental eukaryotic organelle. Energy production via oxidative phosphorylation is its most-studied function, though it also functions in apoptosis and cell aging1,2. A typical animal mitochondrial genome is a circular DNA molecule, approximately 16 kb long, with a relatively conserved gene content, usually containing 37 genes: 13 protein-coding genes (PCGs), two ribosomal RNA (rRNA) genes, and 22 transfer RNA (tRNA) genes, plus an A + T-rich region3,4. Compared with nuclear genomes, animal mitochondrial genomes are characterized by several distinct features, including cellular abundance, small genome size, conserved gene content and organization, lack of extensive recombination, maternal inheritance, and a high nucleotide substitution rate2,5,6. In particular, mitochondrial genomes can provide higher phylogenetic resolution than the shorter sequences of individual genes. Consequently, this small molecule has been widely used for phylogenetic analyses in many groups2,7,8,9,10,11,12.

Muscomorpha, an infraorder of Brachycera (Diptera), is a large and diverse group of flies that contains the bulk of the Brachycera. It includes a number of the most common flies, such as the housefly, the fruit fly, and the blowfly. The Muscomorpha can be separated into two sections, the Aschiza and the Schizophora9,13,14,15. Two large superfamilies, Platypezoidea and Syrphoidea, were traditionally included in Aschiza; Phoridae and Syrphidae are the largest families within these superfamilies, respectively. The Schizophora contain the majority of family-level diversity among dipterans, and represent a recent rapid radiation of lineages14,16. Schizophora can be divided into two subsections, the Acalyptratae and Calyptratae, commonly referred to as acalyptrate muscoids and calyptrate muscoids, respectively15.

Hoverflies, also called flower flies, compose the insect family Syrphidae. This family contains almost 6,000 described species in 200 genera, and is nearly worldwide in distribution17,18. As the common names suggest, these flies are often seen hovering or nectaring at flowers. In contrast to the fairly uniform flower-feeding habits of adult syrphids, the larvae eat a wide range of foods. Traditionally, the Syrphidae has been divided into three subfamilies, the Eristalinae, Microdontinae, and Syrphinae. Larvae of the subfamily Eristalinae are saprophagous in dead wood, eating decaying plant and animal matter in soil, ponds, or streams19,20; whereas Microdontinae larvae are inquilines in ants’ nests21; and Syrphinae larvae are insectivores, preying on aphids, thrips, and other plant-sucking insects17,22,23. This three subfamily system (Microdontinae, Eristalinae, and Syrphinae) was adopted for Syrphidae more than 25 years ago18,23.

The marmalade hoverfly Episyrphus balteatus (Syrphidae) is a relatively small hoverfly (9–12 mm) that is widespread throughout the Palearctic ecozone, occurring in Europe, North Asia, and North Africa. Its color patterns may appear wasp-like to animals such as birds, protecting it from predation24. It often forms dense migratory swarms, which, together with its resemblance to wasps, may alarm people who are unfamiliar with it25. Its feeding habit is rare among adult flies, as it is capable of crushing pollen grains as a food source26. Eupeodes corollae (Syrphidae) is another common hoverfly species. The adults are often migratory, and the species has a worldwide distribution. It has been tested as a biological control agent for aphids and scale insects in greenhouses. However, the larvae preferred fruit in the experiment, consuming more fruit than aphids27.

As of June 2016, three Syrphidae mitochondrial genome sequences had been deposited in GenBank: Ocyptamus sativus, Simosyrphus grandicornis, and an unidentified Syrphidae sp. (accession KM244713)9,16,28. Here, we sequenced the mitochondrial genomes of Episyrphus balteatus and Eupeodes corollae, the first mitochondrial genomes reported from the genera Episyrphus and Eupeodes. Furthermore, we compared genome features and investigated phylogenetic relationships within Muscomorpha using complete mitochondrial genomes from GenBank along with our two newly sequenced mitochondrial genomes (Table 1).

Table 1 Mitochondrial genomes used in this study.

Results and Discussion

General features of the newly sequenced mitochondrial genomes

The complete Episyrphus mitochondrial genome sequence is 16,175 bp long (GenBank accession KU351241) (Table 2). The partial Eupeodes mitochondrial genome sequence is 15,326 bp long (GenBank accession KU379658) (Table 3). Notably, we were unable to generate reliable sequence data for the A + T-rich region in either species.

Table 2 Annotation of the Episyrphus balteatus mitochondrial genome.
Table 3 Annotation of the Eupeodes corollae mitochondrial genome.

No gene rearrangement was observed in our analyses: the gene order of both species matches the putative ancestral insect arrangement29, as in all sequenced dipteran species11, with 23 genes encoded on the majority strand (J-strand) and 14 genes encoded on the minority strand (N-strand).

Each of the 37 typical mitochondrial genes is present in both species. The mitochondrial genome of Episyrphus has 255 bp of intergenic nucleotides in 22 different locations, with intergenic spacer lengths ranging from 1 to 60 bp. Seven pairs of genes overlap each other, with overlap lengths ranging from 1 to 7 bp. Eight pairs of genes are directly adjacent to one another, including the pairs rrnL-trnV and trnV-rrnS. The mitochondrial genome of Eupeodes has 230 bp of intergenic nucleotides in 19 locations, with intergenic spacer lengths ranging from 2 to 47 bp. Nine pairs of genes overlap each other, with overlap lengths ranging from 1 to 7 bp. Nine pairs of genes are directly adjacent to one another, including the pairs rrnL-trnV and trnV-rrnS. In both species, the longest intergenic spacer is located between trnK and trnD, followed by the one located between trnE and trnF. The longest overlapping regions are located between atp8 and atp6 and between nad4 and nad4l. The intergenic and overlapping regions of these two species are similar to those of most other insect mitochondrial genomes in which no gene rearrangement has occurred, in contrast to those with frequent gene rearrangements30.
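The spacer, overlap, and adjacency counts above follow directly from the annotated gene coordinates. The following is a minimal sketch of that bookkeeping, assuming 1-based inclusive coordinates on a linearized genome; the function names and the toy coordinates are hypothetical and are not taken from Tables 2 and 3. The junction that wraps around the circular genome is not handled here.

```python
from typing import Dict, List, Tuple

# (name, start, end) with 1-based, inclusive coordinates
Gene = Tuple[str, int, int]


def junctions(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """Return (left_gene, right_gene, gap) for each consecutive gene pair.

    gap > 0  : intergenic spacer of that many bp
    gap == 0 : the two genes are directly adjacent
    gap < 0  : the two genes overlap by -gap bp
    """
    ordered = sorted(genes, key=lambda g: g[1])
    result = []
    for (left, _s1, end1), (right, start2, _e2) in zip(ordered, ordered[1:]):
        result.append((left, right, start2 - end1 - 1))
    return result


def summarize(genes: List[Gene]) -> Dict[str, int]:
    """Summarize spacers, overlaps, and adjacent pairs for an annotation."""
    gaps = [gap for _, _, gap in junctions(genes)]
    spacers = [g for g in gaps if g > 0]
    overlaps = [-g for g in gaps if g < 0]
    return {
        "spacer_total_bp": sum(spacers),
        "spacer_count": len(spacers),
        "overlap_count": len(overlaps),
        "adjacent_count": sum(1 for g in gaps if g == 0),
    }


# Hypothetical coordinates, for illustration only
toy = [("geneA", 1, 100), ("geneB", 101, 200),
       ("geneC", 194, 300), ("geneD", 311, 400)]
print(summarize(toy))
```

In the toy annotation, geneA and geneB are adjacent, geneB and geneC overlap by 7 bp, and geneC and geneD are separated by a 10 bp spacer.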

Base composition

Three parameters, the AT and GC asymmetries, known as AT-skew and GC-skew, and A + T content, are often used to characterize the nucleotide-compositional behavior of mitochondrial genomes31,32. The Episyrphus and Eupeodes mitochondrial genomes show a very strong bias in nucleotide composition (A + T% > G + C%), which is typical of insect mitochondrial genomes. The 16 Muscomorpha species we analyzed have PCG A + T contents between 71.10% (Bactrocera dorsalis) and 78.97% (Ocyptamus sativus); the mitochondrial genomes of Episyrphus and Eupeodes have A + T contents of 78.85% and 78.83%, respectively (Table 4).

Table 4 Nucleotide composition in regions of Muscomorpha mitochondrial genomes.

The PCG sequence AT-skews of the 16 Muscomorpha species are primarily negative (except in Episyrphus, O. sativus, Simosyrphus grandicornis, and Haematobia irritans). All GC-skews are positive, indicating that the PCGs contain a higher percentage of T and C than A and G nucleotides (Table 4), as reported with most other insects32.

Protein-coding genes, codon usage and nucleotide diversity

Nine of the 13 mitochondrial PCGs in the Episyrphus and Eupeodes mitochondrial genomes are located on the J-strand; the other four PCGs are located on the N-strand (Tables 2 and 3). The total PCG length is 11,220 bp in Episyrphus and 11,211 bp in Eupeodes.

All Episyrphus and Eupeodes mitochondrial genome PCGs start with ATN codons. One, six, and six of the PCGs start with ATA, ATG, and ATT, respectively. Orthologs from the two species have the same start codons. Most PCG stop codons are the canonical TAA, except for nad5 in Eupeodes, which uses an incomplete TA.
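The start and stop codon checks described above can be sketched as follows. This is an illustrative helper, not the annotation procedure used in the study; it assumes each PCG is supplied as its annotated coding sequence read 5' to 3' on its own coding strand, and it treats a trailing T or TA as an incomplete stop codon (in the invertebrate mitochondrial code the complete stop codons are TAA and TAG). The function name and the toy sequences are hypothetical.

```python
def start_and_stop(cds: str):
    """Return (start_codon, stop_signal, start_is_ATN, stop_is_recognized).

    If the annotated length is a multiple of three, the last three bases are
    taken as the stop codon; otherwise the one or two leftover bases are
    reported as an incomplete stop codon (T or TA).
    """
    seq = cds.upper()
    start = seq[:3]
    remainder = len(seq) % 3
    stop = seq[-3:] if remainder == 0 else seq[-remainder:]
    start_is_atn = len(start) == 3 and start.startswith("AT")
    stop_is_recognized = stop in {"TAA", "TAG", "TA", "T"}
    return start, stop, start_is_atn, stop_is_recognized


# Hypothetical toy sequences, for illustration only
print(start_and_stop("ATGTTTTAA"))  # complete TAA stop
print(start_and_stop("ATTTTTTA"))   # incomplete TA stop
```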

Mitochondrial genome codon usage in Episyrphus and Eupeodes shows a significant bias towards A and T (Fig. 1), as in other species of Muscomorpha (Figure S1). In the Episyrphus and Eupeodes mitochondrial genomes, Leu, Ile, Phe, and Met are the most frequently encoded amino acids; hence TTA (Leu), ATT (Ile), TTT (Phe), and ATA (Met) are the most frequent codons, as is typical of other insect mitochondrial genomes30,33,34. These frequently used codons consist exclusively of A and T, which contributes to the high A + T content seen in most fly mitochondrial genomes (Figure S1). This preferred codon usage is strongly reflected at third positions by high A/T versus G/C frequencies.
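For readers unfamiliar with relative synonymous codon usage (RSCU), the sketch below shows one way the statistic can be computed: each codon count is divided by the mean count of all synonymous codons for the same amino acid. The study itself used CodonW; this is an independent illustration under the invertebrate mitochondrial genetic code (NCBI translation table 5), with stop codons excluded. The function name and the toy sequence are hypothetical, and the six leucine codons are treated as a single family, which may differ from the grouping used in Fig. 1.

```python
from collections import Counter
from itertools import product

# NCBI translation table 5 (invertebrate mitochondrial), codons in TCAG order
AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
CODE = dict(zip(CODONS, AAS))


def rscu(cds: str) -> dict:
    """Return {codon: RSCU} for every sense codon whose amino acid occurs.

    RSCU = observed codon count / mean count over the synonymous family.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    result = {}
    for aa in sorted(set(AAS) - {"*"}):
        family = [c for c in CODONS if CODE[c] == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            result[codon] = counts[codon] * len(family) / total
    return result


# Hypothetical toy sequence: three TTA codons and one TTG codon
values = rscu("TTATTATTATTG")
print(values["TTA"], values["TTG"], values["CTT"])
```

An RSCU above 1 means the codon is used more often than expected under equal usage of synonymous codons; a value below 1 means it is used less often.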

Figure 1: Relative synonymous codon usage (RSCU) in the two newly sequenced mitochondrial genomes of Episyrphus balteatus and Eupeodes corollae.

The stop codon is not given.

The evolutionary rate of the protein-coding genes was estimated using the nucleotide diversity and the Jukes and Cantor-corrected nucleotide diversity within Muscomorpha. Among the 13 protein-coding genes, five genes (cox1, cox2, cox3, cob and nad1) showed relatively low levels, five genes (atp6, nad6, nad4, nad4l and nad5) showed intermediate levels, and three genes (atp8, nad2 and nad6) showed the highest levels of nucleotide diversity (Fig. 2). The relative evolutionary rates among the 13 protein-coding genes in Muscomorpha were similar to those found in previous studies of insect mitochondrial genomes35,36.

Figure 2: Nucleotide diversity of each protein-coding gene among Muscomorpha.

Pi, nucleotide diversity; Pi(JC), Jukes and Cantor-corrected nucleotide diversity. Species used for the calculation are listed in Table 1.
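As an illustration of the two statistics plotted in Fig. 2, the sketch below computes nucleotide diversity as the mean proportion of differing sites over all sequence pairs, and a Jukes and Cantor-corrected version as the mean of the corrected pairwise distances, d = −(3/4) ln(1 − 4p/3). The study used DnaSP v5; the exact handling of gaps and the exact form of the correction in DnaSP are assumptions here, and columns containing any non-ACGT character are simply dropped. The function names and the toy alignment are hypothetical.

```python
import math
from itertools import combinations


def clean_columns(alignment):
    """Drop every alignment column that contains a non-ACGT character."""
    columns = [col for col in zip(*alignment) if all(b in "ACGT" for b in col)]
    if not columns:
        raise ValueError("no usable alignment columns")
    return ["".join(row) for row in zip(*columns)]


def p_distance(a: str, b: str) -> float:
    """Proportion of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(p: float) -> float:
    """Jukes and Cantor correction of a p-distance (requires p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def nucleotide_diversity(alignment):
    """Return (Pi, Pi_JC) averaged over all sequence pairs."""
    seqs = clean_columns([s.upper() for s in alignment])
    distances = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    pi = sum(distances) / len(distances)
    pi_jc = sum(jukes_cantor(p) for p in distances) / len(distances)
    return pi, pi_jc


# Hypothetical toy alignment, for illustration only
pi, pi_jc = nucleotide_diversity(["ACGTACGT", "ACGTACGA", "ACGTACGT"])
print(round(pi, 4), round(pi_jc, 4))
```

The corrected value is always at least as large as the uncorrected one, because the correction accounts for multiple substitutions at the same site.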

Transfer RNA and ribosomal RNA genes

All tRNA anticodons from the two newly sequenced mitochondrial genomes are identical to those of other Muscomorpha species (Tables 2 and 3). Of the 22 tRNA genes, 14 are located on the J-strand, with the rest on the N-strand.

The Episyrphus mitochondrial genome contains 1,477 bp within tRNA genes, at an A + T content of 80.37%. Individual tRNAs range in length from 64 bp (trnR) to 72 bp (trnV). The Eupeodes mitochondrial genome contains 1,479 bp within tRNA genes, at an A + T content of 80.12%. Individual tRNAs range in length from 64 bp (trnR) to 72 bp (trnV).

Secondary structure models of the tRNA genes in the two newly sequenced mitochondrial genomes were predicted using the Mitos WebServer37 (Fig. 3). In Episyrphus and Eupeodes, all tRNA genes fold into the canonical clover-leaf structure. The dihydrouridine (DHU) arm of all the tRNAs is a large loop, instead of a conserved stem-and-loop structure; however, this is typical of metazoan mitochondrial genomes38. The amino acid acceptor (AA) stem and the anticodon (AC) loop are conserved at 7 bp in all of our tRNA genes. The sizes of the variable loop and D-loop often determine overall tRNA length39. The DHU arms in our tRNAs are 2 to 4 bp long, the AC arms are 4 to 5 bp long, and the TΨC arms vary in length from 3 to 5 bp. The variable loops are less consistent, ranging from 4 to 8 bp.

Figure 3: Comparison of the secondary structures of the tRNA genes in Muscomorpha mitochondrial genomes.

The secondary structures were drawn from the tRNA genes of Episyrphus balteatus. Variations at each site in the other 14 species of Muscomorpha are indicated near the corresponding nucleotide. Each species is marked by a unique color, as shown at the bottom right of the figure.

We also compared the variation in the stem regions of tRNA genes among 15 species of Muscomorpha. Among the 22 tRNA genes, trnM was the most conserved, without any nucleotide variation in the stem regions, followed by trnV and trnE with three variable sites each. The trnC gene showed the highest number of variable sites in the stem regions (17 sites), followed by trnH (16 sites) (Fig. 3).

Base pairs other than canonical A-Us and C-Gs are occasionally used in our tRNAs, based on predicted tRNA secondary structures. We found six and five mismatched base pairs in the tRNAs from Episyrphus and Eupeodes, respectively. Among the six mismatched base pairs in Episyrphus, five are U-U pairs, located in the AA and TΨC stems; the other is an A-A pair, located in the A-A stem. Eupeodes has four U-U pairs, located in the AA and TΨC stems, and an A-A pair, located in the A-A stem.

In Episyrphus, the two mitochondrial ribosomal RNA genes, rrnL and rrnS, are 1,338 bp long with an A + T content of 84.61%, and 804 bp long with an A + T content of 83.96%, respectively. In Eupeodes, rrnL is 1,334 bp long with an A + T content of 84.78%, and rrnS is 795 bp long with an A + T content of 83.14% (Table 4).

Phylogenetic relationships

We reconstructed phylogenies within the Muscomorpha using the nucleotide sequences of the 37 mitochondrial genes. Bayesian and maximum likelihood (ML) methods estimated congruent topologies (Fig. 4). Our analyses supported the monophyly of all superfamilies used in the study. The Aschiza (lower Cyclorrhapha) was found to be a paraphyletic group40. We included two superfamilies from Aschiza, Platypezoidea and Syrphoidea. Platypezoidea was sister to all other species of Muscomorpha, which is congruent with previous studies11,41. The five genera of Syrphidae (Syrphoidea) clustered as ((unknown Syrphidae sp.) + (Ocyptamus + (Eupeodes + (Episyrphus + Simosyrphus)))). Syrphoidea and Opomyzoidea formed a lineage that was sister to the other species of Schizophora. The Opomyzoidea has traditionally been considered a superfamily of Schizophora, which has been shown to be a monophyletic group40. Our study showed that Schizophora was interrupted by Opomyzoidea, which might be caused by the long branch of Opomyzoidea. In Opomyzoidea, we used one species from the family Fergusoninidae, in which all species are gall-forming flies that live in association with Fergusobia (Tylenchida: Neotylenchidae) nematodes. The novel life history of the species from Fergusoninidae might affect the evolutionary pattern of their mitochondrial genomes2. The long branch of Opomyzoidea was also found in a previous study based on mitochondrial genome sequences41. The phylogenetic relationships of the other groups included in our analyses, i.e. ((Sciomyzoidea + Tephritoidea) + (Ephydroidea + (Muscoidea + Oestroidea))), were in accord with previous studies9,13,14,15,16.

Figure 4: Phylogeny of Muscomorpha inferred from coding nucleotide sequences of the mitochondrial genome (13 PCGs, 22 tRNAs, and 2 rRNA genes), using the Bayesian and maximum likelihood methods.

Numbers separated by “/” indicate the posterior probability and bootstrap values of the corresponding nodes (BI / ML). “*” indicates that the node was fully supported by both inferences (1/100).

Methods

Sampling and DNA extraction

The specimens were collected from Sichuan Province, China. Specimens were initially preserved in 100% ethanol in the field when collected, and then stored at −80 °C prior to DNA extraction. Whole genomic DNA was extracted from the legs and thorax of the specimens using a DNeasy tissue kit (Qiagen, Hilden, Germany), following the manufacturer’s protocols.

PCR amplification and sequencing

Initially, we used a previously designed set of universal primers for insect mitochondrial genomes1,42 to amplify and sequence partial gene segments. We then designed specific primers based on the sequenced segments to amplify regions that bridged the gaps between different segments (Table S1). PCR cycling consisted of an initial denaturation step at 96 °C for 3 min, followed by 40 cycles of denaturation at 95 °C for 30 s, annealing at 42–53 °C for 30 s, and elongation at 60 °C at a rate of 1.5 kb/min (depending on the size of the target amplicon), with a final elongation step at 60 °C for 10 min. PCR components were added following the Takara LA Taq protocols. PCR products were evaluated by agarose gel electrophoresis. A primer-walking strategy was used for all the amplifications from both strands (Table S1).
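Because the elongation time depends on amplicon size, the cycling conditions above can be written out as a small calculation. This is a sketch only: it assumes that "1.5 kb/min" means one minute of elongation per 1.5 kb of target, and the function names and the 3,000 bp example amplicon are hypothetical.

```python
def pcr_program(amplicon_bp: int, anneal_c: float,
                rate_kb_per_min: float = 1.5, cycles: int = 40):
    """Return the cycling program as a list of (step, temp_C, seconds)."""
    if not 42 <= anneal_c <= 53:
        raise ValueError("annealing temperature outside the 42-53 C range")
    # Elongation time scales with amplicon length (assumed interpretation)
    elongation_s = round(amplicon_bp / 1000 / rate_kb_per_min * 60)
    steps = [("initial denaturation", 96, 180)]
    for _ in range(cycles):
        steps.append(("denaturation", 95, 30))
        steps.append(("annealing", anneal_c, 30))
        steps.append(("elongation", 60, elongation_s))
    steps.append(("final elongation", 60, 600))
    return steps


def total_seconds(steps) -> int:
    """Total programmed hold time, ignoring ramping between temperatures."""
    return sum(seconds for _, _, seconds in steps)


# Hypothetical 3,000 bp amplicon annealed at 50 C
program = pcr_program(3000, 50)
print(program[3], total_seconds(program) / 60, "min")
```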

Mitochondrial genome annotation

Mitochondrial DNA sequences were assembled using Lasergene software (DNAStar, Inc., New York, USA). First, the tRNA genes were identified using the Mitos WebServer (http://mitos.bioinf.uni-leipzig.de/index.py)37, with the genetic code set to “Invertebrate Mito”. Those tRNAs that could not be found using this approach were identified by sequence alignment with their homologs from related species. Second, protein-coding genes were identified by BLAST searches in GenBank, using other published mitochondrial genomes from Syrphidae9,16,28. Finally, the rRNA genes and control regions were identified from the boundaries of the tRNA genes, and by comparison with other insect mitochondrial genomes.

Comparative analysis of the mitochondrial genomes from Muscomorpha

We compared the mitochondrial genomes of 16 species from the Muscomorpha, including our two newly sequenced genomes. Gene arrangement, base composition, and PCG codon usage features were analyzed. Because several tRNA genes were not available for some species, we analyzed base composition using only the PCGs. Furthermore, the unknown Syrphidae sp. sequence lacked nad2 data; therefore, we excluded this species from these analyses.

We calculated base composition using MEGA643. The AT-skew and GC-skew were calculated according to Hassanin, et al.31: AT-skew = (A% − T%)/(A% + T%) and GC-skew = (G% − C%)/(G% + C%). The intergenic spacers and overlapping regions between genes were counted manually. The relative synonymous codon usage (RSCU) of all protein-coding genes was calculated using CodonW (written by John Peden, University of Nottingham, UK). Nucleotide diversity and Jukes and Cantor-corrected nucleotide diversity were calculated for species of Muscomorpha using DnaSP v544.
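The two skew formulas above can be applied directly to base counts, since the percentages share a common denominator that cancels. The following sketch illustrates the calculation; it is not the MEGA6 workflow used in the study, and the function name and toy sequence are hypothetical. Ambiguous bases and gaps are ignored.

```python
def composition(seq: str) -> dict:
    """Return A + T content (%), AT-skew, and GC-skew of a sequence."""
    s = seq.upper()
    a, c, g, t = (s.count(base) for base in "ACGT")
    total = a + c + g + t
    if total == 0 or a + t == 0 or g + c == 0:
        raise ValueError("sequence lacks the bases needed for the ratios")
    return {
        "AT_content": 100.0 * (a + t) / total,
        "AT_skew": (a - t) / (a + t),  # positive when A exceeds T
        "GC_skew": (g - c) / (g + c),  # positive when G exceeds C
    }


# Hypothetical toy sequence: A=3, T=2, C=1, G=2
print(composition("AAATTCGG"))
```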

Phylogenetic analysis

We used 14 Muscomorpha species with published mitochondrial genomes, and our two newly sequenced mitochondrial genomes for phylogenetic analyses (Table 1). The 16 species are classified as belonging to two sections, Aschiza and Schizophora. We selected Aschiza sequences belonging to two superfamilies, Platypezoidea and Syrphoidea. Schizophora is classified as two subsections. We selected sequences from both subsections, and selected sequences from six superfamilies within them. Cydistomyia duplonotata and Trichophthalma punctata (Tabanomorpha: Tabanoidea: Tabanidae) were used as outgroups because of the close relationship between Tabanomorpha and Muscomorpha11.

MAFFT version 7.205, which implements consistency-based algorithms, was used for the alignment of protein-coding and RNA genes45. We used the G-INS-i and Q-INS-i algorithms in MAFFT46 for protein-coding and RNA alignment, respectively. The alignment of the nucleotide sequences was guided by the amino acid sequence alignment using the Perl script TranslatorX version 1.147.
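The idea of guiding a nucleotide alignment by an amino acid alignment can be sketched as follows: each residue in the aligned protein is replaced by its original codon, and each gap by three gap characters, so that codons are never split. This is a simplified illustration of the principle and not the TranslatorX implementation; it assumes the coding sequence has had its stop codon (complete or incomplete) removed and that its length is a multiple of three. The function name and toy inputs are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an unaligned CDS onto a gapped protein alignment row."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for ch in aligned_protein if ch != "-")
    if residues != len(codons):
        raise ValueError("protein and CDS lengths do not correspond")
    codon_iter = iter(codons)
    # Each residue consumes one codon; each protein gap becomes a codon gap
    return "".join("---" if ch == "-" else next(codon_iter)
                   for ch in aligned_protein)


# Hypothetical toy example: Met, gap, Phe
print(thread_codons("M-F", "ATGTTT"))
```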

Data partitioning, and the ability to apply specific models to different partitions, is ideal for analyzing mitochondrial genomes2. We used PartitionFinder version 1.1.148 to simultaneously confirm partition schemes and choose substitution models for the matrix. The DNA sequence search model was set to “mrbayes”. The greedy algorithm was used, with estimated, linked branch lengths, to search for the best-fit partitioning model.

We constructed phylogenies among the Muscomorpha with the Bayesian inference (BI) method using MrBayes version 3.2.549, and with the ML method using RAxML version 8.0.050. In BI, the GTR + I + G, GTR + G, HKY + I + G, and HKY + G models were used with the corresponding partitions (Table S2). Four simultaneous Markov chains were run for 10 million generations, with tree sampling occurring every 1,000 generations, and a burn-in of 25% of the trees. We used the GTR + G model for each ML analysis. We conducted 200 ML runs to find the highest-likelihood tree, then analyzed 1,000 bootstrap replicates.
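As a rough check on the Bayesian settings above, the number of trees retained per run after burn-in can be estimated as follows. This sketch assumes that the starting state at generation 0 is also recorded and that the burn-in count is rounded down; the exact rounding used by MrBayes is not confirmed here, and the function name is hypothetical.

```python
def retained_samples(generations: int, sample_freq: int, burnin_frac: float):
    """Return (sampled, discarded, retained) tree counts for one run."""
    sampled = generations // sample_freq + 1  # +1 assumes generation 0 is saved
    discarded = int(sampled * burnin_frac)    # assumed to round down
    return sampled, discarded, sampled - discarded


# Settings reported in the text: 10 million generations, every 1,000, 25% burn-in
print(retained_samples(10_000_000, 1_000, 0.25))
```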

Additional Information

How to cite this article: Pu, D.-q. et al. Mitochondrial genomes of the hoverflies Episyrphus balteatus and Eupeodes corollae (Diptera: Syrphidae), with a phylogenetic analysis of Muscomorpha. Sci. Rep. 7, 44300; doi: 10.1038/srep44300 (2017).
