Background

The Orthoptera is one of the oldest extant insect lineages, with fossils first appearing in the Upper Carboniferous (290 Mya) (Sharov 1968; Grimaldi and Engel 2005). The monophyly of this order is supported by both morphological and molecular data (Jost and Shaw 2006; Fenn et al. 2008; Ma et al. 2009). It is one of the largest and best researched of the hemimetabolous insect orders and consists of two suborders, the Caelifera (Acridoidea or Acrydoidea) and Ensifera (Tettigonioidea) (Handlirsch 1930; Ander 1939), a division accepted by most researchers. With regard to mid-level relationships within the Caelifera or Ensifera (among superfamilies, families, or subfamilies), there are a few hypotheses based on morphological and molecular data (Flook and Rowell 1997a, 1997b, 1998; Flook et al. 1999, 2000; Fenn et al. 2008; Ma et al. 2009; Eades and Otte 2010; Sun et al. 2010; Zhao et al. 2010, 2011). The lack of a consensus phylogeny based on morphology alone makes it especially important to use DNA data from highly polymorphic genetic markers such as mitogenomic sequences.

Mitogenomes of insects are typically small, double-stranded, circular molecules. They range in size from 14 to 19 kb and encode 37 genes (Wolstenholme 1992; Boore 1999). For the past 2 decades, mitogenomic data have been widely regarded as markers of choice for both population and evolutionary studies of insects. The mitogenome is one of the most information-rich markers in phylogenetics and has been used extensively to study phylogenetic relationships at different taxonomic levels (Ingman et al. 2000; Nardi et al. 2003). Mitogenomic data may therefore provide new insights into systematics within the Orthoptera.

Different DNA datasets from mitogenomes vary in their degree of phylogenetic usefulness. Protein-coding genes appear to be suited to studies of relationships among closely related species, because unconstrained sites (at the third codon position) and amino acid substitutions in rapidly evolving genes may help decipher close relationships. Phylogenetic trees based on ribosomal (r)RNA sequences can simultaneously reveal the evolutionary descent of nuclear genomes and mitogenomes, because rRNA genes are the only ones encoded by all organellar genomes and by nuclear and prokaryotic genomes (Gray 1989). The highly conserved regions of rRNA genes may be useful for deep levels of divergence (Simon et al. 1994). Formerly, the A + T-rich region in the mitogenome was rarely used in constructing phylogenies due to its high adenine and thymine contents and high variability (Zhang et al. 1995; Zhang and Hewitt 1997). However, Zhao et al. (2011) verified that the sequence of the conserved stem-loop secondary structure in this region discovered by Zhang et al. (1995) provides good resolution at the intra-subfamily level within the Caelifera.

Phylogenetic analyses have generally been performed with the maximum likelihood (ML) and Bayesian inference (BI) methods. There is not yet sufficient evidence to establish that either method is superior in all cases. When the selected DNA sequence data are rather slowly evolving and large in amount, ML can lead to inferred phylogenetic relationships relatively close to those that would be obtained from a tree based on the entire genome (Nei and Kumar 2000). The more recently proposed BI approach appears to possess advantages in terms of its ability to use complex models of evolution, ease of interpretation of the results, and computational efficiency (Huelsenbeck et al. 2002).

Herein, we reconstructed the phylogeny of the Orthoptera as a vehicle to examine the phylogenetic utility of different datasets in the mitogenome to resolve deep relationships within the order. Also, we explored various methods of analyzing mitogenomic data in a phylogenetic framework by testing the effects of different optimality criteria and data-partitioning strategies.

Methods

Data partitioning

In total, 47 available orthopteran mitogenomes were included in our analyses (Table 1). Ramulus hainanense from the Phasmatodea and Sclerophasma paresisense from the Mantophasmatodea were selected as outgroups. We created six datasets to study the effect of different partitioning schemes on the topology of mitogenomic phylogenies: (1) ATP, (2) cytochrome oxidase (COX), (3) COX + cytochrome (Cyt) b, (4) NADH, (5) the concatenated conserved domain of ribosomal RNA (rRNA(C)), and (6) the concatenated variable domain of rRNA (rRNA(V)).
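As a minimal sketch of how the protein-coding datasets could be assembled, the Python snippet below concatenates per-gene alignments taxon by taxon. The gene membership of each dataset (for example, ATP = atp6 + atp8) is an assumption based on the dataset names rather than something stated in the text, and the function name, taxon labels, and toy sequences are hypothetical.

```python
# Assumed gene membership of the four protein-coding datasets
# (inferred from the dataset names; not stated explicitly in the text).
DATASETS = {
    "ATP": ["atp6", "atp8"],
    "COX": ["cox1", "cox2", "cox3"],
    "COX + Cyt b": ["cox1", "cox2", "cox3", "cytb"],
    "NADH": ["nad1", "nad2", "nad3", "nad4", "nad4L", "nad5", "nad6"],
}


def concatenate(alignments, genes):
    """Join per-gene alignments taxon by taxon.

    `alignments` maps gene -> {taxon: aligned sequence}. Only taxa
    present in every requested gene are kept.
    """
    taxa = sorted(set.intersection(*(set(alignments[g]) for g in genes)))
    return {t: "".join(alignments[g][t] for g in genes) for t in taxa}


# Toy alignments for two hypothetical taxa (made-up sequences).
toy = {
    "atp6": {"taxonA": "ATGACC", "taxonB": "ATGAC-"},
    "atp8": {"taxonA": "ATTCCA", "taxonB": "ATTCCG"},
}
atp_matrix = concatenate(toy, DATASETS["ATP"])
print(atp_matrix)
```

The same function would apply to the rRNA(C) and rRNA(V) datasets if each conserved or variable domain were stored as its own aligned block.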

Table 1 Taxonomic information for the phylogenetic analysis used in this study

DNA alignments were inferred from the amino acid alignments of each of the 13 protein-coding genes using MEGA vers. 5.0 (Tamura 2011). The rRNA genes were individually aligned with ClustalX using default settings (Thompson et al. 1997). The 15 separate nucleic acid sequence alignments were then manually refined.
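Inferring a DNA alignment from an amino acid alignment amounts to threading each unaligned coding sequence back onto its aligned protein, codon by codon. The Python sketch below illustrates the idea only; it is not the MEGA implementation, the function name is hypothetical, and it assumes an in-frame coding sequence with the stop codon already removed.

```python
def protein_to_codon_alignment(aligned_protein, cds):
    """Thread an unaligned, in-frame coding sequence onto its aligned protein.

    Each residue consumes one codon; each gap ('-') becomes '---'.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths do not match")
    out = []
    k = 0  # index of the next unused codon
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


# Toy example: one gap in the protein alignment becomes a three-base gap.
aligned = protein_to_codon_alignment("M-TL", "ATGACCCTA")
print(aligned)  # ATG---ACCCTA
```

Aligning at the protein level and back-translating keeps gaps in multiples of three, so the reading frame is preserved in the nucleotide alignment.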

Phylogenetic analyses

MrModeltest 2.3 (Nylander 2004) and ModelTest 3.7 (Posada and Crandall 1998) were used to select the models for the BI and ML analyses, respectively. According to the Akaike information criterion, the GTR + I + G model was selected as the most appropriate for these datasets. The BI analysis was performed using MrBayes, vers. 3.1.2 (Ronquist and Huelsenbeck 2003) under this model. Two simultaneous runs of 10^6 generations were conducted for the matrix. Each run was sampled every 100 generations with a burn-in of 25%. Bayesian posterior probabilities were estimated on a 50% majority rule consensus tree of the remaining trees. The ML analysis was performed using the program RAxML, vers. 7.0.3 (Stamatakis 2006) with the same model. A bootstrap analysis was performed with 100 replicates.
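The sampling settings above fix how many trees enter the consensus. The Python sketch below works through that arithmetic, taking the run length to be 10^6 generations per run (an assumed reading), and then shows in simplified form how clade posterior probabilities and a majority-rule consensus can be derived from sampled trees. The representation of a tree as a set of clades and the function names are illustrative assumptions, not the MrBayes implementation.

```python
from collections import Counter

generations = 10**6       # per run (assumed reading of the run length)
sample_every = 100
burnin_fraction = 0.25
n_runs = 2

samples_per_run = generations // sample_every                  # 10,000, ignoring the starting tree
kept_per_run = int(samples_per_run * (1 - burnin_fraction))    # 7,500
kept_total = kept_per_run * n_runs                             # 15,000


def clade_support(sampled_trees):
    """Fraction of sampled trees that contain each clade.

    Each tree is a set of clades; each clade is a frozenset of taxa.
    """
    counts = Counter(clade for tree in sampled_trees for clade in tree)
    return {clade: n / len(sampled_trees) for clade, n in counts.items()}


def majority_rule(sampled_trees, threshold=0.5):
    """Keep clades found in more than `threshold` of the sampled trees."""
    support = clade_support(sampled_trees)
    return {c: p for c, p in support.items() if p > threshold}


# Toy sample of four "trees" over taxa A-D (illustrative only).
AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")
trees = [{AB, CD}, {AB, CD}, {AB}, {AC}]
consensus = majority_rule(trees)
print(kept_total, consensus)
```

In the toy sample, only the clade (A + B) occurs in more than half of the trees, so it alone appears in the consensus, with a support of 0.75.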

Results and discussion

Results

Phylogenetic relationships within the Orthoptera

Different optimality criteria and dataset compilation techniques have been applied to find the best method of analyzing complex mitogenomic data (Stewart and Beckenbach 2009; Cameron et al. 2004; Castro and Dowton 2005; Kim et al. 2005). In this paper, we compared the effect of partitioning according to different protein-coding genes (NADH, COX, COX + Cyt b, and ATP) and different regions in rRNA (rRNA(C) (small subunit ribosomal RNA (rrnS) (III) + large subunit ribosomal RNA (rrnL) (III + IV + V)) and rRNA(V) (rrnS (I + II) + rrnL (I + II + VI))).

Different partitioning schemes had greater or lesser influences on the phylogenetic reconstruction in terms of both the topology and nodal support. When all available data were analyzed, the monophyly of two Orthoptera suborders, the Caelifera and Ensifera, was consistently recovered in the context of our taxon sampling based on most analyses (Figures 1, 2, 3, and 4A,B).

Figure 1

Phylogenetic tree built by the Bayesian method based on the NADH dataset. The posterior probabilities are shown close to the nodes. Ac., Acridinae; Br., Bradyporinae; Cal., Calliptaminae; Cat., Catantopinae; Co., Conocephalinae; Cy., Cyrtacanthacridinae; Ep., Episactinae; Go., Gomphocerinae; Grylli., Gryllinae; Gryllo., Gryllotalpinae; Mec., Meconematinae; Mel., Melanoplinae; My., Myrmecophilinae; Oe., Oedipodinae; Ox., Oxyinae; Ph., Phaneropterinae; Py., Pyrgomorphinae; Ro., Romaleinae; Tetr., Tetriginae; Tett., Tettigoniinae; Th., Thrinchinae; Tri., Tridactylinae; Tro., Troglophilinae.

Figure 2

Phylogenetic trees built from the cytochrome oxidase (COX) + cytochrome (Cyt) b dataset. (A) ML and (B) BI analyses. Bootstrap proportions and posterior probabilities are shown close to the nodes.

Figure 3

Phylogenetic trees built from the cytochrome oxidase (COX) dataset. (A) ML and (B) BI analyses. Bootstrap proportions and posterior probabilities are shown close to the nodes.

Figure 4

Phylogenetic trees built from the combined ribosomal (r)RNA(C) dataset. (A) ML and (B) BI analyses. Bootstrap proportions and posterior probabilities are shown close to the nodes.

Phylogenetic relationships within the Ensifera

Based on most analyses, within the Ensifera, the Rhaphidophoridae clustered into one group with the Tettigoniidae, and together they supported the monophyly of the Tettigonioidea (Figures 1, 2, 3, 4, and 5A,B). This was consistent with results presented by Flook and Rowell (1999), Fenn et al. (2008), Ma et al. (2009), Sun et al. (2010), Zhao et al. (2010), and Zhou et al. (2010), but conflicted with results of Jost and Shaw (2006), who found that the Rhaphidophoridae was more closely related to the Grylloidea than to the Tettigoniidae. Relationships among the five subfamilies within the Tettigoniidae were recovered only in the BI_NADH, BI_(COX + Cyt b), ML_NADH, and ML/BI_rRNA(C) analyses (Figures 1, 2B, 4A,B, and 6), as (Meconematinae + ((Phaneropterinae + Conocephalinae) + (Bradyporinae + Tettigoniinae))). These results were similar to those of Storozhenko (1997), Gwynne and Morris (2002), and Zhou et al. (2010).

Figure 5

Phylogenetic trees built from the combined ribosomal (r)RNA(V) dataset. (A) ML and (B) BI analyses. Bootstrap proportions and posterior probabilities are shown close to the nodes.

Figure 6

Phylogenetic tree built by the maximum-likelihood method based on the NADH dataset. Bootstrap proportions are shown close to the nodes.

The monophyly of the Grylloidea was supported by the ML_NADH, ML/BI_(COX + Cyt b), and ML/BI_COX analyses (Figures 2, 3A,B, and 6). However, in the BI_NADH analysis, the Myrmecophilidae clustered into one clade with the Gryllotalpidae, and the Gryllidae formed an independent monophyletic group (Figure 1). These results were consistent with those based on a mitochondrial (mt)DNA sequence by Zhou et al. (2010) but conflicted with results based on three rRNA gene sequences by Flook et al. (1999). The two datasets of COX + Cyt b and NADH may be good choices for resolving deep relationships within the suborder Ensifera.

Phylogenetic relationships within the Caelifera

The monophyly of the Caelifera is widely accepted and is supported by morphological and molecular data (Xia 1994; Fenn et al. 2008; Eades and Otte 2010; Sheffield et al. 2010; Sun et al. 2010; Zhao et al. 2010, 2011). In the present study, five superfamilies of Caelifera lineages were included, and they clustered as a monophyletic clade. Our results may provide evidence for resolving phylogenetic relationships among those superfamilies within the Caelifera. The monophyly of the five superfamilies within the Caelifera was well supported by the BI_NADH, ML/BI_(COX + Cyt b), and ML/BI_COX analyses (Figures 1, 2, and 3A,B). Relationships among the five superfamilies were (Tridactyloidea + (Tetrigoidea + (Eumastacoidea + (Pneumoroidea + (Acridoidea))))). The Tridactyloidea occupied the basal position.

Within the Acridoidea, the BI_NADH analysis produced an identical topology to the OSF system (Eades and Otte 2010) (Figure 1). The respective monophylies of the Acrididae, Romaleidae, and Pamphagidae were well recovered only by this analysis.

Phylogenetic relationships within the Acrididae

With regard to relationships among the subfamilies within the Acrididae, divergent tree topologies were resolved from the different datasets selected in this study. Our initial analyses using the six datasets led to quite different tree topologies, and neither the ML nor the BI trees based on these datasets resolved deep phylogenetic relationships well, with the exception of the BI_NADH analysis (Figure 1). The relationships among the eight acridid subfamilies were (Cyrtacanthacridinae + (Calliptaminae + (Catantopinae + (Oxyinae + (Melanoplinae + (Acridinae + (Oedipodinae + Gomphocerinae))))))).

The five subfamilies, the Cyrtacanthacridinae, Catantopinae, Calliptaminae, Oxyinae, and Melanoplinae, were placed in one family, the Catantopidae, in Xia's (1958) system. In the BI_NADH analysis, the five subfamilies were split into three clades, Cyrtacanthacridinae, (Calliptaminae + Catantopinae), and (Oxyinae + Melanoplinae). The Acridinae species were split into two clades. Phlaeoba albonema formed a clade of its own, while the other two species were grouped together into one clade with species of the Oedipodinae, which was in conflict with the morphological taxonomy and previously reported topologies (Fenn et al. 2008; Sheffield et al. 2010; Sun et al. 2010; Zhao et al. 2010, 2011). For the Gomphocerinae, both subfamilies Arcypterinae and Gomphocerinae in Xia's (1958) system were consolidated into one group. This result supports the monophyly of the Gomphocerinae in the OSF system (Eades and Otte 2010).

Few of the analyses recovered a topology completely congruent with other studies. Most datasets scattered members of the Acrididae throughout the tree and failed to resolve most of the clades. The topology within the Acridoidea based on rRNA(V) was almost consistent with that of BI_NADH, with the sole exception of the Pamphagidae, which was located away from the Acridoidea (Figure 5A,B). In the analyses with this dataset, the eight subfamilies within the Acrididae adopted in this work were grouped into three clades: (1) clade 1 containing the (Catantopinae + Cyrtacanthacridinae + Calliptaminae + Oxyinae + Melanoplinae); (2) clade 2 containing the (Oedipodinae + Acridinae); and (3) clade 3 containing the Gomphocerinae.

Discussion

Phylogenetic analyses in this study

We performed 12 separate phylogenetic analyses to test the effect of the optimality criteria and data-partitioning strategies on mitogenomic phylogenies of the Orthoptera. The results indicated that the differing datasets had much larger effects than the optimality criteria on both the topologies and levels of support.
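The 12 analyses correspond to six datasets crossed with two optimality criteria. The Python sketch below enumerates them and adds one simple way to quantify how much two topologies differ: counting the clades found in one rooted tree but not the other (a Robinson-Foulds-style distance). The distance measure, names, and toy trees are illustrative assumptions; no such measure is reported in this study.

```python
from itertools import product

CRITERIA = ["ML", "BI"]
DATASETS = ["ATP", "COX", "COX + Cyt b", "NADH", "rRNA(C)", "rRNA(V)"]

# One analysis per (dataset, criterion) pair, labeled in the style ML_NADH.
analyses = [f"{c}_{d}" for d, c in product(DATASETS, CRITERIA)]
print(len(analyses))  # 12


def clade_distance(clades_a, clades_b):
    """Number of clades present in exactly one of two rooted trees.

    Each tree is given as a set of clades; each clade is a frozenset of taxa.
    """
    return len(set(clades_a) ^ set(clades_b))


# Toy trees over taxa A-D (illustrative only).
tree_1 = {frozenset("AB"), frozenset("ABC")}
tree_2 = {frozenset("AB"), frozenset("ABD")}
print(clade_distance(tree_1, tree_2))  # 2
```

Under such a measure, comparing ML and BI trees within a dataset against trees from different datasets under one criterion would give a direct view of which factor changes the topology more.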

In terms of the ability to resolve deeper-level relationships in the Orthoptera, conserved gene data (COX + Cyt b and COX) resolved the relationships among major Orthoptera lineages (between suborders or among superfamilies), but were unable to unambiguously resolve intra-subfamily relationships within the Acrididae (Figures 2 and 3A,B). This suggests that the two datasets might not have sufficient phylogenetic signals to resolve relationships among closely related species.

The ATP topologies were the worst among the analyses performed herein (Figure 7A,B). The ATP dataset gave tree topologies that were markedly incongruent with the other datasets and with previously accepted orthopteran phylogenies (Fenn et al. 2008; Ma et al. 2009; Sun et al. 2010; Zhao et al. 2010, 2011).

Figure 7

Phylogenetic trees built from the ATP dataset. (A) ML and (B) BI analyses. Bootstrap proportions and posterior probabilities are shown close to the nodes.

Among the total evidence analyzed, there were no apparent effects of different optimality criteria on the tree topologies with rRNA(C) and rRNA(V) (Figures 4 and 5A,B). However, different optimality criteria did result in different topologies among the other datasets. In both the ML and BI analyses based on COX + Cyt b (Figure 2A,B), Acrididae species were split into four clades: (1) clade 1 containing the (Gomphocerinae + Melanoplinae); (2) clade 2 containing the (Romaleinae + Pamphaginae + Cyrtacanthacridinae + Calliptaminae + Catantopinae + P. albonema); (3) clade 3 containing the (Acridinae (Acrida) + Arcyptera coreana); and (4) clade 4 containing the (Oedipodinae). However, the positions of clades 1 and 2 were reversed in the two analyses. Among the remaining datasets (NADH, COX, and ATP), different optimality criteria greatly influenced reconstruction of the ingroup topology (Figures 1, 3, 6, and 7A,B). Nodal support values also appeared to be affected by the optimality criteria, in that bootstrap values from the ML analyses were generally lower than posterior probabilities from the BI analyses, which was consistent with analyses by Fenn et al. (2008).

Here, genes with intermediate rates of evolution might have had better phylogenetic utility for the questions at hand.

Choice of genes and their contribution to a total evidence tree

Correction or weighting of DNA-sequence data based on the level of variability can improve phylogenetic reconstructions in some cases. Gene choice is therefore of critical importance.

Evolutionary rates of rRNA genes vary considerably along the length of the molecules (Hillis and Dixon 1991; Simon et al. 1991). Short-range stems and loops tend to be less conserved than long-range stems (Hixson and Brown 1986; Simon et al. 1990). Unpaired regions joining domains tend to be highly conserved. rRNA domains evolve at different average rates dictated by their functional constraints. For example, in rrnS, the 5′ half (domains I and II) has many fewer conserved nucleotide strings than the 3′ half (domain III) (Clary and Wolstenholme 1985; De Rijk et al. 1993; Van de Peer et al. 1993). Domain III has therefore routinely been used in insect systematic studies as a molecular marker (Simon et al. 1994). Similarly, in rrnL, domains I, II, and VI are, on average, less conserved than domains III, IV, and V (Uhlenbusch et al. 1987; Gutell et al. 1992). Accordingly, the majority of structural and phylogenetic studies have focused on the 3′ half of the rrnL molecule (Kambhampati et al. 1996; Flook and Rowell 1997a, b; Buckley et al. 2000). The 3′ halves of rrnS and rrnL are not very useful for phylogenetic studies of recently diverged species, because they contain few sites that vary (Simon et al. 1994). Milinkovitch et al. (1993) successfully analyzed relationships among 16 whale taxa using only the most conserved domains of these two ribosomal genes. rRNA genes are most likely to be useful at the population level and at deep levels of divergence. However, for a study of relationships among closely related species, a protein-coding gene might be a better first choice.

Protein-coding genes may be more appropriate for phylogenetic analyses at intermediate levels of divergence. The phylogenetic performance of different genes is related to their particular rates of evolution. The three protein-coding genes atp6, atp8, and nad4L are the fastest evolving genes, while COX subunits and Cyt b show much slower overall rates of evolution (Russo et al. 1996; Zardoya and Meyer 1996; Cameron et al. 2004). Cox1 is the most conserved gene in terms of amino acid evolution. In the past, cox1, cox2, cytb, and nad2 were extensively used for phylogenetic analyses (Liu and Beckenbach 1992; Simon et al. 1994; Caterino et al. 2000; Chapco et al. 2001; Litzenberger and Chapco 2001; Chapco and Litzenberger 2002; Amédégnato et al. 2003). Cox2 is the most widely used mitochondrial protein-coding gene in insects (Simon et al. 1994). Both nad4 and nad5 are large genes, and their protein sequence divergences seem to be helpful in constructing trees for distantly related species (Russo et al. 1996). The nad6 gene was often omitted because it is coded on the light strand, and its properties differ from those of the other 12 protein-coding genes (Springer et al. 2001). Zardoya and Meyer (1996) classified mitochondrial protein-coding genes into three groups: good (nad4, nad5, nad2, cytb, and cox1), medium (cox2, cox3, nad1, and nad6), and poor (atp6, nad3, atp8, and nad4L) phylogenetic performers. Some genes seem to be consistently more reliable tracers of evolutionary history than others.
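To relate this classification to the datasets compared here, the Python sketch below tallies how many good, medium, and poor performers (following Zardoya and Meyer 1996, as listed above) fall into each protein-coding dataset. The gene membership of each dataset is an assumption inferred from the dataset names, and the function name is hypothetical.

```python
# Performance classes of Zardoya and Meyer (1996), as listed in the text.
PERFORMANCE = {
    "good": ["nad4", "nad5", "nad2", "cytb", "cox1"],
    "medium": ["cox2", "cox3", "nad1", "nad6"],
    "poor": ["atp6", "nad3", "atp8", "nad4L"],
}
GENE_CLASS = {gene: cls for cls, genes in PERFORMANCE.items() for gene in genes}

# Assumed gene membership of each dataset (not stated explicitly in the text).
DATASETS = {
    "ATP": ["atp6", "atp8"],
    "COX": ["cox1", "cox2", "cox3"],
    "COX + Cyt b": ["cox1", "cox2", "cox3", "cytb"],
    "NADH": ["nad1", "nad2", "nad3", "nad4", "nad4L", "nad5", "nad6"],
}


def profile(genes):
    """Count how many genes of a dataset fall into each performance class."""
    counts = {"good": 0, "medium": 0, "poor": 0}
    for gene in genes:
        counts[GENE_CLASS[gene]] += 1
    return counts


for name, genes in DATASETS.items():
    print(name, profile(genes))
```

Under the assumed membership, the ATP dataset consists entirely of genes classed as poor performers, whereas the NADH dataset combines three good, two medium, and two poor performers.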

Conclusions

Our findings suggest that the best phylogenetic inferences can be made when moderately divergent nucleotide data from mitogenomes are analyzed, and that the NADH dataset was suited for studying orthopteran phylogenetic relationships at different taxonomic levels, which may have been due to the larger amount of DNA sequence data and the larger number of phylogenetically informative sites.