Background

Mitochondrial loci have been the most popular phylogenetic markers in animals for over three decades. Their ease of amplification [1, 2], uniparental inheritance, lack of recombination in mammals [2,3,4], differential selection among genes [2], and general synteny [5] combine to make mitochondrial phylogenetic markers an excellent choice for many study systems. Recent increases in DNA sequencing capabilities have provided an opportunity to sequence full mitochondrial genomes (mitogenomes) rather than partial, one, or a few mitochondrial genes. As a result, previous phylogenies that relied on a portion of the mitochondrion are now undergoing re-analysis in an effort to expand support for or re-affirm previous conclusions [6].

The genus Peromyscus, commonly referred to as North American deer mice, encompasses more than 70 species that diverged within the last 6–10 million years [7]. Species including P. maniculatus [8] and P. leucopus [9] are among the most common mammals in North America and have been studied extensively for over 100 years [10]. Despite extensive study, new species [11] and subspecies [12] are being described on a regular basis. The large number of species, both described and undescribed, as well as substantial cryptic variation, has yielded numerous distinct phylogenetic hypotheses.

Previous classifications of Peromyscus recognized seven morphologically distinct subgenera (Habromys, Haplomylomys, Isthmomys, Megadontomys, Osgoodomys, Peromyscus, and Podomys) [13, 14]. Many of those subgenera (Habromys, Isthmomys, Megadontomys, Osgoodomys, and Podomys) have at times been elevated to generic status [15]. Some recent molecular phylogenies, however, show paraphyly [7]. Isthmomys is sister to the genus Peromyscus, representing a distinct genus, as is currently accepted, but Habromys, Megadontomys, Osgoodomys, Neotomodon, and Podomys, distinct genera, were found within Peromyscus, rendering the genus paraphyletic [7, 16, 17], though Neotomodon alstoni [18] previously had been Peromyscus alstoni [14, 19].

Recent attempts to resolve the history of this clade by Bradley et al. [16], Miller and Engstrom [17] and Platt et al. [7] were based on various combinations of nuclear and mitochondrial genes. All three studies arrived at similar topologies, but varied in their levels of support at some nodes. Species-level relationships were well-supported but the relationships among some genera and species groups were not. Given that approaches that use cytochrome b (cytb) alone were somewhat successful in resolving Peromyscus, we reasoned that the use of whole mitogenomes, a more data-rich approach, may be an avenue to more robust results at the generic and species group level, thus providing additional clarity. Here, we analyze whole mitogenomes from Habromys, Isthmomys, Neotomodon, Peromyscus, Podomys, and three outgroups, Sigmodon, Neotoma, and Reithrodontomys, as identified from previous molecular phylogenies [7, 16, 17].

Methods

Sampling, DNA preparation and sequencing

Taxa selected for this study were identified based on the findings of Platt et al. [7]. Our objective was to reduce taxon sampling for some groups to isolate specific clades. To do so, we chose representatives from selected subclades to serve as proxies. Specifically, one individual here is a proxy for a larger clade (generally represented as a species group). Members of Neotoma, Reithrodontomys, and Isthmomys, which have been determined to be closely related clades by several morphological and molecular analyses [7, 15, 16], were included. Species identifications and museum catalog numbers are included in Table 1.

Table 1 Basic read, coverage and size data for each taxon. Museum ID refers to a special identification or catalogue number for specimens at the Natural Science Research Laboratory, Museum of Texas Tech University

Whole genomic DNA was isolated using a Qiagen DNeasy Blood and Tissue kit (Qiagen, Valencia, CA) via the manufacturer’s protocol and fragmented to ~ 400 bp. Sizes were verified on an Agilent 2100 Bioanalyzer. Illumina compatible sequencing libraries were prepared from the fragmented DNA using the NEB/KAPA library preparation. Each sample was tagged with a unique index and pooled in equal proportions, after which the pooled libraries were sequenced on a single run of an Illumina MiSeq (2 × 250 bp). Raw data are available in GenBank under BioProject ID PRJNA308567.

Mitochondrial genome assembly and annotation

Raw sequence reads were filtered and processed using Trimmomatic version 0.35 [20]. Specifically, we clipped Illumina adapters, disregarded read ends whose Phred scores fell below 20, and utilized a four-base sliding window to trim reads once the average quality fell below 25. We assembled mitogenomes through a custom Bash script (https://github.com/KevinAMSullivan/Mitochondrial_Genomes/tree/MIRA). The script utilized two major programs, MITObim [21] and MIRA [22], to map the filtered reads to a reference genome before assembling the mitogenomes. We selected Akodon montensis [23], a sigmodontine rodent and the most closely related organism with a fully sequenced mitogenome, as the reference (GenBank accession number KF769456).
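The quality-trimming rules described above can be illustrated with a minimal Python sketch. It is not a reimplementation of Trimmomatic and omits adapter clipping; the function names, the order in which the two steps are applied, and the handling of the window at the cut point are assumptions made for illustration only.

```python
def trim_ends(quals, min_q=20):
    """Return (start, end) after dropping leading and trailing bases with Phred < min_q."""
    start, end = 0, len(quals)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return start, end


def sliding_window_cut(quals, window=4, min_avg=25):
    """Return how many bases to keep: cut at the first window whose mean quality < min_avg."""
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            return i
    return len(quals)


def trim_read(seq, quals):
    """Apply end trimming, then the sliding-window cut (step order is an assumption)."""
    start, end = trim_ends(quals)
    seq, quals = seq[start:end], quals[start:end]
    keep = sliding_window_cut(quals)
    return seq[:keep], quals[:keep]


if __name__ == "__main__":
    # Hypothetical read: one poor base at the 5' end and a quality dip near the 3' end.
    print(trim_read("ACGTACGTAC", [10, 30, 30, 30, 30, 30, 30, 12, 12, 30]))
```

In this sketch a read is truncated at the start of the first failing window, so high-quality bases downstream of a quality dip are discarded along with the dip itself.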

MITOS [24] was used to annotate the mitochondrial genes for each genome. Putative genes were submitted to BLASTn to confirm sequence length. When gaps were noted, we manually checked for frameshifts but none were observed. Acceptable results were those whose top hits were to a Peromyscus or closely related species and that covered nearly 100% of the putative gene. The putative gene was shortened or lengthened to match the length of the BLAST hit if a different size. All sequences were deposited in GenBank under the accession numbers KY707299–707312. Genome coverage was estimated using bedtools genomecov [25] in combination with mapping of the processed reads to each MITObim assembly with BWA via a custom script (https://github.com/KevinAMSullivan/Mitochondrial_Genomes/tree/MIRA).
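As a rough illustration of the coverage estimate, the sketch below averages per-base depth for each assembly. It assumes the depth values come from per-position output of bedtools genomecov (the -d option, which writes tab-separated sequence name, 1-based position, and depth); whether the custom script used that option is not stated here, and the function name is hypothetical.

```python
from collections import defaultdict


def mean_depth(lines):
    """Mean per-base depth per sequence from 'name<TAB>position<TAB>depth' lines."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _pos, depth = line.split("\t")
        totals[name] += int(depth)
        counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


if __name__ == "__main__":
    # Hypothetical three-position example, not real data from this study.
    example = ["mito\t1\t100", "mito\t2\t150", "mito\t3\t50"]
    print(mean_depth(example))
```

Because positions with zero depth are included in the per-position output, they lower the mean, which is the desired behavior for an assembly-wide coverage figure.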

Assembly validation

To test the reliability of mitogenome assemblies, we used two cytb genes per taxon from GenBank as controls (Additional file 1: Table S1), and worked under the assumption that our reference assembled cytb gene and those from GenBank should form monophyletic clades. We chose full length, non-identical entries from the selected species, aligned them with cytb from our assemblies, and used RAxML [26] to estimate the best tree from 1000 maximum likelihood (ML) searches. Support for each node was generated with bipartition frequencies from 10,000 bootstrap replicates. The GTR+GAMMA+I model of nucleotide substitution was used for both ML analyses.

Phylogenetic analysis of whole mitochondrial genomes

We concatenated the mitochondrial protein coding sequences and aligned them with Muscle [27]. That alignment is available as Additional file 2. Bayesian and ML phylogenetic analyses were accomplished using MrBayes [28] and RAxML [26], respectively. For the Bayesian analysis, four independent runs on five chains were implemented for 3,000,000 generations. We evaluated stability of the final tree by continuing to an average standard deviation of split frequency less than 0.01. The data were also partitioned by codon. No substitution model was specified as a prior to let the program search across all possible models to determine the most appropriate model of evolution. We then used the selected model, GTR+GAMMA, for our ML analysis, for which we implemented 1000 bootstrap replicates. In both analyses, the A. montensis and S. hispidus [29] mitogenomes were specified as outgroups when drawing the trees as Neotominae and Sigmodontinae are sister subfamilies whose relationships have been supported by previous analyses [30, 31]. Our unrooted tree supports this relationship. The implementation scripts for both analyses are available on github (https://github.com/KevinAMSullivan/MIRA-MitoBim/tree/Phylogenies).
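The convergence criterion used for the Bayesian analysis can be sketched as follows. For each split (bipartition), its frequency is recorded in every independent run, the standard deviation of those frequencies is taken across runs, and the deviations are averaged over splits. The use of the sample standard deviation and the 0.10 minimum-frequency filter are assumptions in this sketch, and the split labels and frequencies are invented for illustration.

```python
from statistics import stdev


def asdsf(split_freqs_by_run, min_freq=0.10):
    """Average standard deviation of split frequencies across two or more runs.

    split_freqs_by_run: list of dicts, one per run, mapping a split label
    to the fraction of sampled trees in that run containing the split.
    """
    splits = set().union(*split_freqs_by_run)
    deviations = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in split_freqs_by_run]
        if max(freqs) < min_freq:
            continue  # ignore splits that are rare in every run (assumed filter)
        deviations.append(stdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0


if __name__ == "__main__":
    # Hypothetical split frequencies from two runs.
    runs = [
        {"AB|CDE": 0.90, "ABC|DE": 0.50},
        {"AB|CDE": 0.90, "ABC|DE": 0.52},
    ]
    value = asdsf(runs)
    print(value, value < 0.01)
```

A value below 0.01 indicates that the independent runs are sampling splits at nearly the same frequencies, which is the threshold applied in the analysis described above.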

Results

We sequenced and assembled whole mitogenomes of 14 rodents in the subfamily Neotominae and one member of Sigmodontinae, Sigmodon hispidus. Taxon sampling included seven genera and six of the 13 species groups within Peromyscus [16]. Genome assembly data, including accession numbers, read coverage, and mean read totals across all taxa, are listed in Table 1.

Each assembled mitogenome comprises 22 tRNA, 2 rRNA, and 13 protein coding regions, totaling 37 genes. Although most genes are transcribed from the positive (heavy) strand, several genes, mainly tRNAs but also including nad6, are transcribed from the negative (light) strand. Additionally, the 3′ end of atp8 overlaps with the 5′ end of atp6, a common characteristic in mitogenomes. Noncoding regions typical of mitogenomes are present, including a control region downstream of cytb. As these mitogenomes were not closed assemblies, the variation in mitogenome size among taxa (Table 1) is thought to be due to the highly variable and often heteroplasmic control region.

Validation of MITObim assembled and MITOS-annotated cytb took three forms. First, reconstructed genes clustered with loci amplified and sequenced using more traditional methods (Additional file 3: Figure S1), suggesting that the remaining protein coding sequences were accurately assembled and annotated for each taxon. Second, differences that were present are within acceptable ranges for such rapidly evolving loci. Despite forming monophyletic clades for each species, some of the cytb genes differed from those in GenBank. For example, cytb from our P. crinitus [32] mitogenome exhibited 43 and 44 differences, respectively, when compared to its two GenBank counterparts, i.e. ~ 96% identity. Divergences in this range may be expected for such a rapidly evolving locus within a species [33]. Third, we have high confidence in our base calls. For example, average coverage of our cytb loci is 142× (data not shown), and the average percentage of those reads indicating any given nucleotide is 99.3%. Given such high support, as well as the fact that P. crinitus is the top BLAST hit to the gene (96% identity, 0.0 E value), we have confidence in the veracity of our cytb sequences over alternate hypotheses for such sequence dissimilarity. Similar results were observed for cytb sequences in all taxa (Additional file 4: Table S2). The single exception was for Neotoma mexicana. The best match (100%) for our reconstructed cytb sequence was to N. isthmica. However, N. isthmica is closely related to N. mexicana [34], and the second best hit was to N. mexicana (99%), confirming its validity as an outgroup [35].
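The identity figure quoted for P. crinitus follows from simple arithmetic, sketched below. The cytb length of 1143 bp is an assumption (a typical length for rodent cytb, not a value reported here), and both function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip alignment gaps
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_from_differences(n_diff, length):
    """Percent identity implied by n_diff mismatches over an ungapped alignment."""
    return 100.0 * (1 - n_diff / length)


if __name__ == "__main__":
    ASSUMED_CYTB_LENGTH = 1143  # assumption: typical rodent cytb length in bp
    for n_diff in (43, 44):
        print(n_diff, round(identity_from_differences(n_diff, ASSUMED_CYTB_LENGTH), 1))
```

Under that assumed length, both 43 and 44 differences correspond to roughly 96% identity, consistent with the value reported for the top BLAST hit.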

ML and Bayesian phylogenies inferred from mitochondrial protein coding genes recovered identical topologies (Fig. 1). This topology is similar to recent molecular phylogenies. First, S. hispidus, N. mexicana, I. pirrensis [36], and R. mexicanus [37] are positioned outside of Peromyscus. Unlike previous phylogenies, however, high posterior probabilities are found at these nodes. In fact, every node, save that linking P. crinitus and P. polionotus [8], has high support. Second, our analyses indicated a paraphyletic Peromyscus. Finally, Isthmomys and Reithrodontomys comprise a clade sister to what is currently considered to be Peromyscus and its affiliated genera. This pairing is of note, as both genera have shown conflicting relationships depending on the marker used [7, 16, 17, 38].

Fig. 1 Phylogeny inferred by the partitioned maximum likelihood analysis. The topology of the Bayesian tree was identical. A. montensis was included as a previously sequenced outgroup. Bootstrap values (above) and posterior probabilities (below) are provided at all internodes

Our phylogeny using whole mitogenomes largely mirrors that of Platt et al. [7], save the quartet of H. ixtlani [39], P. aztecus [37], P. pectoralis [40], and P. attwateri [41]. Platt et al. found P. attwateri and P. aztecus to form a clade, with P. pectoralis as sister to this grouping and H. ixtlani being the most closely related species to a clade containing all three. The whole mitogenome tree is in partial agreement with Bradley et al. [16] with regard to these same four species. Our close pairing of P. attwateri and P. pectoralis matches, but their phylogeny places P. aztecus as a sister group to a clade that encompasses H. ixtlani, P. attwateri, and P. pectoralis.

Discussion

Regardless of the differences or similarities, our tree (Fig. 1) provides support for previously unsupported nodes within the Peromyscus phylogeny. Previous trees were largely resolved, but lacked significant support at key nodes, mainly those in middle regions of the tree that identify relationships among species groups [7, 16, 17]. Platt et al. [7] provided substantial support (posterior probabilities ≥ 95% and bootstrap values ≥ 85) at nodes grouping the crinitus, eremicus, and californicus species groups; the mexicanus, megalops, and melanophrys species groups; and the aztecus and boylii species groups. In contrast, the Bayesian mitogenome phylogeny has high posterior probabilities (> 0.95) at every node, and the ML phylogeny has high bootstrap support (> 90) at all nodes save one.

Two branches of our inferred phylogeny had not yet been reinforced by any molecular analysis. The first concerns H. ixtlani, P. aztecus, P. attwateri, and P. pectoralis. Although some analyses had suggested a close relationship among these taxa, no node in this clade had received substantial support in any previous molecular phylogeny. Our analysis is the first to provide such support for these relationships.

The second branch of interest suggests a close relationship between Isthmomys and Reithrodontomys. Although no phylogenetic analysis ever suggested the genus Reithrodontomys should be nested within Peromyscus, Isthmomys had previously been subsumed as a subgenus [13]. The pairing is counterintuitive given their morphological differences. Although both occupy the same geographic region, there are obvious incongruities in size. Isthmomys pirrensis can reach well over 100 g (averaging 140 g in one study) and 300 mm [42, 43], whereas rodents in Reithrodontomys are much smaller. R. mexicanus ranges from 167 to 190 mm in length [19], and a collection of different surveys gave a mass range of 7.9–9.5 g [42]. Although their pairing has been previously suggested [7, 16, 17], our analysis is the first to provide high support values.

One major weakness of our study is its lack of complete taxon sampling. This makes comprehensive phylogenetic reconstruction uncertain because critical taxa (i.e. those representing additional ingroups) are missing. The addition of excluded taxa, especially Osgoodomys and Megadontomys, should provide a more comprehensive view of phylogenetic relationships within the genus. That being said, the high support values we recover suggest that the relationships at the deeper nodes are valid.

The Peromyscus phylogeny has been studied repeatedly and revised often, using data from several sources [42]. This latest attempt, using next-generation sequencing, continues to suggest paraphyly. Indeed, every molecular phylogeny has been unequivocal in suggesting paraphyly. Given this observation, addressing the status of genera subsumed within Peromyscus should be seen as a priority within rodent taxonomy.

Aside from acknowledging paraphyly, future work on Peromyscus phylogenetics should focus on further clarifying the true phylogeny via more taxa and markers. The inclusion of additional mitogenomes will elucidate relationships within the genus to provide increased detail. However, despite their advantages, mitochondrial loci are imperfect markers. Mitochondrial genomes are susceptible to incomplete lineage sorting (ILS) and introgression [44]. Hybrids, especially asymmetric hybrids, exemplify a similar problem in that they may all have the same mitochondrial DNA, so species boundaries in those cases would not be reflected [45,46,47]. Supplementing the mitochondrial phylogeny with large numbers of nuclear markers such as ultraconserved elements (UCEs) [48, 49] or retrotransposons [50, 51] is an important next step in understanding phylogenetic relationships within Peromyscus.

Conclusions

Here, we present analyses of whole mitochondrial genomes for 14 species, ten of which are Peromyscus or close relatives. The data yield phylogenies with significant support at previously unsupported nodes, particularly the pairing of Isthmomys and Reithrodontomys, but also suggest paraphyly within the genus that could be resolved by elevating monophyletic groups to genera or subsuming currently recognized genera as subgenera. Our analyses provide evidence that additional data will help clarify the evolutionary history of this genus.