Genetica, Volume 132, Issue 1, pp 21–33

Molecular analyses of mitochondrial pseudogenes within the nuclear genome of arvicoline rodents

Authors

  • D.A. Triant
    • Department of Forestry and Natural Resources, Purdue University
    • Department of Veterinary Integrative Biosciences, Texas A&M University
  • J. Andrew DeWoody
    • Department of Forestry and Natural Resources, Purdue University

DOI: 10.1007/s10709-007-9145-6

Cite this article as:
Triant, D.A. & DeWoody, J.A. Genetica (2008) 132: 21. doi:10.1007/s10709-007-9145-6

Abstract

Nuclear sequences of mitochondrial origin (numts) are common among animals and plants. The mechanism(s) by which numts transfer from the mitochondrion to the nucleus is uncertain, but their insertions may be mediated in part by chromosomal repair mechanisms. If so, then lineages where chromosomal rearrangements are common should be good models for the study of numt evolution. Arvicoline rodents are known for their karyotypic plasticity and numt pseudogenes have been discovered in this group. Here, we characterize a 4 kb numt pseudogene in the arvicoline vole Microtus rossiaemeridionalis. This sequence is among the largest numts described for a mammal lacking a completely sequenced genome. It encompasses three protein-coding and six tRNA pseudogenes that span ∼25% of the entire mammalian mitochondrial genome. It is bordered by a dinucleotide microsatellite repeat and contains four transposable elements within its sequence and flanking regions. To determine the phylogenetic distribution of this numt among the arvicolines, we characterized one of the mitochondrial pseudogenes (cytochrome b) in 21 additional arvicoline species. The average rate of nucleotide substitution in this arvicoline pseudogene is estimated at 2.3 × 10⁻⁸ substitutions per site per year. Furthermore, we performed comparative analyses among all species to estimate the age of this mitochondrial transfer at nearly 4 MYA, predating the origin of most arvicolines.

Keywords

Chromosomal rearrangements, Cytochrome b, Microtus, Mitochondrial genome, numt, Transposable elements, Rodent, Vole

Introduction

The historic transfer of proto-mitochondrial DNA to the nucleus and the subsequent integration of mitochondrial genes into the nuclear genome has been one of the predominant forces driving the evolution of these two distinct genomes (Margulis 1970; Lang et al. 1999). Although the two genomes remain physically separated, insertion of mitochondrial DNA into the nucleus has continued and has resulted in the accumulation of translocated mitochondrial fragments within eukaryotic nuclear genomes. Despite their absolute numbers, numts (nuclear mitochondrial pseudogenes; Lopez et al. 1994) usually comprise less than 0.1% of the nuclear genome because they are typically small in length. However, numts can range in size from 30 bp in rice (Fukuchi et al. 1991) to 14.7 kb in humans (Mourier et al. 2001) and their average size can vary by orders of magnitude across taxa (Leister 2005). Numt integration seems to be more extensive in plants as their sizes can be hundreds of kb in length (Stupar et al. 2001; Noutsos et al. 2005). The size or abundance of numts does not correlate with absolute genome size, gene density or abundance of mitochondrial transcripts, which could suggest that the accumulation of numts is lineage-specific (Woischnik and Moraes 2002; Richly and Leister 2004).

Numts often integrate into non-coding regions of the nuclear genome (Leister 2005), but Ricchetti et al. (2004) found that in humans, recently transferred numts preferentially insert within or near coding regions and may alter gene function. This result is consistent with the idea that numt transfer may be mediated by chromosomal repair mechanisms. Transcription can induce DNA breaks (Aguilera 2002; Gonzales-Barrera et al. 2002); therefore, highly expressed genes may be associated with chromosomal breaks and may be targets for numt insertions. Indeed, there is empirical evidence that numts are often coupled with chromosomal breaks and end-joining mechanisms (Blanchard and Schmidt 1996). Willet-Brozick et al. (2001) analyzed a breakpoint junction of a chromosomal translocation and discovered a 42 bp numt insertion of the mitochondrial 12S rRNA gene, while Ricchetti et al. (1999) reported that numts were transferred to yeast chromosomes during the repair of double-strand breaks.

Numts have been described in a variety of animal and plant taxa, but rodents are an appealing animal model for their study. In particular, rodent genomes evolve rapidly relative to other mammals at both the DNA sequence level and at the chromosomal level (Contreras et al. 1990; Li et al. 1996; Cooper et al. 2003; Rat Genome Sequencing Project Consortium 2004). More specifically, arvicoline rodents are an especially attractive model for the study of numts because cursory descriptions of their numts have been published (DeWoody et al. 1999; Jaarola and Searle 2004) and their karyotype is exceedingly plastic (Matthey 1973; Modi 1987; Mazurok et al. 2001). Furthermore, arvicolines have evolved much faster than most rodents—over 60 species of the vole genus Microtus have evolved in less than 2 million years (Chaline et al. 1999). The elevated rate of speciation in Microtus is mirrored by the rapid rate of nucleotide substitution in its mitochondrial genome (Conroy and Cook 2000; Triant and DeWoody 2006).

The rodent subfamily Arvicolinae (voles, lemmings, and muskrats) includes >130 species distributed among >20 genera (Musser and Carleton 2005) although the exact number of species has varied widely. The earliest arvicoline rodent is thought to have arisen during the early Pliocene of North America and Eurasia (∼5–6 MYA; Repenning 1987; Martin 1989) but most of the contemporary diversity within the Arvicolinae is found within the genus Microtus. This genus underwent a rapid diversification and now encompasses more than 50% of the species within the entire subfamily. The accelerated rate of speciation seen within Microtus has been attributed to chromosomal rearrangements (Reig 1989), and the rapid rate of karyotypic change is illustrated by diploid numbers (2n) that range from 17 to 64 (Matthey 1973; Maruyama and Imai 1981). Whether these chromosomal changes are driving speciation is unclear, but if numt insertions are driven by chromosomal repair mechanisms, the chromosomal rearrangements that have occurred throughout the evolutionary history of arvicoline rodents would seem to provide numerous opportunities for nuclear integrations.

Once integrated into nuclear DNA, the evolutionary rate of mammalian numts should slow because of the difference in substitution rates between the mitochondrial and nuclear genomes. As such, numts may represent an ancestral form of their corresponding mitochondrial fragment and reflect the underlying mutational process (Li et al. 1981; Perna and Kocher 1996). Substitution patterns can reveal whether numts arose from multiple insertion or duplication events and allow for the examination of relative rates of evolution between the nuclear and mitochondrial genome. These types of comparative analyses, however, can be confounded by recombination and indel insertions among numt sequences. For example, complete genomic sequences provide a means to identify entire ranges of numts within an organism but reports of human numt abundance have been conflicting (Tourmen et al. 2002; Woischnik and Moraes 2002; Richly and Leister 2004). The prevalence of indels has likely contributed to alignment ambiguities and discrepancies among numt abundance estimates.

Here, we address some of the challenges involved in isolating numts from multiple species and assessing the evolutionary history of those sequences. We investigate the molecular evolution and structural composition of a large arvicoline numt that contains the cytochrome b pseudogene (ψcytb) originally described by DeWoody et al. (1999) in the Eurasian species M. arvalis. We further extend the original analysis of ψcytb to the entire numt in M. rossiaemeridionalis (the sister taxon of M. arvalis) and characterize the transposable and repetitive elements that flank it. By using a comparative approach to date the original arvicoline ψcytb transfer, we evaluate the rate of molecular evolution within the nuclear genome and compare it to the accelerated rate of molecular evolution in arvicoline mtDNA (Triant and DeWoody 2006).

Materials and methods

Isolation of ψcytb

To isolate the previously described nuclear ψcytb pseudogene (AF057139) to the exclusion of the mitochondrial cytochrome b gene, we used Sequencher 4.1 (GeneCodes) to align ψcytb with a mitochondrial cytochrome b sequence (AF159403) and designed four primers (two forward and two reverse) specific to ψcytb (PcytbF: 5′-ATGACAATCATCTGGGGGGA-3′; PcytbF2: 5′-CTCTCTACTGGGCCTATGCT-3′; PcytbR: 5′-GATTGGTATGAAGATTATGATAAT-3′; PcytbR2: 5′-TGATAATGGCGAAGTAGCCG-3′). We tested all possible primer combinations in three Eurasian and one Holarctic species of Microtus using tissue samples obtained from The Museum at Texas Tech: M. agrestis, M. arvalis, M. oeconomus, and M. rossiaemeridionalis (Table 1).
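As a quick sanity check on the primer set listed above, the following minimal sketch reports the length, GC fraction, and a rough melting temperature for each ψcytb-specific primer. The Wallace rule (2 °C per A/T and 4 °C per G/C) and the function names are assumptions made here for illustration; the source does not say how primer properties were evaluated, and the rule is only a coarse approximation for oligonucleotides of this length.

```python
# ψcytb-specific primers as listed in the text (5'->3').
PRIMERS = {
    "PcytbF": "ATGACAATCATCTGGGGGGA",
    "PcytbF2": "CTCTCTACTGGGCCTATGCT",
    "PcytbR": "GATTGGTATGAAGATTATGATAAT",
    "PcytbR2": "TGATAATGGCGAAGTAGCCG",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule (assumed here)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.2f}, "
          f"Wallace Tm = {wallace_tm(seq)} C")
```

Such estimates are only a first filter; primer specificity to the pseudogene still depends on mismatches against the mitochondrial template, which this sketch does not examine.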
Table 1

Arvicoline species used in this study and their museum accession numbers

| Species | Common name | Museum accessions | MtDNA | Numt |
|---|---|---|---|---|
| Microtus abbreviatus | Insular vole | UAM51870 | AF163890 | DQ323935 |
| Microtus agrestis | Field vole | TK74209 | AF119271 | DQ323937 |
| Microtus arvalis | Common vole | TK46583 | U54492 | AF057139 |
| Microtus californicus | California vole | MVZ182976 | AF163891 | DQ323937 |
| Microtus chrotorrhinus | Rock vole | UAM55294 | AF163893 | DQ323938 |
| Microtus kikuchii | Taiwan vole | MVZ180848 | AF163896 | DQ323939 |
| Microtus mexicanus | Mexican vole | MSB96299 | AF163897 | DQ323940 |
| Microtus miurus | Singing vole | UAM56354 | AF163899 | DQ323941 |
| Microtus montanus | Montane vole | UAM62094 | AF119280 | DQ323942 |
| Microtus ochrogaster | Prairie vole | UAM36773 | AF163901 | DQ323943 |
| Microtus oeconomus | Root vole | TK46514 | AF163902 | DQ323944 |
| Microtus oregoni | Creeping vole | UAM50891 | AF163903 | DQ323945 |
| Microtus pennsylvanicus | Meadow vole | Local specimen | AF119279 | DQ323946 |
| Microtus pinetorum | Woodland vole | UAM54655 | AF163904 | DQ323947 |
| Microtus rossiaemeridionalis | Sibling vole | TK44630 | DQ015676 | DQ323955 |
| Microtus townsendii | Townsend’s vole | UAM50786 | AF163906 | DQ323948 |
| Microtus xanthognathus | Yellow-cheeked vole | UAM62664 | AF163907 | DQ323949 |
| Clethrionomys gapperi | Southern red-backed vole | TK26682 | AF272636 | DQ323950 |
| Clethrionomys glareolus | Bank vole | TK20669 | AY309421 | DQ323951 |
| Clethrionomys rutilus | Northern red-backed vole | TK29794 | AY309242 | DQ323952 |
| Synaptomys cooperi | Southern bog lemming | Local specimen | DQ323957 | DQ323953 |
| Ondatra zibethicus | Muskrat | Local specimen | AF119277 | DQ323954 |

TK, Museum at Texas Tech; UAM, University of Alaska Museum; MSB, Museum of Southwestern Biology; MVZ, Museum of Vertebrate Zoology; MtDNA, GenBank accession numbers for mitochondrial cytochrome b sequences; Numt, GenBank accession numbers for cytochrome b numt sequences

Genomic DNA was extracted with a standard proteinase K/phenol–chloroform protocol (Sambrook and Russell 2001). PCRs were performed in a final volume of 25 μl and included 1× ThermoPol Buffer (New England Biolabs), 0.2 mM dNTPs, 0.25 μM each primer, 2.5 U Taq DNA polymerase (New England Biolabs), and 0.03 U Pfu DNA polymerase (Stratagene). The thermal profile consisted of an initial denaturation at 94°C for 2 min; 32 cycles of 94°C for 1 min, 55°C for 30 s, 72°C for 1 min; and a final elongation step for 4 min at 72°C. PCR products were cleaned with QIAquick purification kits (Qiagen). We then identified restriction sites specific to each fragment (nuclear and mitochondrial) and digested the amplicons with the restriction enzyme Rsa I (New England Biolabs) to determine whether the products were nuclear in origin prior to direct sequencing. Putative nuclear products were sequenced in both directions with the amplification primers and two internal sequencing primers (PcytbSeq1: 5′-TTCAGTAGACAAAGTCACTC-3′, PcytbSeq2: 5′-GGAATAGTAGGAGAACTAAT-3′) using BigDye v.3.1 (Applied Biosystems) following the manufacturer’s protocol modified to one-eighth reactions, then cleaned with a sodium acetate precipitation. We aligned each amplicon with its corresponding mitochondrial cytochrome b sequence and concluded that the amplicons were numts by the presence of stop codons or indels using both the universal and mammalian mitochondrial genetic codes.
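The stop-codon criterion described above can be illustrated with a minimal sketch that translates a sequence under both the universal code and the vertebrate mitochondrial code and counts premature stops. The function names and the toy sequence are hypothetical, and this is not the procedure used in the study (alignments there were inspected in Sequencher); the codon reassignments shown (AGA and AGG as stops, ATA as Met, TGA as Trp) are those of the standard vertebrate mitochondrial code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(overrides=None):
    """Return a codon -> amino acid map, optionally with reassigned codons."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    table = dict(zip(codons, STANDARD_AA))
    table.update(overrides or {})
    return table


UNIVERSAL = build_table()
# Vertebrate mitochondrial code: four codons differ from the universal code.
VERT_MITO = build_table({"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"})


def translate(seq, table, frame=0):
    """Translate a DNA string; gaps are dropped, unknown codons become 'X'."""
    seq = seq.upper().replace("-", "")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(table.get(codon, "X") for codon in codons)


def count_premature_stops(seq, table, frame=0):
    """Count stop codons that occur before the final codon."""
    return translate(seq, table, frame)[:-1].count("*")


toy = "ATGTGATAA"  # hypothetical example, not a sequence from this study
print(translate(toy, UNIVERSAL), translate(toy, VERT_MITO))
```

A sequence that carries premature stops under both codes, or whose reading frame is broken by indels, is unlikely to encode a functional cytochrome b.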

After identifying the ψcytb numt in the original four Microtus species, we sampled additional taxa within the Arvicolinae: 12 North American endemic Microtus species, one Asian species, three species from the genus Clethrionomys (the putative sister genus to Microtus), the lemming genus Synaptomys and the muskrat genus Ondatra, one of the more primitive genera within the arvicoline rodents. It has been reported that multiple numts that span a similar portion of the mtDNA molecule can independently transfer to the nucleus (Mirol et al. 2000; Mundy et al. 2000). Thus, whenever possible we used the same primers and PCR conditions in an attempt to isolate orthologous ψcytb amplicons from all taxa sampled. However, our PCR amplifications were sub-optimal when used with the genus Clethrionomys, so we designed new primers specific to the problematic taxa.

Elongation of ψcytb in M. rossiaemeridionalis

We then attempted to determine whether the ψcytb pseudogene was part of a larger numt and used a Genome Walker kit (Clontech) to define the insertion boundaries in M. rossiaemeridionalis, the sister taxon of M. arvalis. The Genome Walker kit utilizes a linker-ligation/primer-walking protocol to identify unknown flanking sequences. The Pcytb primer sequences described above were elongated to create a set of nested primers on either end of ψcytb. We followed the manufacturer’s suggested protocols, but modified the PCR conditions in that we used 5 U Taq (New England Biolabs) and 0.05 U Pfu DNA polymerase (Stratagene) in all reactions to help reduce fidelity errors (Cline et al. 1996). All amplicons were cleaned, bidirectionally sequenced, and aligned with the sequence of the M. rossiaemeridionalis mitochondrial genome (DQ015676; Triant and DeWoody 2006).

Comparative sequence analyses

We constructed two sets of sequence alignments. One dataset consisted of the ∼1 kb ψcytb nuclear pseudogenes and their corresponding mitochondrial cytochrome b sequences from multiple arvicoline species (n = 22, Table 1). The second dataset, from M. rossiaemeridionalis only, consisted of the full-length numt sequence (4 kb, which we term “Mr_numt”) and its corresponding mitochondrial sequence. Each arvicoline ψcytb pseudogene is presumably part of a larger numt that is orthologous to Mr_numt.

We downloaded mtDNA cytochrome b sequences from GenBank (Table 1) to use in comparative analyses with arvicoline ψcytb sequences. There were no such Synaptomys cooperi mtDNA sequences available in GenBank, so we amplified and sequenced cytochrome b from a local specimen using the primers L14724/H15915 (Irwin et al. 1991) and internal sequencing primers (S.coop_Int1: 5′-TACAAACCTACTATCAGC-3′, S.coop_Int2: 5′-GGATGAAGTGAAATGCGA-3′). All ψcytb sequences and their corresponding mtDNA sequences were aligned using Sequencher 4.1 (GeneCodes) and ambiguous basecalls were resolved by resequencing. The numt pseudogene found in S. cooperi contained a 550 bp deletion (Table 2) and was less than half the size of most other numts. Therefore, it was removed from further analyses.
Table 2

Divergence between putative ψcytb nuclear sequences and their mitochondrial cytochrome b counterparts

| Species | Length (bp) | Ns | Ts/Tv | Indels (# nucleotides) |
|---|---|---|---|---|
| Microtus abbreviatus | 983 | 0.19 | 1.88 | 2 (2) |
| Microtus agrestis | 971 | 0.19 | 1.73 | 2 (4) |
| Microtus arvalis | 1,138 | 0.17 | 1.84 | 5 (13) |
| Microtus californicus | 990 | 0.18 | 1.89 | 0 (0) |
| Microtus chrotorrhinus | 962 | 0.17 | 1.72 | 1 (1) |
| Microtus kikuchii | 1,310 | 0.17 | 2.80 | 4 (384) |
| Microtus mexicanus | 1,155 | 0.18 | 1.93 | 2 (173) |
| Microtus miurus | 988 | 0.19 | 1.90 | 1 (1) |
| Microtus montanus | 1,047 | 0.20 | 1.92 | 3 (69) |
| Microtus ochrogaster | 952 | 0.18 | 1.77 | 1 (1) |
| Microtus oeconomus | 967 | 0.18 | 1.96 | 1 (6) |
| Microtus oregoni | 1,342 | 0.19 | 2.18 | 4 (404) |
| Microtus pennsylvanicus | 1,027 | 0.18 | 2.10 | 3 (68) |
| Microtus pinetorum | 998 | 0.19 | 1.68 | 3 (6) |
| Microtus rossiaemeridionalis (contained within Mr_numt) | 1,130 | 0.18 | 1.80 | 2 (19) |
| Microtus townsendii | 999 | 0.19 | 1.93 | 3 (69) |
| Microtus xanthognathus | 997 | 0.17 | 2.04 | 1 (1) |
| Clethrionomys gapperi | 1,369 | 0.16 | 1.94 | 5 (418) |
| Clethrionomys glareolus | 884 | 0.16 | 2.38 | 1 (2) |
| Clethrionomys rutilus | 881 | 0.17 | 2.39 | 2 (5) |
| Synaptomys cooperi | 446 | 0.18 | 3.16 | 2 (550) |
| Ondatra zibethicus | 925 | 0.16 | 2.04 | 9 (35) |

Length, nuclear sequence length; Ns, proportion of nucleotide substitutions; Ts/Tv, transition/transversion ratio; Indels, total number of indels followed by total number of bases contained within them

Because pseudogenes are thought to be associated with transposable and repetitive elements (Mishmar et al. 2004), we used RepeatMasker (Smit et al. 1996–2004) to identify any such elements in nuclear sequences. We used MEGA 3.1 (Kumar et al. 2004) and PAUP* 4.0b10 (Swofford 2003) to generate estimates of nucleotide variability. We estimated the number of nucleotide differences per site and the ratio of transitions/transversions for all mtDNA/numt pairwise comparisons. Additionally, we estimated pairwise divergences among mitochondrial and among ψcytb sequences according to the Kimura 2-parameter model. To assess saturation in arvicoline mitochondrial sequences, we plotted uncorrected pairwise transitions and transversions against corrected pairwise divergences. We then calibrated divergence estimates within Microtus (excluding M. oregoni; see “Results/Discussion”) using 1.5 MYA as the date of the microtine radiation (Repenning 1990) to obtain an average rate of nucleotide substitution in these pseudogenes. The precise date of the microtine radiation is unclear and it is not known with certainty how many radiation events occurred (Repenning 1980; Chaline et al. 1999). Therefore, we calculated nucleotide substitution rates across the range of plausible dates (0.5–2.0 MYA).
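The pairwise statistics described above (proportion of transitions and transversions, and the Kimura 2-parameter distance) can be sketched as follows. This is an illustrative implementation with hypothetical function names, not the MEGA or PAUP* code used in the study; it assumes two pre-aligned sequences of equal length and skips any column containing a gap or ambiguous base. The distance is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Return (transitions, transversions, compared sites) for an aligned pair."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1
    return ts, tv, sites


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance; raises ValueError if the pair is saturated."""
    ts, tv, sites = count_ts_tv(seq1, seq2)
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Plotting the uncorrected counts from `count_ts_tv` against `k2p_distance` for every pair is one way to look for the plateau that indicates saturation.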

Molecular dating

To date the original translocation of the progenitor Mr_numt sequence from the source mitochondrion to the nucleus, we used QDate 1.1 (Rambaut and Bromham 1998). This method utilizes a maximum-likelihood quartet approach and user-specified calibration dates to estimate divergence times between two monophyletic groups with different rates of evolution. We used quartet dating because rates of nucleotide substitution are expected to differ between the mitochondrion and the nucleus. In addition, this method allows for examination of each pairwise comparison within our dataset to identify numt sequences that may be non-orthologs. Because our ψcytb amplification primers were designed to be gene specific, we assumed that all sequences were orthologs unless the phylogenetic analyses suggested otherwise (see below).

We used calibration dates derived from fossil data: within Microtus comparisons—1.5 MYA (Repenning 1990); Microtus/Clethrionomys—2.0 MYA (Chaline and Graf 1988); Microtus/Ondatra—2.0 MYA (Carleton and Musser 1984); within Clethrionomys comparisons—2.5 MYA (Chaline and Graf 1988); Clethrionomys/Ondatra—2.5 MYA (Carleton and Musser 1984; Chaline and Graf 1988). We constructed 210 quartets that contained all possible pairs of calibrated taxa within the cytochrome b dataset, with each quartet including two mitochondrial and two pseudogene sequences (Fig. 1). Indels were removed prior to analysis. We used the REV (general reversible Markov) model of nucleotide substitution with corrections for gamma-distributed rate heterogeneity as estimated by Modeltest 3.7 (Posada and Crandall 1998). For all quartets, we conducted log likelihood ratio tests to compare the constrained 1-rate and 2-rate substitution models against the unconstrained 5-rate substitution model. We discarded any quartets that did not fit the 2-rate model, in which one pair within a quartet has a different rate from the other, as would be expected for our dataset. Saturation tests indicated that saturation was present in the dataset (see “Results”), so we also conducted quartet analyses under the same conditions but with third positions excluded.
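The likelihood ratio tests mentioned above compare a constrained model with the unconstrained 5-rate model. A minimal sketch of such a test follows; it is not QDate's implementation. The log-likelihood values are hypothetical, and the degrees of freedom (3 for the 2-rate model and 4 for the 1-rate model, taken as the difference in the number of rate parameters) are an assumption for illustration. The chi-square tail probability is computed with closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series over odd double factorials.
    total = 0.0
    term = math.sqrt(x)
    for k in range((df - 1) // 2):
        total += term
        term *= x / (2 * k + 3)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


def likelihood_ratio_test(lnl_constrained, lnl_unconstrained, df):
    """Return the LRT statistic 2*(lnL_unconstrained - lnL_constrained) and its p-value."""
    statistic = 2.0 * (lnl_unconstrained - lnl_constrained)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for one quartet (not values from this study).
stat, p = likelihood_ratio_test(lnl_constrained=-1000.0, lnl_unconstrained=-995.0, df=3)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

Under this reading, a quartet is retained when the 2-rate model is not rejected against the 5-rate model and is set aside when the 1-rate model also fits.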
Fig. 1

Quartet dating was used to estimate mitochondrial and nuclear divergence times, where one pair consisted of mitochondrial sequences and the other of nuclear sequences. Nodes X and Y are calibration points taken from fossil data, whereas node Z is the estimated date of translocation. Figure modified from Rambaut and Bromham (1998)
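The quartet design can be made concrete with a short enumeration. Assuming one quartet per pair of species, each combining the two mitochondrial and the two numt sequences of that pair, the 21 species retained after excluding S. cooperi yield the 210 quartets reported in the text. The label suffixes `_mt` and `_numt` and the function name are hypothetical.

```python
from itertools import combinations

# The 21 species retained for dating (Table 1 minus Synaptomys cooperi).
SPECIES = [
    "M. abbreviatus", "M. agrestis", "M. arvalis", "M. californicus",
    "M. chrotorrhinus", "M. kikuchii", "M. mexicanus", "M. miurus",
    "M. montanus", "M. ochrogaster", "M. oeconomus", "M. oregoni",
    "M. pennsylvanicus", "M. pinetorum", "M. rossiaemeridionalis",
    "M. townsendii", "M. xanthognathus", "C. gapperi", "C. glareolus",
    "C. rutilus", "O. zibethicus",
]


def build_quartets(species):
    """Pair every two species; each quartet holds a mitochondrial and a numt pair."""
    return [
        ((f"{a}_mt", f"{b}_mt"), (f"{a}_numt", f"{b}_numt"))
        for a, b in combinations(species, 2)
    ]


quartets = build_quartets(SPECIES)
print(len(quartets))
```

Each element corresponds to the layout in Fig. 1: the two pairs share calibration nodes X and Y, and node Z joins the mitochondrial and nuclear pairs.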

Phylogenetic reconstruction

Using maximum parsimony (MP) and maximum likelihood (ML) approaches as implemented in PAUP* 4.0b10 (Swofford 2003), we conducted phylogenetic analyses of the ψcytb pseudogenes and corresponding mitochondrial cytochrome b sequences (Table 1) to identify potentially non-orthologous sequences. MP analyses were performed with heuristic searches using tree-bisection-reconnection (TBR) branch swapping with 100 random addition sequences and 1,000 bootstrap replicates. Strict and majority-rule consensus trees were constructed from all equally parsimonious trees. ML analyses were conducted under the GTR + I + G model with a shape parameter of 0.7625 as determined by Modeltest 3.7 (Posada and Crandall 1998) under the hLRT and AIC criteria. We performed heuristic searches with TBR branch swapping, the as-is addition sequence and 100 bootstrap replicates. Because saturation was observed (see “Results”), we conducted the same analyses with third positions excluded. MP analyses were conducted under the same conditions while ML analyses were performed under the TrN + I + G model with a shape parameter of 0.8087. Indels were removed prior to all analyses. Numt pseudogenes were identified in the more primitive arvicoline species (see “Results”); thus, all trees were rooted with mitochondrial cytochrome b sequences of Cricetus cricetus and Cricetulus griseus, as the Cricetinae subfamily of Palearctic hamsters is thought to be the sister group to Arvicolinae (Steppan et al. 2004).

Results

Isolation of ψcytb

We successfully amplified the presumptive ψcytb locus in 22 arvicoline species: 20 voles (Microtus and Clethrionomys), one lemming (Synaptomys) and one muskrat (Ondatra) (Table 2). In most cases, the pseudogene-specific primers PcytbF2 and PcytbR generated robust amplicons and unambiguous sequences with no apparent contamination by mitochondrial sequences. However, unique primers were developed for the genus Clethrionomys because of inconsistent results with PcytbF2 and PcytbR. Amplicon sizes ranged from 446 to 1,369 bp across species. In all cases, pairwise comparisons with mitochondrial cytochrome b sequences revealed numerous stop codons and frameshift indels within the pseudogene sequences.

Elongation of ψcytb in M. rossiaemeridionalis

We amplified the entire Mr_numt fragment, including ψcytb and the flanking regions, in three fragments totaling ∼8 kb (Fig. 2). The total length of the mitochondrial fragment transferred to the nucleus (i.e., Mr_numt) was 3,960 bp. Thus, Mr_numt originated from approximately 25% of the mitochondrial genome and spans 3 mitochondrial protein-coding genes (cytochrome b; NADH 6; NADH 5), 6 tRNAs (Pro; Thr; Glu; Leu; Ser; His), and a portion of the control region. After adjusting for indels, Mr_numt differed from its mitochondrial counterpart at 20% of its sites and the ts/tv ratio was 1.55. Nucleotide composition was similar between the two sequence types (numt/mt): adenine 0.33/0.33; cytosine 0.27/0.29; guanine 0.12/0.12; thymine 0.29/0.26. Mr_numt is bordered on both ends by a dinucleotide microsatellite repeat, (CT)n. We were able to sequence through the dinucleotide repetitive region on its 5′ end (126 bp) and obtain an additional 925 bp of flanking sequence until we reached the ligated adaptor (i.e., the end of the Genome Walker fragment). We were unable to sequence through the long dinucleotide repeat on the 3′ end (122 bp) but sequenced toward Mr_numt from the 3′ ligated adaptor. We obtained 1,844 bp of flanking sequence until reaching several mononucleotide repeats that proved resistant to our sequencing efforts (Fig. 2). All sequences generated in this study have been deposited in the GenBank database under accession numbers DQ323935–DQ323957 (Table 1).
Fig. 2

Mr_numt sequence (M. rossiaemeridionalis) and associated flanking region, aligned to the corresponding portion of the mitochondrial genome. Associated repetitive and transposable elements, flanking sequences and missing data within the nuclear sequence are all represented. Adaptors were ligated to restriction fragments prior to isolation. SINE = short interspersed element; LINE = long interspersed element; MER = medium reiterated repeats. White spaces within the mitochondrial fragment indicate tRNAs

Comparative sequence analyses

RepeatMasker identified repetitive or transposable elements in ψcytb pseudogenes from four species (M. kikuchii, M. mexicanus, M. oregoni, C. gapperi). Three additional species (M. montanus, M. pennsylvanicus, M. townsendii) harbored a 66-base insertion that was not recognized by RepeatMasker or found within the GenBank database (Table 3, Fig. 3). Direct terminal repeats characterized the insertion sites of this 66 bp insertion. This and other indels (e.g., a 550 bp deletion in S. cooperi) were not identified as recognizable transposable or repetitive elements. RepeatMasker also identified one transposable element within the Mr_numt sequence and three transposable elements within its flanking sequences (Table 3). The length of Mr_numt including the SINE insertion totals 4,128 bp (Fig. 2).
Table 3

Transposable elements associated with putative ψcytb sequences or their flanking regions

| Species | Inserted element | Length (bp) | Fraction of sequence (%) |
|---|---|---|---|
| Arvicoline species | | | |
| M. kikuchii | LTR (ERV_classII) | 204 | 15.6 |
| M. mexicanus | SINE (B1) | 152 | 13.2 |
| M. oregoni | LTR (ERV_classII) | 133 | 9.9 |
| C. gapperi | LTR (ERV_classII) | 300 | 21.9 |
| M. rossiaemeridionalis (Mr_numt) | | | |
| 3′ flanking region | SINE (B1) | 152 | 8.2 |
| | SINE (B2-B4) | 166 | 9.0 |
| | MER1 | 348 | 18.9 |
| | Total | 666 | 36.1 |
| 5′ flanking region | LINE (LINE1) | 296 | 28.1 |
| | MER1 | 410 | 38.9 |
| | Total | 706 | 67.0 |
| numt sequence | SINE (B1) | 151 | 3.7 |
| | Total | 151 | 3.7 |

LTR, long terminal repeat; SINE, short interspersed element; LINE, long interspersed element; MER, medium reiterated repeats; ERV, endogenous retrovirus
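As a consistency check, the “Fraction of sequence” values for the four arvicoline ψcytb entries in Table 3 can be recovered by dividing each element length by the corresponding amplicon length in Table 2. That this is how the column was derived is our reading of the two tables rather than something stated in the text; the dictionary and function names below are hypothetical.

```python
# Amplicon lengths (bp) from Table 2 and element lengths (bp) from Table 3.
AMPLICON_LENGTH_BP = {
    "M. kikuchii": 1310,
    "M. mexicanus": 1155,
    "M. oregoni": 1342,
    "C. gapperi": 1369,
}
ELEMENT_LENGTH_BP = {
    "M. kikuchii": 204,
    "M. mexicanus": 152,
    "M. oregoni": 133,
    "C. gapperi": 300,
}


def percent_of_sequence(element_bp, sequence_bp):
    """Element length as a percentage of the sequence, to one decimal place."""
    return round(100.0 * element_bp / sequence_bp, 1)


for species, element_bp in ELEMENT_LENGTH_BP.items():
    fraction = percent_of_sequence(element_bp, AMPLICON_LENGTH_BP[species])
    print(f"{species}: {fraction}%")
```

The same calculation applied to the 3′ flanking region (1,844 bp) reproduces the listed percentages for that region as well.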

Fig. 3

Sequence alignment of Microtus 66 bp inserts for M. townsendii, M. montanus, M. pennsylvanicus numts and M. rossiaemeridionalis mtDNA cytochrome b gene. Underlined sections represent direct repeats (note substitutions that have occurred between repeat sequences). Dashes within mtDNA sequence represent 66 bp insertion sites. Numbers beside the alignment indicate nucleotide position within M. rossiaemeridionalis mtDNA cytochrome b gene

Pairwise divergence among Microtus mitochondrial genes ranged from 0.02 to 0.19 (mean = 0.14), while that among nuclear pseudogene sequences ranged from 0.01 to 0.16 (mean = 0.08). For most comparisons, pairwise divergences among pseudogene sequences were 2–6× lower than divergences among mitochondrial sequences (i.e., the pseudogenes seem to be evolving more slowly than their mitochondrial counterparts). However, this apparent reduction in the rate of pseudogene evolution did not hold true for Clethrionomys, Ondatra, and M. oregoni. There was evidence of saturation (both transitions and transversions) in most taxa (data not shown).

The scatter plot of numt vs. mitochondrial pairwise divergences revealed clustering among the genera (Fig. 4). Values for Microtus clustered together with the exception of M. oregoni, which grouped with Clethrionomys and Ondatra. Overall, we estimated the average rate of nucleotide substitution in the Microtus pseudogene as 2.3 × 10⁻⁸ substitutions per site per year, using a calibration date of 1.5 MYA. Estimates obtained throughout the span of plausible calibration dates (0.5–2.0 MYA) ranged from 1.7 × 10⁻⁸ to 6.8 × 10⁻⁸ substitutions per site per year (Fig. 5).
Fig. 4

Pairwise divergence values among nuclear sequences plotted against those for mitochondrial sequences. •, Microtus comparisons (minus M. oregoni); ○, comparisons involving M. oregoni; ▲, comparisons involving Clethrionomys; △, comparisons involving Ondatra zibethicus

Fig. 5

Nucleotide substitution rates in arvicolines as a function of divergence times. The most likely Microtus radiation date of 1.5 MYA is highlighted with an enlarged symbol
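The dependence of the estimated substitution rate on the calibration date can be sketched with the usual relation r = d / (2T), where d is the mean pairwise divergence and T the time since divergence; the factor of 2 assumes that substitutions accumulate along both lineages. The source does not state the formula it used, and the value d = 0.068 below is an assumption back-calculated from the reported rates (it reproduces 2.3 × 10⁻⁸ at 1.5 MYA and the 1.7–6.8 × 10⁻⁸ range), not a figure given in the text.

```python
def substitution_rate(pairwise_divergence, divergence_time_years):
    """Substitutions per site per year, assuming both lineages accumulate change."""
    return pairwise_divergence / (2.0 * divergence_time_years)


# Assumed mean within-Microtus pseudogene divergence (back-calculated, see lead-in).
ASSUMED_MEAN_DIVERGENCE = 0.068

for t_mya in (0.5, 1.0, 1.5, 2.0):
    rate = substitution_rate(ASSUMED_MEAN_DIVERGENCE, t_mya * 1e6)
    print(f"T = {t_mya} MYA -> {rate:.2e} substitutions per site per year")
```

Because the rate is inversely proportional to T, uncertainty in the date of the microtine radiation translates directly into the fourfold range of rate estimates.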

Molecular dating

We assessed results generated under the 1-rate and 2-rate substitution models for each possible quartet (n = 210). We tested the 1-rate model to confirm that our quartets were not evolving at the same rate (as expected when considering both mitochondrial and nuclear sequences) and most quartets (n = 159) were indeed rejected. Under the 2-rate substitution model, we first discounted 5 of the 210 quartets because of model nonconformity and another 51 of the 210 quartets where QDate could not establish accurate 95% confidence intervals. Most of the discounted quartets included Clethrionomys, Ondatra, or M. oregoni. We cross-referenced the remaining 154 quartets against the 51 quartets that fit the 1-rate model and found that 25 quartets fit both models and thus were discounted from further analysis. All of these 25 quartets contained Clethrionomys, Ondatra, or M. oregoni. We removed the remaining quartets belonging to these taxa (n = 14) because of concerns about orthology (see below), leaving only quartets derived from Microtus. The remaining 115 quartets were used to estimate the mean divergence between mitochondrial and nuclear sequences at a value of 3.99 MYA (standard deviation 0.70; median 3.88; 95% confidence intervals 3.86–4.12). Quartet analyses conducted with only first and second positions did not provide enough resolution as most quartets (n = 168) had to be discounted because they fit both the 1-rate and 2-rate models. The remaining 42 quartets provided an estimate of 4.38 MYA (standard deviation 1.08; median 4.06; 95% confidence intervals 4.04–4.72).
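The quartet bookkeeping and the confidence interval around the mean transfer date can be checked with a few lines. The sketch assumes that the reported interval is a confidence interval on the mean across quartets and uses a normal approximation; the source does not say how the interval was computed, so intervals for small samples (such as the 42 quartets in the second analysis) may differ slightly from the published ones. The function name is hypothetical.

```python
import math
from statistics import NormalDist

# Quartets remaining after each filtering step described in the text.
remaining = 210 - 5 - 51   # model nonconformity, unreliable confidence intervals
remaining -= 25            # quartets that also fit the 1-rate model
remaining -= 14            # other quartets involving the non-Microtus-like taxa
print(remaining)


def mean_confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean from summary statistics."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


low, high = mean_confidence_interval(mean=3.99, sd=0.70, n=remaining)
print(f"{low:.2f}-{high:.2f} MYA")
```

The narrow interval reflects the large number of quartets rather than the precision of any single estimate; the standard deviation of 0.70 MYA is the better guide to the spread among quartets.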

Phylogenetic reconstruction

Similar tree topologies were generated using the assigned substitution models without many changes in structure, branch lengths or support. ML and MP analyses utilizing all codon positions resulted in two clades, one comprised of numt sequences and the other comprised of mtDNA sequences (Fig. 6). Most numt branch lengths were shorter than mtDNA branch lengths. In the ML analysis, the Clethrionomys mtDNA sequences did not cluster with the other arvicoline mtDNA sequences and the O. zibethicus mtDNA sequence grouped with numt sequences, but bootstrap support was not high in either case. Relationships within the two clades were similar; however, within the numt clade, M. oregoni was grouped with Clethrionomys. A similar tree topology was recovered by MP analyses (Fig. 6). Both ML and MP analyses failed to group M. oregoni with other microtines in our analyses despite the monophyly of Microtus as gauged by other phylogenetic studies (Jaarola et al. 2004). The Clethrionomys lineage formed a tight cluster and received strong bootstrap support. Removing third positions from the analyses resulted in the same numt and mtDNA clades but with less resolution among mtDNA sequences (data not shown). M. oregoni numt sequences again clustered with Clethrionomys sequences in both MP and ML analyses.
Fig. 6

Maximum likelihood tree depicting evolutionary relationships among arvicoline species based on putative ψcytb sequences (numt) and their corresponding mtDNA cytochrome b sequences. Values at nodes indicate bootstrap support with those from ML on the left and those from MP on the right. Nodes with no support values were not observed in MP

Discussion

The original ψcytb pseudogene, described in M. arvalis by DeWoody et al. (1999), is here shown to be widespread among arvicoline rodents. Although we did not characterize the flanking region of ψcytb in all 22 taxa surveyed, we did so in M. rossiaemeridionalis and found that the entire numt (Mr_numt) spans ∼4 kb, approximately 25% of the entire mitochondrial genome. Multiple stop codons and frameshift indels throughout Mr_numt strongly suggest that it is non-coding. Thus, Mr_numt is almost certainly located in the nuclear genome. Richly and Leister (2004) determined that the average size of rodent numts is 193 bp. Although mammalian numts encompassing multiple mitochondrial genes have been described [e.g., in cats (Lopez et al. 1994; Kim et al. 2006)], most mammalian numts are fragments of single mitochondrial genes (Bensasson et al. 2001). A number of long numts (some encompassing more than 75% of the mtDNA genome) have been identified bioinformatically in sequenced genomes (e.g., humans, Mourier et al. 2001), but Mr_numt is one of the largest numts yet described in animals lacking a complete genome sequence.

Whether numts preferentially insert into nuclear genomes and where these insertion sites occur have been debated in the human literature. Mishmar et al. (2004) suggested that transposable elements influence the integration of mitochondrial sequences and their duplication within the nuclear genome. They analyzed 247 numt flanking regions in the human genome and found that 59% of them were within 150 bp of a repetitive element (∼6 times more often than expected) while 14% of them were inserted directly into repetitive elements. In contrast, Ricchetti et al. (2004) reported that recently transferred numts are associated with coding regions, while others associate numts with chromosome break points (Ricchetti et al. 1999; Willet-Brozick et al. 2001). We identified a number of repetitive and transposable elements within Mr_numt and its flanking regions that comprise ∼25% of the total sequence. Mr_numt appears to have been inserted directly into a dinucleotide repeat and is associated with four transposable elements, one within Mr_numt itself and three within its flanking sequence. The distribution of repetitive elements within arvicoline genomes is not known, but ∼40% of the mouse and rat genomes consists of repetitive DNA derived from mobile elements (Mouse Genome Sequencing Consortium 2002; Rat Genome Sequencing Project Consortium 2004). Therefore, the association of arvicoline numts with repetitive and mobile elements does not seem extraordinarily high. Three Microtus species had an unidentified 66-base insertion in their ψcytb sequences. These inserted sequences consist of repetitive regions and are flanked by short direct repeats but do not match any known sequences (Fig. 3), perhaps suggesting that they are lineage-specific.

We were able to isolate ψcytb in most arvicoline species using the same set of primers, but this alone does not indicate orthology. One method for confirming that numts are the result of the same insertion event is to identify insertion boundaries and compare flanking sequences (e.g., Schmitz et al. 2005). However, Mr_numt is flanked by dinucleotide and mononucleotide repeats that proved difficult to sequence. Because the inclusion of paralogous sequences within our dataset could bias many of our estimates, we employed an alternative method for assessing orthology—one of phylogenetic concordance. Our analyses indicate that most of the Microtus ψcytb pseudogenes described herein seem to be orthologs.

The quartet analysis was conducted not only to date the mtDNA transfer but also to identify suspect sequences/taxa. Some of the arvicoline species examined were problematic and led us to discount a number of quartets because of rate homogeneity found within the quartet, or because divergence estimates were contained within the 95% confidence intervals. The majority of those rejected quartets involved Clethrionomys, M. oregoni, or Ondatra, suggesting that the numt sequences isolated from these taxa are not orthologs of ψcytb in the other Microtus species; quartets involving those taxa were therefore disregarded.

The estimate of ∼4 MYA for the Mr_numt translocation date would predate the initial appearances of almost all of the arvicoline species used in this study. Ondatra is the oldest genus in our dataset and fossil records indicate a mid-Pliocene origin (∼3.7 MYA; Repenning 1987). If our estimated date of ∼4 MYA is accurate, then the transfer to the nucleus might have coincided with the diversification of modern arvicoline rodents and be unique to that lineage.

Within our phylogenetic tree, most numts have shorter branch lengths than their mtDNA counterparts, suggesting that the sequences are nuclear and evolving more slowly than those within the mtDNA genome. There was poor resolution within both mtDNA and numt clades, but this is not surprising as most North American Microtus species have shown little phylogenetic resolution within mtDNA cytochrome b sequences (Conroy and Cook 2000; Jaarola et al. 2004). The rapid microtine radiation, coupled with the complications of combining mtDNA and nuclear sequences with different rates of evolution into a single analysis, is likely contributing to the poor resolution of our phylogeny. M. oregoni should cluster with other North American voles (Jaarola et al. 2004) but, among numt sequences, it was not even grouped within the Microtus clade. It instead clustered with the more basal arvicoline taxa, suggesting a more ancient numt transfer or (more likely) that our sequence from M. oregoni is not of ψcytb but of a paralog (Fig. 6).

The relationships among numt and among mtDNA sequences were similar; however, the O. zibethicus mtDNA cytochrome b sequence clustered with the numt sequences rather than the mtDNA sequences (Fig. 6). Although bootstrap support was not high, this may reflect the putative date of numt transfer (∼4 MYA) coinciding with the emergence of this species within the fossil record (Repenning 1987). The pseudogene and mtDNA sequences did recover the same clade consisting of M. montanus, M. townsendii, and M. pennsylvanicus. All three species share a synapomorphic 66-base insertion (Table 3) and are otherwise known to be closely related taxa (Conroy and Cook 2000). Note that our phylogenetic analyses were not conducted specifically to recover the systematic relationships of arvicolines, but were used to identify potentially non-orthologous sequences (e.g., those of M. oregoni).

Our tests for saturation among arvicoline mtDNA cytochrome b sequences revealed evidence of saturation at both transitions and transversions (data not shown). Saturation at transitions would be expected because of the transition bias found in animal mtDNA that has been reported previously for rodents (Yang and Yoder 1999), but saturation at transversions is surprising; transversions typically increase linearly with time in the animal mitochondrial genome (Moritz et al. 1987). Pairwise divergences among arvicoline ψcytb sequences suggest that they are evolving more slowly than their mitochondrial counterparts, as is typical of mammalian numts (Zhang and Hewitt 1996). However, pairwise comparisons involving Ondatra, Clethrionomys, and M. oregoni suggest that their numt sequences are evolving almost as rapidly as their mitochondrial DNA. This seems unlikely, and a more parsimonious explanation is that the sequences from these taxa actually represent independent numts that are not orthologs of ψcytb. The scatter plot of nuclear pairwise divergences vs. mitochondrial divergences showed a clustering pattern among genera that reflected the phylogenetic relationships, with M. oregoni grouped among the more basal taxa (Fig. 4). While our analyses suggest that the numts from M. oregoni, Clethrionomys and Ondatra are not orthologs of the Microtus numts, they could be orthologs with respect to each other.

By disregarding suspected paralogs, our estimates of substitution rates should be conservative. Microtus is known to exhibit an elevated evolutionary rate within its mitochondrial genome (Triant and DeWoody 2006) but it is unknown whether the mtDNA rate is concordant with the evolutionary rate of the nuclear genome. Our estimate of 2.3 × 10⁻⁸ substitutions per site per year provides the first evidence of the evolutionary rate within the Microtus nuclear genome (Fig. 5), and is greater than that described for other mammals. Kumar and Subramanian (2002) estimated the average substitution rate in mammalian genomes to be 2.2 × 10⁻⁹ per year whereas estimates of the rate of neutral substitution are 2.0 × 10⁻⁹ for humans and 4.5 × 10⁻⁹ per year for mice (Mouse Genome Sequencing Consortium 2002). Li et al. (1981) estimated the average rate of nucleotide substitution for mouse, human and rabbit pseudogenes as 4.7 × 10⁻⁹. In contrast to Kumar and Subramanian (2002) but consistent with our results, Nachman and Crowell (2000) used processed pseudogenes to estimate the human mutation rate, equivalent to the substitution rate for neutral genes, as 2.5 × 10⁻⁸. The causes of the disparity among these three studies are not immediately apparent, but one obvious reason might be sampling error: we only sampled a single pseudogene in over 20 taxa, Nachman and Crowell (2000) sampled 12 autosomal pseudogenes in humans and chimpanzees, and the genome sequence estimates (Mouse Genome Sequencing Consortium 2002) are based on ancestral repeats. Further research is needed to determine if the rate of nucleotide substitution in the Microtus nuclear genome is, like the mtDNA genome, accelerated relative to other mammals.
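
The size of the disparity among these published rates is easier to judge as ratios. As a purely arithmetic illustration (not part of the original analysis), the Python sketch below divides the ψcytb estimate by each rate quoted above. The values are copied from the text with units as reported by each source, and the dictionary labels and function name are hypothetical shorthand.

```python
# Rate estimated for the vole pseudogene in this study (from the text).
VOLE_NUMT_RATE = 2.3e-8

# Comparison rates quoted in the paragraph above; labels are shorthand only.
published_rates = {
    "mammalian genome average": 2.2e-9,
    "human neutral substitution": 2.0e-9,
    "mouse neutral substitution": 4.5e-9,
    "mouse/human/rabbit pseudogenes": 4.7e-9,
    "human processed pseudogenes": 2.5e-8,
}


def fold_difference(rate, reference):
    """Return how many times larger `rate` is than `reference`."""
    return rate / reference


ratios = {
    label: fold_difference(VOLE_NUMT_RATE, reference)
    for label, reference in published_rates.items()
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f}x")
```

On these figures the vole estimate is roughly 5 to 12 times the genome-wide and pseudogene rates, but slightly below the processed-pseudogene estimate (a ratio of about 0.9).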

Prior to this study, numt pseudogenes had been reported within the arvicoline rodents (DeWoody et al. 1999; Jaarola et al. 2004; Jaarola and Searle 2004) but only the ψcytb numt in M. arvalis had been described and characterized. Although the driving forces behind the integration of mtDNA sequences into nuclear genomes (and their subsequent dispersal and accumulation) are not well understood, they may be associated with chromosome breaks. If so, then the rampant chromosomal fission/fusion events in the Microtus lineage have afforded countless opportunities for numt insertions and thus voles could prove to be a valuable model.

Acknowledgements

We are grateful to the Natural Science Research Laboratory in The Museum of Texas Tech University, The University of Alaska Museum, The Museum of Southwestern Biology, and The Museum of Vertebrate Zoology at Berkeley for loaning us the tissue necessary for this study. We thank David Bos, Joe Busch, Jill Detwiler, Dave Glista, David Gopurenko, Maarit Jaarola, Emily Latch, Jamie Rudnick, Sara Turner, Rod Williams and an anonymous reviewer for comments on an earlier version of this manuscript. Our lab is funded in part by the USDA, the NSF, and Purdue University. This work is contribution number 2006-17842 from Purdue University.

Copyright information

© Springer Science+Business Media, Inc. 2007