Introduction

More than 375 completely sequenced mitochondrial genomes from ray-finned fishes have so far (December 2007) been submitted to the NCBI database (http://www.ncbi.nlm.nih.gov), and many of these sequences have contributed considerably to resolving phylogenetic relationships among fishes. Evolutionary relationships at different taxonomic levels have been addressed, including Division (Inoue et al. 2003; Miya et al. 2003), Subdivision (Ishiguro et al. 2003), Genus (Doiron et al. 2002; Minegishi et al. 2005), and Species (Yanagimoto et al. 2004; Ursvik et al. 2007).

The circular mitochondrial genomes from ray-finned fishes are usually small (ca. 17,000 bp), compactly organized, and encode only a subset of the mitochondrial gene products (13 protein genes, 22 tRNA genes, 2 rRNA genes) (Boore 1999). The mitochondrial gene order is invariable among the ray-finned fishes, except for some notable examples of tRNA gene duplications and rearrangements at three different mitogenomic sites. The tRNA-gene IQM cluster located between the NADH dehydrogenase subunit 1 (ND1) and ND2 genes is the most frequently observed rearrangement site (Miya et al. 2001, 2003; Mabuchi et al. 2004; Satoh et al. 2006). The second rearrangement site includes the tRNA-Glu (E) gene, usually located between the ND6 and cytochrome B (CytB) genes, but occasionally transposed into a position between CytB and the control region (CR) (Miya and Nishida 1999; Inoue et al. 2001; Satoh et al. 2006). Interestingly, codfishes belonging to the family Macrouridae (the grenadiers) frequently harbor tRNA gene rearrangements at both mitogenomic sites (Satoh et al. 2006). A third rearrangement site was recently reported in the WANCY tRNA gene cluster of the blackspot seabream, including oriL disruption and heteroplasmic tRNA gene duplications (Ponce et al. 2008).

The complete mtDNA genomes have so far only been reported for a few Gadidae species, a family harboring commercially and ecologically important codfish species. These include Atlantic cod (Gadus morhua, Johansen and Bakke 1996; Ursvik et al. 2007), walleye pollock (Theragra chalcogramma, Yanagimoto et al. 2004), Norwegian pollock (Theragra finnmarchica, Ursvik et al. 2007), haddock (Melanogrammus aeglefinus, Roques et al. 2006; Ursvik et al. 2007), and whiting (Merlangius merlangus, Roques et al. 2006). In addition, extensive partial mtDNA sequences were reported recently by Carr and coworkers for an additional five Gadidae species (Coulson et al. 2006). Several of the Gadidae species have been investigated in phylogenetic (Carr et al. 1999; Møller et al. 2002; Bakke and Johansen 2002, 2005; Coulson et al. 2006; Teletchea et al. 2006; Ursvik et al. 2007) or population studies (Shields and Gust 1995; Sigurgislason and Arnason 2003; Arnason 2004; Yanagimoto et al. 2004) based on mtDNA sequences. These reports show that the Gadidae fishes are typically characterized by a monophyletic evolutionary origin and a low-level geographic structuring of mitochondrial genotypes.

The Arctic cod (Arctogadus glacialis) and Polar cod (Boreogadus saida) are endemic to the Arctic Ocean, and are considered key species in Arctic ecosystems (Smetacek and Nicol 2005). Both are important for commercial fisheries in the production of fishmeal and fish oil (Cohen et al. 1990). Only two reported studies have included both these species in mtDNA-based phylogeny. Møller and coworkers analyzed approximately 400 nt positions of the CytB gene, and concluded that A. glacialis and B. saida should apparently be considered sister taxa, but this clustering was not well supported in their statistical tests (Møller et al. 2002). Recently, a phylogenetic analysis based on almost complete mitochondrial genome sequences of several Gadidae species, including A. glacialis and B. saida, was reported. This study corroborated the findings by Møller and coworkers, but with improved statistical support (Coulson et al. 2006). In an attempt to further resolve the Gadidae taxonomy, we have determined the complete mitochondrial genome sequences of A. glacialis and B. saida and performed phylogenetic analysis based on both the nucleotide and the derived amino acid sequences. Sister taxa affiliation of A. glacialis and B. saida was strongly supported. Interestingly, the sequencing analyses identified a novel gene arrangement within the mitochondrial genome of B. saida that involves duplications of tRNA genes and the origin of light strand replication (oriL).

Materials and methods

Fish samples and DNA extraction

The B. saida specimens (Bs1 and Bs2) and the A. glacialis specimen (Ag1) were collected from the Arctic Ocean, and DNA was extracted from muscle tissue by using the mtDNA Extractor CT Kit from Wako as previously reported (Ursvik et al. 2007).

PCR amplification, cloning, and DNA sequencing

The PCR and sequencing primers (Table 1) were designed from the published Atlantic cod mtDNA sequence (X99772; Johansen and Bakke 1996) and used to amplify the mitochondrial genomes in 1–4 kb fragments. Each PCR reaction (25 μl) included the total DNA sample as well as 0.2 mM dNTPs, 0.2 μM of each primer, and 0.6 U of Expand HiFidelity polymerase and buffer (Roche). The PCR reactions were performed in a Peltier 200 Thermal cycler with the following cycling parameters: 94°C initial denaturation for 2 min, 30 cycles with 94°C denaturation for 15 s, 53°C annealing for 30 s, 68°C elongation for 2–3 min, and a final extension at 72°C for 7 min. The entire amplification reactions were run on 0.8 or 1% agarose gels containing ethidium bromide, and bands were excised and purified with the Qiagen gel extraction kit. PCR products from the control region were inserted into the p-Drive vector (Qiagen) and transformed into E. coli EZ competent cells. Positive clones were verified by plasmid isolation (Wizard plasmid purification kit) and EcoRI restriction cutting followed by agarose gel electrophoresis. Direct sequencing was performed on the remaining PCR products on both strands using the BigDye version 3.1 kit (Applied Biosystems) with the same primers as in the PCR and internal primers. The sequencing reaction mixture included 10–60 ng of the PCR template, 0.35 μM primer, 2 μl BigDye mix, and 1 μl 5× reaction buffer in a 10 μl volume. The sequencing products were analyzed in an ABI genetic analyzer (Applied Biosystems).
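As a rough illustration of the cycling program described above, the programmed hold times can be summed to estimate the length of one run. This is a minimal sketch, not part of the published protocol: the function name and its parameters are hypothetical, and ramp times between temperature steps are ignored, so real runs take longer.

```python
# Hold times follow the cycling parameters stated in the text.
# Assumption: ramp times between temperature steps are ignored.
def pcr_run_seconds(elongation_s, cycles=30, initial_denat_s=120,
                    denat_s=15, anneal_s=30, final_ext_s=420):
    """Return the sum of programmed hold times, in seconds."""
    per_cycle = denat_s + anneal_s + elongation_s
    return initial_denat_s + cycles * per_cycle + final_ext_s


short_run = pcr_run_seconds(elongation_s=120)  # 2 min elongation
long_run = pcr_run_seconds(elongation_s=180)   # 3 min elongation
print(short_run / 60, long_run / 60)           # hold time in minutes
```

Under these assumptions the hold times add up to about 1.5 h with the 2 min elongation step and about 2 h with the 3 min step.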

Table 1 PCR and DNA sequencing primers

Data analyses

Computer analyses of DNA sequences were performed using software from DNASTAR Inc. The phylogenetic analyses were based on two sets of multiple alignments of nucleotide sequences derived from the complete mitochondrial genomes of 16 specimens representing eight closely related species within the Gadidae family. One alignment set was constructed from 16,783 nucleotide positions covering the complete mitochondrial genomes except the highly variable ETAS (extended termination associated sequence) domain within the control region. The second alignment set was constructed by aligning the 11,400 nucleotide positions (3,800 amino acids) of the 13 concatenated protein-coding genes, excluding the stop codons. This nucleotide alignment set was translated to amino acids and used in phylogenetic analysis. The multiple alignments were constructed using ClustalX version 1.81 (Thompson et al. 1997) and manual refinements. Molecular phylogenies were reconstructed by using the tree-building methods of minimum evolution (ME) and maximum parsimony (MP) in MEGA version 3.1 (Kumar et al. 2004) and MEGA version 4.0 (Tamura et al. 2007), as well as maximum likelihood (ML) in PAUP* (version 4.0b10) (Swofford 2002). ME trees were built from all the nucleotides present in both nucleotide alignment sets. Furthermore, ME trees were built from the alignment of the nucleotides of the 13 concatenated protein-coding genes by using synonymous substitutions, codon positions 1 and 2, and translated amino acids. For all the ME analyses, different models of nucleotide or amino acid substitutions were used. MP trees were built from both alignment sets using all nucleotides, and from the alignment of the nucleotides of the 13 concatenated protein-coding genes using codon nucleotide positions 1 and 2, as well as translated amino acids.
Analyses that include nucleotides were performed by heuristic searches using close-neighbor-interchange (CNI) search level 3 and production of initial trees by random addition of sequences (100 replications), while the max–mini branch-and-bound tree search option was used for construction of trees from the translated amino acid alignment. ML trees built from the alignment covering the near complete mt genome were based on the sequence evolution model HKY + I + G selected by the computer program WinModeltest version 4b (Posada and Crandall 1998), while trees built from the alignment of the nucleotides (all nt positions, and codon positions 1 and 2) of the 13 concatenated protein-coding genes used the evolution model GTR + I + G selected by WinModeltest. The reliabilities of tree branching points of the ME, MP, and ML trees were evaluated by bootstrap analyses (2,000 replications).
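The two derived data sets mentioned above (codon positions 1 and 2, and translated amino acids) can be illustrated with a short sketch. It is not the software used in this study, which relied on DNASTAR, ClustalX, MEGA, and PAUP*; the function names are hypothetical. The sketch assumes an in-frame concatenated sequence and uses the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TGA encodes Trp, ATA encodes Met, and AGA/AGG are stop codons.

```python
from itertools import product

BASES = "TCAG"
# Vertebrate mitochondrial code (NCBI translation table 2),
# codons ordered TTT, TTC, TTA, TTG, TCT, ... with bases in TCAG order.
AMINO_ACIDS = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
               "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_positions_1_2(seq):
    """Keep the first and second position of every complete codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(seq[i:i + 2] for i in range(0, usable, 3))


def translate(seq):
    """Translate an in-frame sequence; gapped or ambiguous codons give 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


toy = "ATGATATGAAGA"  # toy sequence, not real data
print(codon_positions_1_2(toy), translate(toy))
```

The same arithmetic explains the alignment sizes given above: 11,400 in-frame nucleotide positions correspond to 3,800 codons, and therefore to 3,800 amino acid positions and 7,600 first and second codon positions.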

Results and discussion

Gene content and organization of A. glacialis and B. saida mitochondrial genomes

The complete mitochondrial genome sequences of A. glacialis (isolate Ag1) and B. saida (isolate Bs1) were determined by a primer walking approach and found to be 16,644 and 16,745 bp, respectively. These sizes are similar to those of other known Gadidae species (Johansen and Bakke 1996; Yanagimoto et al. 2004; Roques et al. 2006; Ursvik et al. 2007). The circular mtDNA genomes contain the same set of 13 protein coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes found in all sequenced vertebrate mtDNA to date, and in an overall organization similar to most bony fish mitochondrial genomes (Fig. 1a). Furthermore, GC contents, base compositions, and codon usages were found to be very similar to other sequenced Gadidae mitochondrial genomes (see Roques et al. 2006).
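Base composition and GC content, as compared above, are simple per-sequence statistics. The following is a minimal sketch with hypothetical function names; it assumes that ambiguous bases are excluded from the totals, and the input shown is a toy sequence rather than data from this study.

```python
from collections import Counter


def base_composition(seq):
    """Return the fraction of A, C, G, and T among unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """Return the combined fraction of G and C."""
    composition = base_composition(seq)
    return composition["G"] + composition["C"]


print(base_composition("AACCGGTT"), gc_content("AACCGGTT"))
```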

Fig. 1

Gene content and organization of A. glacialis and B. saida mitochondrial genomes. A linear map of the circular mtDNA is presented. All genes, except ND6 and eight of the transfer RNA genes (indicated by the standard one-letter symbols for amino acids), are encoded by the H-strand. Transfer RNA genes encoded by the L-strand are indicated below the diagram. Abbreviations: SSU and LSU mitochondrial small- and large-subunit ribosomal RNA genes, ND1–6 NADH dehydrogenase subunits 1–6, COI–III cytochrome c oxidase subunits I–III, A6 and A8 ATPase subunits 6 and 8, CytB cytochrome B, oriH and oriL origin of H-strand and L-strand replication; CR control region containing the D-loop. The mtDNA regions containing sequence duplications in A. glacialis (CR) and B. saida (WANCY tRNA cluster) are indicated below the mt-genome organization map. The complete mitochondrial genomes of B. saida and A. glacialis have been assigned the accession numbers AM919428 and AM919429, respectively, in the EMBL Database

Rearrangements and heteroplasmic duplications of oriL and flanking tRNA genes in B. saida

Initial analysis indicated DNA sequence heterogeneity within the tRNA gene cluster flanking the origin of light-strand replication (oriL) in B. saida. A PCR-amplification and DNA sequencing approach was included to study this region in more detail. The tRNA gene cluster and some flanking sequences were amplified from DNA isolated from two specimens of B. saida as well as from one A. glacialis (Fig. 2a) and subsequently separated by agarose electrophoresis (Fig. 2b). Although A. glacialis generated a product (DNA I) consistent with the typical vertebrate organization (see Fig. 2a), the two B. saida specimens generated larger products (DNA II and DNA III). Interestingly, one of the B. saida specimens (Bs2) appears heteroplasmic in the tRNA gene/oriL cluster, since both the DNA II and DNA III products were observed (Fig. 2b).

Fig. 2

Duplication of oriL and flanking tRNA genes in B. saida. a Schematic map of the analyzed mtDNA region (WANCY tRNA gene cluster). The cluster was amplified using a forward primer (FP; L4876) and a reverse primer (RP; H5785). b Separation of amplified products in a 0.8% agarose gel. M, the size marker 1 Kb Plus DNA Ladder from Invitrogen (right). Analyzed amplified DNA bands named I, II, and III are indicated (left). Two specimens of B. saida (Bs1 and Bs2) were analyzed in addition to the A. glacialis specimen (Ag1). Both Bs1 and Bs2 appear heteroplasmic in the WANCY tRNA gene cluster. The additional smaller DNA bands (less than 800 bp) are due to nonspecific primer binding to sequences present in the total DNA preparations. c Schematic summary of sequence analyses of the fragments I, II, and III. The multiple copies of oriL (presented as a hairpin structure), tRNA-Asn (N), and truncated tRNA-Ala (ψA) present in B. saida are identical in sequence. NC noncoding sequence

Four of the amplified DNA fragments (DNA I from Ag1, DNA II from Bs1 and Bs2, and DNA III from Bs2) were eluted from the gel, cloned into a plasmid vector, and subsequently sequenced. The results are summarized in Fig. 2c and confirm rearrangement-mediated duplications and insertions that include oriL and nearby tRNA genes in B. saida. In the DNA II product, oriL and the tRNA-Asn gene (N) were perfectly duplicated, but oriL was followed by a 40-bp unique noncoding (NC) sequence. Furthermore, a truncated tRNA-Ala gene (ψA) was located between the NC region and the duplicated tRNA-Asn gene. The DNA III product was identical to DNA II, but with a duplicated insertion (Fig. 2c).

An extensive partial mitochondrial genome sequence was reported from B. saida (Coulson et al. 2006). Unfortunately, the tRNA gene cluster sequence in this specimen was not well resolved, and the insertion probably escaped notice. tRNA gene rearrangements related to those we observe in B. saida have been reported in another vertebrate system, the mitochondrial genomes of lungless salamanders (Mueller and Boore 2005). Duplication of the oriL region and flanking tRNA genes was observed, as well as the presence of truncated tRNA genes. Interestingly, all these mitochondrial gene rearrangements appear closely linked to the origin of replication, and the authors suggest that imprecise biochemical processes during mtDNA replication may contribute to such rearrangements (Mueller and Boore 2005). Recently, a heteroplasmic rearrangement was reported at the same tRNA gene cluster of the blackspot seabream (Ponce et al. 2008). Our finding of DNA replication-induced mtDNA rearrangements in B. saida supports this type of rearrangement as a widespread, but sporadic, molecular feature within vertebrates. The molecular implications of multiple oriL copies for mtDNA replication have not yet been investigated.

Direct repeat motifs in the control region of A. glacialis

The major noncoding region noted within A. glacialis and B. saida mitochondrial genomes is located between the tRNA-Pro (P) and tRNA-Phe (F) genes. This region, named the control region (CR), constitutes about 7% of vertebrate mitochondrial genomes and commonly contains the displacement loop (D-loop), the promoters for heavy and light strand transcription, the heavy-strand replication origin (oriH), and several binding sites of specific protein factors (Falkenberg et al. 2007). The CRs of A. glacialis and B. saida (Fig. 3a) are divided into three domains, namely, the extended termination-associated sequence (ETAS) domain, the central conserved (CC) domain, and the conserved sequence block (CSB) domain. Although the vertebrate CSB domain contains important functional sequences involved in oriH replication (CSB-2 and CSB-3) and transcription initiation (Foran et al. 1988; Sbisa et al. 1997), the CC domain still lacks functional assignment. The ETAS domain is highly variable among the Gadidae and harbors heteroplasmic tandem repeats (40 nt HTR motif) in G. morhua (Johansen et al. 1990; Arnason and Rand 1992; Johansen and Bakke 1996) at the RS1 site (nomenclature by Savolainen et al. 2000). Interestingly, the A. glacialis ETAS domain contains two direct repeat motifs (DR1 and DR2) present in two copies each (Fig. 3b). While the 23 bp DR1 motif partly overlaps the G. morhua HTR, the 26 bp DR2 motif contains the pyrimidine run (HPyR) known to be heteroplasmic in Gadidae (Yanagimoto et al. 2004; Ursvik et al. 2007). These functionally unassigned A. glacialis sequence motifs are apparently conserved among the Gadidae fishes, but are present in single copy in all species investigated except A. glacialis.
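Direct repeats such as DR1 and DR2 are motifs that occur more than once in the same orientation. As an illustration only, the sketch below locates exact direct repeats of a chosen length by indexing every k-mer in a sequence. The function name is hypothetical and the input is a toy sequence, not the A. glacialis control region; copies that differ by substitutions or indels would require an alignment-based approach instead of exact matching.

```python
from collections import defaultdict


def find_direct_repeats(seq, k):
    """Map each k-mer occurring at least twice to its 0-based start positions."""
    positions = defaultdict(list)
    for start in range(len(seq) - k + 1):
        positions[seq[start:start + k]].append(start)
    return {kmer: hits for kmer, hits in positions.items() if len(hits) > 1}


# Toy sequence (not real data): one 8-nt motif present in two copies.
motif = "ACGTTGCA"
toy = "TT" + motif + "GGCC" + motif + "AA"
repeats = find_direct_repeats(toy, len(motif))
print(repeats)
```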

Fig. 3

Organization and variability of the control region in Gadidae species. a Schematic organization of the mitochondrial control region (CR) in A. glacialis and B. saida. CR is located between the tRNA genes Pro (P) and Phe (F), and contains the highly conserved termination-associated sequence (TAS), the conserved sequence box D (CSB-D), the pyrimidine sequence run (Py-run) in the CC-domain, and the two conserved sequence boxes in the CSB-domain associated with oriH (CSB-2 and CSB-3). b Detailed view of the highly variable extended termination-associated sequence (ETAS) domain in completely sequenced mitochondrial genomes representing six Gadidae genera (Arctogadus, A. glacialis AM919429; Boreogadus, B. saida AM919428; Gadus, G. morhua X99772; Theragra, T. finnmarchica AM489718; Melanogrammus, M. aeglefinus AM489717; Merlangius, M. merlangus DQ020496). The 40-bp heteroplasmic tandem repeat (HTR) motif present in G. morhua, and the A and B copies of the two direct repeat motifs (DR1, 23 bp; DR2, 26 bp) in A. glacialis, are boxed. The 5′ end of the tRNA-Pro gene, TAS, and the heteroplasmic pyrimidine run (HPyR) are indicated below the sequence alignment. HPyR is heterogeneous both within and between Gadidae specimens (Yanagimoto et al. 2004; Ursvik et al. 2007)

Mitogenomic phylogeny strongly supports sister taxa affiliation of A. glacialis and B. saida

Several mtDNA-based reports have included A. glacialis and/or B. saida in phylogenetic analysis (e.g., Carr et al. 1999; Møller et al. 2002; Bakke and Johansen 2005; Teletchea et al. 2006; Coulson et al. 2006). A common finding is that Arctogadus and Boreogadus appear more closely related to Gadus and Theragra than to any other genera within the Gadidae family. Møller and coworkers reported parsimony clustering of Arctogadus and Boreogadus, but without significant statistical support (Møller et al. 2002). Similarly, both parsimony and likelihood analyses based on near complete mitochondrial genome data sets support sister taxa affiliation of A. glacialis and B. saida, but with some statistical restrictions (Coulson et al. 2006). In an attempt to further investigate the taxonomy of A. glacialis and B. saida, we performed mitogenomic-based phylogeny that included two specimens of each taxon. Thus, eight species representing five closely related Gadidae genera (Arctogadus, Boreogadus, Gadus, Theragra, and Melanogrammus) were included. Phylogenetic analyses were performed on datasets based on the alignment of the almost complete mitochondrial genome (16,783 nucleotide positions) and on the alignment of the nucleotides of the 13 concatenated protein-coding genes (all 11,400 positions, synonymous substitutions, or only first and second codon positions), and on the derived protein sequences (3,800 amino acid positions). All analyses resulted in essentially identical tree topologies. A representative ML tree is presented in Fig. 4, and supports the conclusions by Møller et al. (2002) and Coulson et al. (2006) that A. glacialis and B. saida are closely related to the genera Gadus and Theragra. These analyses could not conclusively resolve some of the internal relationships between the Gadus and Theragra species, but strongly support the notions that G. macrocephalus/G. ogac and T. chalcogramma/T. finnmarchica have to be considered as only two, not four, distinct Gadidae species (Coulson et al. 2006; Ursvik et al. 2007). Finally, we conclude that A. glacialis and B. saida represent sister taxa with a more recent common evolutionary origin compared to the other Gadidae species investigated. This conclusion was further supported in our analyses by high bootstrap values in all trees reconstructed by ME, MP, and ML.
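The bootstrap values referred to here come from resampling alignment columns with replacement, rebuilding a tree from each pseudo-replicate, and reporting the percentage of replicates in which a given clade is recovered. The sketch below illustrates only the resampling and the percentage calculation. It is not the MEGA or PAUP* implementation used in this study; the function names are hypothetical, the alignment is a toy example, and the tree-building step is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_support(clade_recovered):
    """Percentage of replicates in which a clade was recovered."""
    return 100.0 * sum(clade_recovered) / len(clade_recovered)


# Toy alignment (not real data); a fixed seed keeps the example reproducible.
toy = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "TCGAAC"}
replicate = bootstrap_alignment(toy, random.Random(0))
print(replicate)
```

Each pseudo-replicate has the same number of sites as the original alignment, and every resampled column keeps the same characters across all taxa, which is what allows a tree to be rebuilt from it.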

Fig. 4

Maximum-likelihood (ML) phylogenetic tree based on the alignment of 16,783 nucleotide positions of the mitochondrial genomes from 16 specimens representing eight Gadidae species (accession numbers are indicated). The evolution model is HKY + I + G. Tree reconstruction using maximum parsimony (MP) or minimum evolution (ME) resulted in essentially identical tree topologies. Bootstrap values (%) of 2,000 replications are shown at the internal nodes (ML/MP/ME). M. aeglefinus was used as an out-group. The sister taxa affiliation of B. saida and A. glacialis is indicated by a black bar