Introduction

Genome duplication is a powerful evolutionary mechanism that has contributed to the diversity of the vertebrate lineage (Ohno 1970). Current evidence indicates that two rounds of genome duplication (called 1R and 2R) occurred in early chordate phylogeny and are common to the ancestor of jawed vertebrates (cartilaginous, lobe-finned, and ray-finned fishes) (Sidow 1996). The clade of ray-finned fishes (Actinopterygii, Fig. 1) underwent a third round of genome duplication, dubbed the 3R or the FSGD (fish-specific genome duplication, shaded area in Fig. 1) (Taylor et al. 2001; Christoffels et al. 2004; Vandepoele et al. 2004). The FSGD is proposed to be a whole-genome event (Taylor et al. 2003; Brunet et al. 2006), a proposal that is well supported by the observation that the spotted green pufferfish (Teleostei; Tetraodon nigroviridis) has two syntenic regions (paralogons) corresponding to each single region in the human genome (Jaillon et al. 2004). Furthermore, comparative mapping shows that the paralogons of pufferfish (Tetraodon), zebrafish (Danio) (Woods et al. 2005) and medaka (Oryzias) (Kasahara et al. 2007) are homologous. This supports the view that the FSGD occurred prior to the divergence of these teleosts.

Fig. 1

Simplified phylogeny of jawed vertebrates, with focus on ray-finned fishes (actinopterygians). The jawed vertebrate clade consists of three branches, the cartilaginous (Chondrichthyes), the lobe-finned (Sarcopterygii), and ray-finned (Actinopterygii) fishes (Le et al. 1993; Venkatesh et al. 2001; Kikugawa et al. 2004; Inoue et al. 2003); the close relationship of cichlids is supported by both nuclear genes and phylogenomics data (Chen et al. 2004; Steinke et al. 2006). The shaded area marks the lineages that share the fish-specific genome duplication (FSGD)

The earliest inklings of the FSGD came from comparative analysis of Hox genes and clusters in different chordate lineages (Amores et al. 1998, 2004; Chiu et al. 2002, 2004). Hox genes, which encode transcription factors that play a central role in embryonic patterning of the body plan, are usually organized in clusters in the genome, although there are exceptions in some invertebrate lineages (Monteiro and Ferrier 2006). Evidence to date suggests the basal state of Hox clusters in jawed vertebrates is four (A, B, C, D), as is found in cartilaginous [shark (Chiu et al. 2002; Kim et al. 2000; Prohaska et al. 2004; Venkatesh et al. 2007)], lobe-finned [human (Krumlauf 1994); coelacanth (Koh et al. 2003; Powers and Amemiya 2004)], and basal ray-finned [bichir (Chiu et al. 2004)] fishes.

In contrast, zebrafish has seven Hox clusters that house expressed genes (Aa, Ab, Ba, Bb, Ca, Cb, Da) (Amores et al. 1998), where the duplicated Aa and Ab clusters are each orthologous to the single HoxA cluster of outgroup taxa such as human (Amores et al. 1998, 2004; Chiu et al. 2002). Recently, the Db cluster (the 8th cluster) in zebrafish has been found to contain a single microRNA and no open reading frames (ORFs) (Woltering and Durston 2006). Evidence of duplicated Hox clusters is reported for additional teleosts including pufferfishes [Takifugu rubripes and Tetraodon nigroviridis (Jaillon et al. 2004; Amores et al. 2004; Aparicio et al. 2002)], medaka [Oryzias latipes (Kasahara et al. 2007; Kurosawa et al. 2006; Naruse et al. 2000)], striped bass [Morone saxatilis (Snell et al. 1999)], killifish [Fundulus heteroclitus (Misof and Wagner 1996)], cichlids [Oreochromis niloticus (Santini and Bernardi 2005), Astatotilapia burtoni (Hoegg et al. 2007; Thomas-Chollier and Ledent 2008)], salmon [Salmo salar (Moghadam et al. 2005b; Mungpakdee et al. 2008)], rainbow trout [Oncorhynchus mykiss (Moghadam et al. 2005a)], goldfish [Carassius auratus (Luo et al. 2007)], and Wuchang bream [Megalobrama amblycephala (Zou et al. 2007)].

Comparative analysis of Hox clusters and genes in teleosts showed that the duplicated Hox a and b clusters have experienced divergent resolution, producing variation in gene content (Lynch and Force 2000; Prohaska and Stadler 2004) and increased rates of substitution in both protein-coding (Chiu et al. 2000; Wagner et al. 2005; Crow et al. 2006) and noncoding (Chiu et al. 2002, 2004; Tumpel et al. 2006) sequences. Consistent with a shared duplication, the Hox paralogs form two distinct a and b clades (Amores et al. 2004). All teleosts examined to date represent only two species-rich actinopterygian clades, the Ostariophysi (e.g. zebrafish) and the Euteleostei (Acanthopterygii: pufferfishes, killifish, medaka, bass, and cichlids; Salmoniformes: salmon, trout), comprising 6,000 and 16,000 species, respectively (Nelson 1994) (Fig. 1).

In itself, a whole-genome duplication does not lead to a gain of function. In fact, organisms with very recently duplicated genomes, such as tetraploid Xenopus species (Pollet and Mazabraud 2006; Evans 2008) or the polyploid relatives of the carp (Luo et al. 2006), remain very similar to their ancestors. The duplicated gene complement, however, provides the raw material for the evolution of new functions due to the relaxation of the constraints on the individual paralogs (Force et al. 1999; Conant and Wolfe 2008).

One may ask whether the FSGD is directly responsible for the biological diversification (i.e. speciosity) of ray-finned fishes (Vogel 1998; Wittbrodt et al. 1998; Meyer and Schartl 1999; Venkatesh 2003; Postlethwait et al. 2004; Meyer and Van de Peer 2005; Volff 2005). Alternatively, species richness and large-scale duplications have to be considered as independent phenomena. The examination of the actinopterygian fossil record (Donoghue and Purnell 2005) shows that there are 11 extinct clades between teleosts and their closest living relatives. The authors conclude that the characters often regarded as synapomorphies of derived teleost fishes arose gradually in ray-finned fish phylogeny, with many innovations already predating the FSGD. Many of the extinct clades that have been shown to predate the FSGD were species rich themselves. Hence, fossil evidence suggests that the FSGD is uncoupled from species richness. By showing that the species-poor Osteoglossomorpha exhibit duplicated Hox clusters, we add molecular evidence to this view.

Evidence from a handful of molecular evolution studies is consistent with this hypothesis. Phylogenetic analyses of four Hox genes (HoxA11, HoxB5, HoxC11, and HoxD4) (Crow et al. 2006), duplicated ion and water transporter genes in eels (Cutler and Cramb 2001), three nuclear genes, fzd8, sox11, and tyrosinase (Hoegg et al. 2004), androgen receptors (Douard et al. 2008), the ParaHox cluster (Mulley et al. 2006), and combined datasets (Hurley et al. 2007) in basal, intermediate and derived actinopterygians together suggest that the FSGD is coincident with the origin of teleosts. More precisely, the data place the duplication event after the divergence of bowfin (Amia) and sturgeon (e.g. Acipenser) but prior to the appearance ∼135 mya of the lineages leading to 23,637 (93%) of the 23,681 extant species of present-day teleosts (Benton 2005) (Fig. 1).

In order to assess the Hox complement in the earliest teleost lineages we identified Hox genes in the goldeye (Hiodon alosoides), a member of the species-poor Osteoglossomorpha (Nelson 1994; Hurley et al. 2007; Benton 2005). Results of a PCR survey of Hox genes in the goldeye coupled with phylogenetic analyses of four individual Hox orthologs (HoxA10, HoxA13-1, HoxA13-2, HoxC4) provide conclusive evidence that the goldeye has duplicated Hox clusters. The organization of the goldeye Hox clusters, however, is significantly different from that of other teleosts, in that it has retained Hox genes in all eight clusters.

Materials and methods

Gnathostome Hox genes

Nucleotide and amino acid sequences of individual Hox genes analyzed in this study came from three sources: genome databases, published literature, and targeted PCR amplification using degenerate primers designed here (see below). Amphioxus (Branchiostoma floridae) homeobox sequences are from (Garcia-Fernández and Holland 1994; Ferrier et al. 2000). The representative of the cartilaginous fishes is horn shark (Heterodontus francisci): HoxA cluster, AF479755; HoxD cluster, AF224262. The representatives of the lobe-finned fishes are coelacanth (Latimeria menadoensis) and frog (Xenopus tropicalis). Coelacanth homeobox fragments are listed in (Koh et al. 2003); we (Chiu et al. 2000) also sequenced the HoxA11 ortholog (AF287139). Frog Hox clusters were taken from the Ensembl Web Browser Xenopus tropicalis genome JGI3: HoxA, scaffold29 1,777,789–2,133,531; HoxB, scaffold329 415,000–1,016,000; HoxC, scaffold280 199,492–581,365; HoxD, scaffold353 474,676–800,000.

The representatives of the ray-finned fishes include bichir (Polypterus senegalus) and several teleost fishes. The bichir HoxA cluster was assembled from two BAC clones with accession numbers AC126321 and AC132195 as in (Chiu et al. 2004). Zebrafish (Danio rerio) Hox clusters were assembled from PAC clones: HoxAa, AC107364; HoxAb, AC107365 (with an alteration of nucleotide 79,324 from T to C to avoid a premature stop codon); HoxBa, BX297395, AL645782; HoxBb, AL645798; HoxCa, BX465864 and BX005254; the HoxCb cluster was taken from the Ensembl Web Browser Danio rerio genome (Zv5); HoxDa, BX322661. The zebrafish HoxDb cluster does not house Hox genes (Woltering and Durston 2006) and was excluded from this study. Nile tilapia (Oreochromis niloticus) HoxAa, AF533976; striped bass (Morone saxatilis) HoxAa, AF089743. Medaka (Oryzias latipes) Hox clusters, AB232918–AB232924. Spotted green pufferfish (Tetraodon nigroviridis) Hox clusters were extracted from the Tetraodon Genome Browser (Footnote 1): HoxAa, chr.21 2,878,001–3,153,406; HoxAb, chr.8 6,506,471–6,727,504; HoxBa, chr.Un 37,928,410–38,293,032; HoxBb, chr.2 1,321,876–1,537,033; HoxC, chr.9 4,083,941–4,353,227; HoxDa, chr.2 10,975,763–11,218,409 (a T was deleted at position 11,134,740 in order to shift back to the correct frame); HoxDb, chr.17 9,471,355–9,694,740. Japanese pufferfish (Takifugu rubripes) Hox clusters were acquired from the Ensembl genome browser (assembly FUGU 2.0). The HoxAa cluster is constructed from the entire scaffold47 and the HoxAb cluster from scaffold330, see (Chiu et al. 2002). Short homeobox fragments for QM analysis were in addition taken from (Prohaska and Stadler 2004).

PCR amplification, cloning, and sequencing

Whole genomic DNA was extracted from ∼80 mg of ethanol-preserved tissue of goldeye (Hiodon alosoides) and lightfish (Gonostoma bathyphilum) using the DNeasy kit (Qiagen) and the manufacturer’s protocols.

PCR amplification of an 81 base pair (bp) fragment of the highly conserved homeobox of PG1-8 was performed using a degenerate homeobox primer pair (334: \(5^{\prime}{\text{-}}{\tt GAR YTI GAR AAR GAR TTY}{\text{-}}3^{\prime}\); 335: \(5^{\prime}{\text{-}}{\tt ICK ICK RTT YTG RAA CAA}{\text{-}}3^{\prime}\)). PCR amplification of a 114 bp fragment of the highly conserved homeobox of PG9-13 was performed using the degenerate primers (HB913forward: \(5^{\prime}{\text{-}}{\tt AAA GGA TCC TGC AGA ARM GNT GYC CNT AYA SNA A}{\text{-}}3^{\prime}\); HB113Rev: \(5^{\prime}{\text{-}}{\tt ACA AGC TTG AAT TCA TNC KNC KRT TYT GRA ACC A}{\text{-}}3^{\prime}\)). PCR amplifications were performed with AmpliTaq Gold DNA polymerase (Applied Biosystems) using the following cycling parameters: initial denaturation at 95°C for 5 min, 30 cycles of 95°C for 1 min, 50°C for 1 min, and 72°C for 1 min, and final extension at 72°C for 10 min. Final concentration of MgCl2 was 3.5 mM. Amplified fragments were purified by agarose gel extraction (Qiagen) and cloned into a pGEM-T Easy vector (Promega) following the manufacturer’s protocol. Clones containing inserts of the correct size were identified using colony PCR and sequenced at the UMDNJ-RWJMS DNA Sequencing and Synthesis Core Facility (Footnote 2). For each clone, both strands were sequenced using T7 and SP6 sequencing primers.
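As an illustration of what the degenerate primer notation implies, the following minimal sketch counts how many distinct oligonucleotides each PG1-8 primer encodes under the standard IUPAC ambiguity codes. The function name `degeneracy` is hypothetical, and the treatment of inosine (I) as a single synthesized base contributing a factor of 1 is an assumption of this sketch, not a statement from the protocol.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer.

    Assumption: inosine (I) is one physical base that pairs broadly,
    so it contributes a factor of 1 rather than 4.
    """
    bases = primer.replace(" ", "").upper()
    return prod(1 if base == "I" else len(IUPAC[base]) for base in bases)


# Primer sequences as given in the text (spaces only separate codons).
primer_334 = "GAR YTI GAR AAR GAR TTY"
primer_335 = "ICK ICK RTT YTG RAA CAA"

if __name__ == "__main__":
    for name, seq in (("334", primer_334), ("335", primer_335)):
        print(name, len(seq.replace(" ", "")), "nt, degeneracy", degeneracy(seq))
```

If inosine were instead counted as a four-fold position, the values would be correspondingly larger; the choice only affects the bookkeeping, not the primers themselves.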

Initial assignment of PCR fragments

The 81 and 114 bp long sequences of PG1-8 and PG9-13 homeoboxes, respectively, were compared with the corresponding sequence fragments from a range of chordates (see above). The membership of each PCR fragment to one of the paralog groups Hox1–Hox13 was initially determined based on nucleotide and amino acid sequence similarity to published Hox sequences using \({\tt blast}\) (Altschul et al. 1990, 1997). The second layer of analysis used neighbor-joining (Saitou and Nei 1987) trees with deduced amino acid sequences (see Electronic Supplementary material) and assigned goldeye PCR fragments based on the identity of the subtree in which they are located. With the exception of the “middle-group paralogs” Hox4–Hox7, we find that the paralog groups are reconstructed as monophyletic clades (with the exception of the posterior sequences from Amphioxus (Garcia-Fernández and Holland 1994; Ferrier et al. 2000)).

Assignment by quartet mapping

All subsequent analyses were performed using homeobox nucleotide sequences. Middle-group genes were identified using quartet mapping (QM), see (Nieselt-Struwe and von Haeseler 2001) and an application of QM to homeobox PCR fragments from lower vertebrates (Stadler et al. 2004) for additional details. To this end, we use the teleost homeobox sequences from (Amores et al. 2004), the collection of homeobox fragments from (Prohaska and Stadler 2004), sequences of human, shark, coelacanth and the bichir HoxA cluster (Chiu et al. 2004) as well as sequences from our own unpublished PCR study of the bichir (Raincrow et al. 2009). We first determine QM support for paralog groups PG4, PG5, and the combination of PG6 and PG7. For those sequences that are not identified as PG4 homeoboxes, we rerun the analysis computing support for PG5, PG6, and PG7.

In a second experiment we then consider trees of the form (({x},R), (U,(V,W))) or (({x}, (R,U)), (V,W)), where {x} denotes the query sequence from Hiodon and {R, U, V, W}  =  {PG4, PG5, PG6, PG7} are the sets of known homeobox sequences from the four middle paralog groups. Together with the query sequence, we thus consider quintets, which can be represented in the form of six inequivalent quartets depending on which pair of paralog groups form a common subtree:

$$ \begin{array}{lll} ((\{x\},R)|(U,(V,W))); ((\{x\},R)|(V,(U,W))); ((\{x\},R)|(W,(U,V)));\\ ((\{x\},(R,U))|(V,W)); ((\{x\},(R,V))|(U,W)); ((\{x\},(R,W))|(U,V)). \end{array} $$

We analyze each of these six quartets using QM, i.e., we determine which assignment of the four paralog groups to R, U, V, W yields the maximal support for the tree. This yields a support value for each Hiodon query sequence x to be placed in a common subtree with either a single paralog group or with a pair of paralog groups. Ideally, x is placed together with the same paralog group R three times and placed together with the combination of R and one other paralog group in the remaining three quartets. Our implementation \({\tt quartm}\) of the QM method performs this quartet analysis of quintets automatically. The program can be freely downloaded from the authors’ website (Footnote 3).
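The enumeration of the six quartets per quintet can be sketched as follows. This is only a bookkeeping illustration of the topologies listed above; it does not compute QM support values and is not the interface of \({\tt quartm}\). The function names `quintet_quartets` and `all_assignments` are hypothetical.

```python
def quintet_quartets(r, others):
    """Return the six quartet topologies for query {x} with group r fixed.

    `others` holds the three remaining paralog groups (U, V, W).
    The first three topologies place {x} with r alone; the last three
    place {x} with r paired to one of the other groups.
    """
    u, v, w = others
    rotations = ((u, v, w), (v, u, w), (w, u, v))
    alone = [f"(({{x}},{r})|({a},({b},{c})))" for a, b, c in rotations]
    paired = [f"(({{x}},({r},{a}))|({b},{c}))" for a, b, c in rotations]
    return alone + paired


def all_assignments(groups=("PG4", "PG5", "PG6", "PG7")):
    """Map each candidate group R to its six quartet topologies."""
    return {
        r: quintet_quartets(r, [g for g in groups if g != r])
        for r in groups
    }


if __name__ == "__main__":
    for r, topologies in all_assignments().items():
        print(r)
        for topology in topologies:
            print("   ", topology)
```

In the ideal case described above, the best-supported assignment for a query would pick the same R in all six topologies of one entry of this mapping.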

Assignment by phylogenetic analysis

The QM analysis was complemented by the construction of neighbor joining (Saitou and Nei 1987) and maximum parsimony (Swofford 2003) trees from the same datasets. In the next step we used the same procedure separately for each paralog group to assign a sequence to one of the four gnathostome clusters HoxA, HoxB, HoxC, HoxD. In the final step we then attempted to resolve the assignment of the Hiodon PCR fragments from each class to one of the two teleost-specific paralog groups.

Sequencing of four Hox orthologs

All PCR amplifications were performed with AmpliTaq Gold DNA polymerase (Applied Biosystems). Cloning and sequencing were performed as described above.

Goldeye duplicated HoxA13-1 and HoxA13-2 sequences and the lightfish HoxA13b-like sequence (Figs. 3a, 4) were PCR amplified using universal HoxA13 primer sequences (Chiu et al. 2004) under the following PCR conditions: initial denaturation at 95°C for 5 min, 30 cycles of 95°C for 1 min, 53°C for 1 min, and 72°C for 3 min, and final extension at 72°C for 10 min. Final concentration of MgCl2 was 2.0 mM. The lightfish HoxA13b-like sequence is deposited in GenBank (1122802); the goldeye duplicated HoxA13.1 and HoxA13.2 sequences have accession numbers 1122788 and 1122792, respectively.

Two overlapping primer pairs were used to PCR amplify the goldeye HoxA10-like sequence (Fig. 3c and Supplemental Figure 2). The first set of degenerate primers (Hox-A10Uforward: \(5^{\prime}{\text{-}}{\tt CDG TNC CVG GYT ACT TCC G}{\text{-}}3^{\prime}\); Hox-A10Ureverse: \(5^{\prime}{\text{-}}{\tt CCC AAC AAC AKR ARA CTA CC}{\text{-}}3^{\prime}\)) amplifies approximately the last third of exon 1, the intron, and most of exon 2 using the following cycling parameters: initial denaturation at 95°C for 5 min, 30 cycles of 95°C for 1 min, 55°C for 1 min, and 72°C for 1 min, and final extension at 72°C for 10 min. Final concentration of MgCl2 was 3.5 mM. To amplify the N-terminal portion of exon 1 we designed a forward primer (PFCA75: \(5^{\prime}{\text{-}}{\tt TTT GYW CRA GAA ATG TCA GC}{\text{-}}3^{\prime}\)) from an evolutionarily conserved noncoding sequence (PFCAEF75; Raincrow et al. 2009) immediately upstream of the HoxA10 start codon. PCR using this forward primer and a reverse primer (Halexon1R: \(5^{\prime}{\text{-}}{\tt CCT TAG AAG TTG CAT AAG CC}{\text{-}}3^{\prime}\)) that is specific to the goldeye HoxA10-like exon 1 sequence (described above) was performed under the following reaction conditions: initial denaturation at 95°C for 5 min, 30 cycles of 95°C for 1 min, 55°C for 1 min, and 72°C for 1 min, and final extension at 72°C for 10 min. Final concentration of MgCl2 was 3.0 mM. The HoxA10-like sequence of goldeye built from a contig of these overlapping PCR fragments, spanning from the promoter to exon 2, is deposited in GenBank (1122799).

Fig. 2

Hox cluster complement of chordates with focus on actinopterygians. The Hox cluster of Amphioxus is shown at the top. Hox genes of coelacanth (outgroup; red), zebrafish (blue), medaka (light green), tilapia (dark green), Tetraodon (pink) and Fugu (magenta) are depicted as colored rectangles in the top panel. Putative goldeye Hox genes, as inferred from the PCR fragments, are depicted as colored rectangles in the bottom panel. Black rectangles indicate homeoboxes that are assigned to a specific paralog group and cluster (e.g. B) but not to a teleostean a or b clade (see text). Fuchsia rectangles indicate homeoboxes that are assigned to a specific paralog group, cluster and clade. Green rectangles depict homeobox fragments assigned to a specific paralog group but not cluster

Fig. 3

Examples of phylogenetic analysis of Hox exon 1 sequences. Species abbreviations as in Fig. 1. a HoxA13 tree reconstructed using neighbor-joining (Saitou and Nei 1987) analysis of HoxA13 amino acid sequences. Bootstrap support values (2,000 replications) are shown at the nodes. b HoxA10 tree reconstructed using Bayesian (Ronquist and Huelsenbeck 2003; Altekar et al. 2004) analysis of amino acid sequences. Node confidence values of 1,000,000 generations are shown. c Consensus HoxC4 tree reconstructed using neighbor-joining (Saitou and Nei 1987), heuristic maximum parsimony (Swofford 2003), and maximum likelihood (Swofford 2003; Zwickl 2006) analyses of amino acid sequences. Node confidence values are listed as NJ/HMP/B. d Consensus HoxC4 tree reconstructed using neighbor-joining analysis of nucleotide sequences. Node confidence values are listed as NJ/MP/B/ML. See text for details of phylogenetic analysis

The HoxC4 ortholog of bichir (Polypterus senegalus, Pse; 1123044, 1123047) and the HoxC4a-like paralog of goldeye (Hiodon alosoides, Hal; GenBank 1122797) were amplified with a degenerate primer pair (HoxC4forward: \(5^{\prime}{\text{-}}{\tt CAT GAG CTC GTY TTT GAT GGA}{\text{-}}3^{\prime}\); HoxC4Reverse: \(5^{\prime}{\text{-}}{\tt AYT TCA TCC TKC GGT TCT GA}{\text{-}}3^{\prime}\)) using the following PCR conditions: initial denaturation at 95°C for 5 min, 30 cycles of 95°C for 1 min, 53°C for 1 min, and 72°C for 3 min, and final extension at 72°C for 10 min. Final concentration of MgCl2 was 2.0 mM.

Phylogenetic analysis of exon 1 sequences

Alignments of Hox gene nucleotide sequences were done using the \({\tt clustalW}\) algorithm (Thompson et al. 1994) in the software package MacVector, version 8.1.1, using default settings. Nucleotide sequences were trimmed so each sequence was of equal length. Alignments of Hox gene predicted amino acid sequences were done using the \({\tt clustalW}\) algorithm in the software package MacVector version 8.1.1 using default settings. Amino acid alignments were corrected by eye and trimmed so each sequence was of equal length. Alignments can be viewed in the Electronic Supplement.

Maximum parsimony trees were created using PAUP* v4.0b10 (Swofford 2003) under the parsimony optimality criterion. Heuristic searches were performed under default settings. Neighbor-joining (Saitou and Nei 1987) trees were also created using the PAUP* v4.0b10 package using the distance optimality criterion with default settings. Maximum likelihood trees were obtained using GARLI v0.951 (Zwickl 2006). Default settings were used unless otherwise stated below. Starting trees were obtained using heuristic search under the likelihood optimality criterion in PAUP* v4.0b10 (Swofford 2003) with default settings. The substitution model was set to the two-rate model, which corresponds to the HKY85 model. Under the Run Termination criteria, “Bootstrap repetitions” was set to 2,000 and “Generations without improving topology” was set to 5,000, as suggested in the \({\tt GARLI}\) manual when using bootstrap repetitions. For all three methods, node confidence was scored using the bootstrap resampling method and a 50% cutoff.
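The bootstrap scoring with a 50% cutoff can be illustrated with a minimal sketch. It shows only the two generic steps, resampling alignment columns with replacement and filtering clades by the fraction of replicates that recover them; tree building itself (done here with PAUP* and GARLI) is not reproduced. The function names, the toy alignment, and the clade labels are hypothetical, and reading the cutoff as "at least 50%" is an assumption of this sketch.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps taxon name -> aligned sequence; all sequences are
    assumed to have equal length (as after trimming). Columns are drawn
    with replacement, and the same columns are used for every taxon.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def supported_clades(clade_counts, n_replicates, cutoff=0.5):
    """Keep clades recovered in at least `cutoff` of the replicates."""
    return {
        clade: count / n_replicates
        for clade, count in clade_counts.items()
        if count / n_replicates >= cutoff
    }


# Toy alignment for illustration only; not data from this study.
toy = {"taxon1": "ACGTAC", "taxon2": "ACGTTC", "taxon3": "ATGTTC"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

Passing an explicit `random.Random` instance keeps replicates reproducible, which is convenient when comparing support values across methods.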

Bayesian trees were obtained using \({\tt MrBayes}\) v3.1.2 (Ronquist and Huelsenbeck 2003) and the parallel version of \({\tt MrBayes}\) v3.1.2 (Altekar et al. 2004). \({\tt MrBayes}\) settings were as follows: two-rate substitution model, relative rate distribution = gamma, number of generations = 1,000,000, sample freq = 1,000, number of chains = 4, and temperature = 0.2. “Burn-in” was assessed using the “sump” command. Normally, the first 1 or 2 trees were discarded as “burn-in” before creating the final consensus tree. Node confidence was scored using the Bayesian posterior probability provided by the program.

Phylogenetic networks were computed using the neighbor-net algorithm (Bryant and Moulton 2004) implemented in the \({\tt SplitsTree}\) package (Huson and Bryant 2006) using the same distance matrices that also underlie the neighbor-joining trees.

Results

The first step of this study is to estimate the number of Hox clusters in the goldeye (Hiodon alosoides). Using degenerate primers that target homeoboxes (see “Materials and methods”), we cloned and sequenced a total of 421 Hox fragments (81 and 114 bp long, depending on the primer set utilized) and 23 non-Hox fragments (not further analyzed). Using a combination of \({\tt blast}\) (Altschul et al. 1990, 1997), similarity, QM (Nieselt-Struwe and von Haeseler 2001), and phylogenetic analyses (Electronic Supplement; Footnote 4), the 421 Hox sequences group into 41 unique sequences (Fig. 2). For each sequence, allelic exclusion tests were performed as described in (Misof and Wagner 1996). The 41 homeobox sequences of goldeye found in this study have been deposited in GenBank (FJ015270–FJ015310). A full list is provided in the Electronic Supplement.
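The first bookkeeping step in reducing many clone reads to a set of unique sequences can be sketched as below. This minimal example only collapses exactly identical reads; it does not reproduce the \({\tt blast}\), QM, phylogenetic, or allelic exclusion steps used in the study. The function name `collapse_clones` and the toy reads are hypothetical.

```python
from collections import Counter


def collapse_clones(clone_sequences):
    """Group identical clone sequences and count their occurrences.

    Sequences are compared after stripping whitespace and upper-casing,
    so only exact matches are merged.
    """
    normalized = (seq.strip().upper() for seq in clone_sequences)
    return Counter(normalized)


# Toy reads for illustration only; not sequences from this study.
clones = ["ACGTTG", "acgttg", "ACGTAC", "ACGTTG "]
unique = collapse_clones(clones)

if __name__ == "__main__":
    for seq, n in unique.most_common():
        print(seq, n)
```

Reads that differ by even a single base remain separate here, which is why a downstream test for allelic variants is still needed.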

As shown in Fig. 2 (bottom panel), the goldeye has duplicated paralogs on each of the four Hox clusters. For HoxA-like clusters, there is evidence for duplicated group 10, 11, and 13 paralogs; HoxB-like clusters, group 4; HoxC-like clusters, groups 5, 6, 9, 12, 13; and HoxD-like clusters, groups 3 and 10. Strikingly, the goldeye is the only teleost fish examined to date that has evidence for retained Hox genes on each of the eight Hox clusters (Aa, Ab, Ba, Bb, Ca, Cb, Da, Db).

Phylogenetic analysis and QM, however, assigned only thirteen sequences to the a or b paralog clades observed in advanced teleost fishes (Fig. 2). About the same number of sequences is preferentially classified with the unduplicated genes in bichir, shark, or sarcopterygians. The PCR fragments therefore do not provide enough information to decide whether the goldeye shares the Hox duplication with the crown teleosts, i.e., whether its eight Hox clusters are orthologous to the eight teleost Hox loci, or whether an independent duplication event occurred in Osteoglossomorpha.

Because the homeobox sequence amplified in a genomic PCR survey is so short, we chose to further investigate this problem by examining exon sequences of four Hox orthologs, HoxA13 (two paralogs), HoxA10 and HoxC4. For the HoxA13 locus, we cloned and sequenced the gene proper region of two HoxA13-like paralogs (Hal13.1 and Hal13.2) including the beginning of exon 1 (12aa from the start codon), intron, and most of exon 2 including the homeobox. Notably, the homeodomain sequences of Hal13.1 and Hal13.2 are identical to homeobox fragments 13.1 and 13.2, respectively, isolated in our independent PCR survey of H. alosoides whole genomic DNA.

Interestingly, while homeobox fragments 13.1 and 13.2 are tentatively assigned as HoxA13a and HoxA13b (Fig. 2), gene tree reconstructions using Hal13.1 and Hal13.2 exon 1 amino acid sequences (Fig. 3a) show that neither HoxA13-like paralog of goldeye groups within the HoxA13a or HoxA13b clade of teleost fishes. Instead, both HoxA13 paralogs of goldeye branch at the base of teleosts, prior to the duplication but after the divergence of bichir (P. senegalus), the most basal living lineage (Chiu et al. 2004; Mulley et al. 2006). Gene trees reconstructed using exon 1 nucleotide sequences do not resolve the phylogenetic position of the two HoxA13-like paralogs (see also Supplemental Figure 1a).

We examined the exon 1 nucleotide sequences of each HoxA13-like paralog in goldeye and did not detect evidence for gene conversion (data not shown). Interestingly though, when we examined the predicted primary amino acid sequence of Hal13.1 and Hal13.2 paralogs, we found that they share many amino acids at positions that have diverged in the duplicated paralogs of all crown teleosts [zebrafish (Chiu et al. 2002), medaka (Kasahara et al. 2007; Naruse et al. 2000; Kurosawa et al. 2006), tilapia (Santini and Bernardi 2005), lightfish (this study) and pufferfishes (Jaillon et al. 2004; Aparicio et al. 2002)], see Fig. 4. The amino acid positions shared by the duplicated HoxA13-like paralogs in goldeye are the ancestral sites, as determined by their shared presence in the bichir (Polypterus senegalus), which has a single HoxA cluster (Chiu et al. 2004). We examined whether there is selection acting on synonymous substitutions (Ks) at these two loci in the goldeye (Yang 1997), but we did not find any statistical support (data not shown). Our findings for the goldeye HoxA13-like paralogs are striking because they do not exhibit a pattern of sequence evolution consistent with intensive diversifying selection (van de Peer et al. 2001; Crow et al. 2006) following duplication. The goldeye thus may be a good model to test the predictions of the DDC model (Force et al. 1999), whereby amino acid sequence divergence of duplicated paralogs may be small but divergence in regulatory sequences is large.

Fig. 4

Goldeye duplicated HoxA13-like paralogs do not diverge at the amino acid level. Cartoon depiction of HoxA13 exon 1 and exon 2 domains. Amino acid numbers according to HoxA13a of pufferfish (Takifugu), see text. Amino acid positions (black bars) that diverge in the duplicated HoxA13a and HoxA13b paralogs of species-rich teleosts are shown and contrasted with the duplicated HoxA13-like paralogs of goldeye. Only two amino acid positions diverge in goldeye (asterisks). See text for further description

Using overlapping primer sets (see “Materials and methods”), we cloned and sequenced the gene proper region of a HoxA10-like sequence (Fig. 3b) including a promoter sequence (not shown). The homeodomain sequence of the HoxA10-like ortholog is an exact match to fragment 10-1 (Fig. 2), assigned as a HoxA10 homeobox. As illustrated in phylogenetic analysis of exon 1 amino acid sequences, the HoxA10-like sequence of goldeye branches outside the duplicated HoxA10a and HoxA10b clades (Fig. 3b), similarly to the HoxA13-like paralogs (Fig. 3a). The topology of this gene tree is similar to that reported in (Hurley et al. 2007) for other nuclear genes. Interestingly, the promoter of the goldeye HoxA10-like ortholog also has not acquired diagnostic teleostean paralog a and b specific nucleotides (not shown). There are at least two possibilities that could account for these results. First, following Hox cluster duplication, goldeye retains only a single HoxA10 locus that did not accumulate substitutions at the increased rate observed when both duplicated paralogs are retained following duplication in teleost crown groups (Chiu et al. 2000; Wagner et al. 2005; van de Peer et al. 2001). In fact, phylogenetic analysis of exon 1 of the single HoxA10b locus in zebrafish provides strong support for branching within the teleostean b clade only at the amino acid (Fig. 3), but not nucleotide sequence (Supplemental Figure 1b) level. Hence, following a duplication, if one of the paralogs is immediately lost, the rate of nucleotide substitution of the remaining singlet may be conservative. A second possibility raised by our findings is that goldeye experienced a duplication that is independent from that in the crown group of ostariophysans and acanthomorphs. A third scenario, although not tenable with available data, is that goldeye experienced massive gene loss shortly after the FSGD and subsequently experienced lineage-specific duplications of all or parts of its genome, including the Hox clusters, minimally the HoxA-like cluster.

Intriguingly, phylogenetic analysis of the majority of exon 1 of a HoxC4-like sequence found in this study provides strong support that this locus is HoxC4a-like at the level of amino acid (Fig. 3c) and nucleotide (Fig. 3d) sequences. Hence, this result supports the view that goldeye shares the FSGD. Importantly, the homeodomain sequence of this HoxC4a-like locus is an identical match to our PCR homeobox survey fragment 4–5 (Fig. 2) that we independently assigned as HoxC4a using phylogenetic methods and QM (Table 1 in the Electronic Supplement). This result, i.e., that goldeye experienced the FSGD, is consistent with the phylogenetic branching arrangement of three Hox genes HoxA11α, HoxA11β, and HoxB5β in goldeye into HoxA11a, HoxA11b, and HoxB5b teleostean clades, respectively (Crow et al. 2006). Interestingly, our PCR survey above detected two unique HoxA11-like homeobox fragments (11-1, 11-2; Fig. 2) that are both assigned, with weak support, as HoxA11b-like. Our PCR screen did not yield HoxB5-like homeobox sequences.

Discussion

Our findings contribute to the understanding of the Hox complement in a basal teleost lineage (Fig. 2) and permit inferences on when duplicate Hox paralogs have been lost in actinopterygian phylogeny.

While Acanthomorpha have completely lost one of the HoxC duplicates, and Ostariophysi as well as Salmoniformes have lost all protein-coding genes from one of the HoxC duplicates, goldeye has retained Hox genes of all eight clusters. As illustrated in Fig. 2, goldeye in particular possesses duplicate paralogs of HoxB4, HoxC5, HoxC6, HoxD3, and HoxD10. In contrast, zebrafish, with the exception of HoxC6 (Amores et al. 1998), medaka (Kasahara et al. 2007; Naruse et al. 2000; Kurosawa et al. 2006), cichlids (Santini and Bernardi 2005; Hoegg et al. 2007; Thomas-Chollier and Ledent 2008), and pufferfishes (Aparicio et al. 2002; Jaillon et al. 2004) each possess at most a single copy of these loci (Fig. 2). Based on fossil evidence, we infer that these genes were lost in the time interval spanning from 250 million years ago (Amia) to 135 million years ago (appearance of ostariophysans) (Benton 2005).

The functional consequences of this seeming bias in gene losses remain to be explored. One prediction is that the remaining single ortholog of each locus may exhibit a pattern of sequence evolution diagnostic of negative or stabilizing selection, which is in contrast to the pattern of strong positive selection (i.e. molecular adaptation with Ka/Ks > 1) that has been reported when duplicated paralogs are retained, such as the zebrafish HoxC6a and HoxC6b paralogs (van de Peer et al. 2001), HoxA cluster duplicated paralogs of ostariophysan and acanthomorph lineages (Chiu et al. 2000; Wagner et al. 2005) and other nuclear loci (Brunet et al. 2006).
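For reference, the criterion invoked in this prediction can be written compactly. The symbol \(\omega\) is introduced only for this note as the conventional shorthand for the ratio of nonsynonymous to synonymous substitution rates; the thresholds are the standard interpretation, not values estimated in this study.

$$ \omega = \frac{K_a}{K_s}, \qquad \omega > 1 \;\text{(positive selection)}, \quad \omega \approx 1 \;\text{(neutral evolution)}, \quad \omega < 1 \;\text{(negative or purifying selection)} $$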

The duplication of the Hox gene system in goldeye together with previously reported duplications (relative to the gnathostome ancestor) of several other nuclear genes in other bony tongues (Hoegg et al. 2004) suggests that we are dealing with a whole-genome duplication. A genome duplication, or the possession of a duplicated Hox system in particular, is therefore uncoupled from species-richness. Our results emphasize the genome plasticity of actinopterygians in general and suggest that different mechanisms may be at work in the earliest (species poor) versus later (species rich) teleost fishes.

Strictly speaking, our data fail to conclusively resolve the question of whether or not the duplicated Hox clusters in goldeye are true orthologs of the eight teleostean clusters. As illustrated in Fig. 3a, the branch length of each HoxA13-like sequence in goldeye is long, suggesting they derive from an ancient duplication and not a lineage-specific duplication as observed in paddlefish for HoxB5 duplicated paralogs (Crow et al. 2006). The ambiguity of the phylogenetic analysis, furthermore, in itself implies that the duplication observed in Osteoglossomorpha must have been very close in time to the divergence of this lineage from crown teleosts, a conclusion also drawn in (Crow et al. 2006). This is illustrated nicely by the phylogenetic networks in Fig. 5, which show that the phylogenetic signal (branch lengths) separating the FSGD from the divergence of Osteoglossomorpha and crown teleosts is comparable to the noise inherent in the available data.

Fig. 5

Neighbor-net analysis of the HoxA13 (left) and C4 (right) nucleic acid sequences. The neighbor-net also represents alternative hypotheses by expanding edges to boxes corresponding to alternative splits that are also supported by the sequence data. This provides a graphical impression of the tree-likeness of the data and visualizes the signal-to-noise ratio of the data set

In conclusion, our analysis is consistent both with independent duplications in the two lineages shortly after the Osteoglossomorpha-crown teleost split, and with the more parsimonious interpretation of a single FSGD pre-dating this divergence (Crow et al. 2006). We suspect that a definitive resolution of this question will require genome-wide data as well as a denser taxon sampling at key points in actinopterygian phylogeny.