Introduction

The Fox genes are united by encoding a fork head domain, an approximately 110 amino acid protein domain first identified by comparison of the Drosophila melanogaster fork head and rat HNF3A gene sequences (Weigel and Jackle 1990). Crystallographic studies have shown that the fork head domain is able to bind deoxyribonucleic acid (DNA) in a sequence-specific manner (Clark et al. 1993), and the genes are considered to encode transcription factors. Since this initial description and characterisation, numerous Fox genes from a wide variety of taxa have been described; for example, the human genome appears to include approximately 40 Fox genes, the Strongylocentrotus purpuratus genome 22 and the D. melanogaster genome 17 (Lee and Frasch 2004; Mazet et al. 2003; Tu et al. 2006).

Molecular phylogenetic analyses have been used to classify this diversity of Fox genes into a number of subclasses, currently named from FoxA to FoxS, with four subclasses (FoxL, FoxJ, FoxN and FoxQ) being subsequently split to yield 23 subclasses in total (Kaestner et al. 2000; Mazet et al. 2003; Tu et al. 2006). There is considerable variation in the literature in the nomenclature applied to these groupings, with the terms class, family, subclass etc. applied differently by different authors. In this paper, we use the convention that the Fox genes as a whole form a class and that the groupings within it are subclasses (Kaestner et al. 2000). All but two of these subclasses have been identified in one or more invertebrates (Larroux et al. 2006; Magie et al. 2005; Mazet et al. 2003; Tu et al. 2006); the exceptions are the two most recently defined subclasses, FoxR and FoxS, which to date appear to be confined to vertebrates (Tu et al. 2006; Wotton and Shimeld 2006).

Based on these studies, we can infer that at least 21 Fox gene subclasses would have been primitively present in the genome of the amphioxus Branchiostoma floridae. Previous studies have reported the characterisation of 11 Fox genes from B. floridae, with a 12th gene (whn, a FoxN1/4 gene) from another amphioxus species, Branchiostoma lanceolatum (Table 1). One study has examined the genomic organisation of a subset of amphioxus Fox gene subclasses, providing evidence for the linkage of FoxC, FoxL1, FoxF and FoxQ1 in the B. floridae genome (Mazet et al. 2006). In this paper, we report a comprehensive analysis of the Fox gene complement of B. floridae, based on the draft genome sequence. We identify members of all 21 subclasses from FoxA to FoxQ2, whereas the FoxR and FoxS subclasses are not represented. In addition, one further Fox gene falls into a subclass that appears to be missing from vertebrates, and one gene defies classification. Examining the genomic location of all the Fox genes identified reveals several that are linked, including some lineage-specific tandem duplications and some more ancient associations.

Table 1 B. floridae Fox gene models, names and associated information

Materials and methods

Several authors have attempted to classify the Fox genes via molecular phylogenetic analyses, leading to a reasonable understanding of the diversity of sequences included in this gene class (Kaestner et al. 2000; Larroux et al. 2006; Magie et al. 2005; Mazet et al. 2003). We utilised this understanding to select a representative range of Fox sequences with which to search the B. floridae genome. All searches were carried out via the Joint Genome Institute (JGI) B. floridae genome site (http://genome.jgi-psf.org/Brafl1/Brafl1.home.html). We initially searched with amino acid sequences against predicted gene models using BLASTP. Where multiple overlapping models represented a single gene, we chose the model that appeared most accurate. Our criteria for this were that the model included the full fork head domain and, where possible, matched complementary DNA (cDNA) information. We then searched the full genome sequence using TBLASTN to identify Fox genes missed by the automated gene prediction methods. Finally, we searched the genome by BLASTN with previously published B. floridae Fox sequences (listed in Table 1) to identify which sequences corresponded to these described genes. Predicted amino acid sequences for all identified loci were assembled into a comprehensive dataset of B. floridae Fox genes. Representative Fox sequences from other species were downloaded from GenBank (see Figs. 1, 2, and 3 for accession numbers).
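The two criteria for choosing among overlapping gene models can be written down as a short Python sketch. It is illustrative only and was not part of the analysis: the `GeneModel` record, its field names, the example identifiers and the tie-break on identifier are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    model_id: str        # hypothetical identifier, not a real JGI model ID
    has_full_fkh: bool   # model spans the complete fork head domain
    matches_cdna: bool   # model agrees with cDNA evidence, where available


def pick_model(models):
    """Return the preferred model among overlapping models of one gene.

    Models lacking the full fork head domain are rejected; among the rest,
    cDNA-supported models are preferred. Ties are broken by identifier so
    that the result is deterministic (an assumption made for this sketch).
    """
    candidates = [m for m in models if m.has_full_fkh]
    if not candidates:
        return None
    return min(candidates, key=lambda m: (not m.matches_cdna, m.model_id))


overlapping = [
    GeneModel("model_1", has_full_fkh=False, matches_cdna=True),
    GeneModel("model_2", has_full_fkh=True, matches_cdna=False),
    GeneModel("model_3", has_full_fkh=True, matches_cdna=True),
]
best = pick_model(overlapping)
print(best.model_id if best else "no acceptable model")
```

Returning `None` when no model contains the full domain mirrors the need for the later TBLASTN pass over the raw genome sequence.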

Fig. 1

Molecular phylogenetic analysis of B. floridae Fox genes by Bayesian methodology based upon the fork head domains. Inferred subclass groupings are boxed in grey, with the subclass indicated to the right of the figure and supporting value in bold. B. floridae genes are in red and prefixed Bf. For a full list of gene models, see Table 1. Numbers adjacent to nodes indicate posterior probabilities. Other abbreviations: Dm Drosophila melanogaster, Mm Mus musculus, Dr Danio rerio. For these species, accession numbers of sequences are shown adjacent to the gene name

Fig. 2

Bayesian molecular phylogenetic analysis of the FoxQ2, FoxQ1, FoxH and FoxF genes. Species abbreviations as Fig. 1 plus: Ce Caenorhabditis elegans, Ch Clytia hemisphaerica, Ci Ciona intestinalis, Dr Danio rerio, Hs Homo sapiens, Oa Ornithorhynchus anatinus, Sp Strongylocentrotus purpuratus, Tc Tribolium castaneum, Xl Xenopus laevis, Xt Xenopus tropicalis

Fig. 3

Bayesian molecular phylogenetic analysis of the FoxD and FoxE genes. Species abbreviations as Fig. 1 plus: Aa Aedes aegypti, Ci Ciona intestinalis, Cs Ciona savignyi, Dj Dugesia japonica, Dr Danio rerio, Gg Gallus gallus, Nv Nematostella vectensis, Sp Strongylocentrotus purpuratus, Xl Xenopus laevis, Xt Xenopus tropicalis

Sequence datasets were imported into BioEdit (Hall 1999) for alignment and trimmed to the fork head domain, then into ClustalX (Thompson et al. 1997) for neighbour-joining analysis with 1,000 bootstrap replicates. Bayesian phylogenetic analysis was conducted in MrBayes 3.1 using default settings (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003). The analysis was run for 1 million generations, with the first 25% discarded when compiling summary statistics and consensus trees. Phylogenetic trees were viewed in TreeView (Page 1996) and then imported into PowerPoint for labelling. In initial analyses, we used the full Fox gene complements of D. melanogaster and the mouse, as previous analyses show these are the best annotated representatives of the protostome invertebrates and vertebrates, respectively. The exceptions to this were the FoxQ2 subclass, which is missing from most mammals, and the FoxAB subclass, which is missing from vertebrates; for these, Danio rerio and S. purpuratus sequences were used instead. For more focused analyses, sequences were drawn from a phylogenetically wider range of species.
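As a rough guide to what the 25% burn-in means in practice, the following Python sketch computes the number of sampled trees discarded and retained. The sampling interval is not stated in the text; 100 generations is assumed here purely for illustration, and the initial state of the chain is ignored.

```python
def burnin_samples(generations, sample_every, burnin_fraction):
    """Return (total, discarded, kept) numbers of sampled trees."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


# 1 million generations, assumed sampling every 100 generations, 25% burn-in
total, discarded, kept = burnin_samples(1_000_000, 100, 0.25)
print(total, discarded, kept)
```

Under that assumption, 10,000 trees are sampled, of which 2,500 are discarded and 7,500 contribute to the consensus.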

To identify Fox gene linkages, all instances of co-occurrence of Fox genes on a single scaffold were examined via the JGI genome browser. Distances between genes were recorded as the distance between the predicted start points of the two gene models. The number of other predicted genes lying between each pair of Fox genes was also recorded.
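The linkage survey above was carried out in the genome browser, but the same bookkeeping can be sketched in Python. The record layout, gene names, scaffold names and coordinates below are hypothetical and serve only to show how start-to-start distances and intervening gene counts are derived.

```python
from collections import defaultdict


def fox_linkages(models):
    """Report neighbouring Fox genes that share a scaffold.

    `models` is an iterable of (name, scaffold, start_bp, is_fox) tuples.
    Returns (scaffold, fox_1, fox_2, start_to_start_distance_bp,
    intervening_gene_count) for each pair of consecutive Fox genes.
    """
    by_scaffold = defaultdict(list)
    for name, scaffold, start, is_fox in models:
        by_scaffold[scaffold].append((start, name, is_fox))

    results = []
    for scaffold in sorted(by_scaffold):
        genes = sorted(by_scaffold[scaffold])
        fox = [(start, name) for start, name, is_fox in genes if is_fox]
        for (s1, n1), (s2, n2) in zip(fox, fox[1:]):
            # Count predicted non-Fox genes whose start lies between the pair
            between = sum(1 for s, _, f in genes if s1 < s < s2 and not f)
            results.append((scaffold, n1, n2, s2 - s1, between))
    return results


# Hypothetical example data (not taken from the B. floridae assembly)
models = [
    ("FoxX", "scaf_A", 10_000, True),
    ("geneA", "scaf_A", 20_000, False),
    ("FoxY", "scaf_A", 42_000, True),
    ("FoxZ", "scaf_B", 5_000, True),
]
for row in fox_linkages(models):
    print(row)
```

Scaffolds carrying a single Fox gene, such as `scaf_B` here, produce no output because they contain no candidate linkage.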

To identify expressed sequence tags (ESTs) covering each amphioxus Fox gene, we also used BLASTN to search a B. floridae cDNA/EST database (Yu et al. 2008) using all identified Fox gene model sequences from the draft genome and previously published Fox gene sequences. We also searched this data set using TBLASTN with the range of Fox sequences used to search the genome to detect any genes that might be present in the EST set but absent from the genome assembly. Once representative ESTs had been identified for a specific Fox gene, the corresponding cDNA cluster ID was recorded from the cDNA/EST database, and the representative cDNA clone was picked from the B. floridae cDNA resource (Yu et al. 2008) or as otherwise indicated in Table 2. The full-length insert sequences of all available Fox gene cDNA clones were determined by primer walking and deposited in GenBank under the accession numbers EU581674 to EU581697.

Table 2 EST counts of Fox genes in cDNA libraries of B. floridae derived from five developmental stages

Results and discussion

Genes, haplotypes and nomenclature

Our searches identified 47 automated gene models in the current assembly that represent distinct genes and encode fork head domains (Table 1). In addition, one further genomic region was found to encode a fork head domain. We also noted that the B. floridae FoxB gene, previously identified through cDNA sequencing (Mazet and Shimeld 2002), was missing from this list. Searching the genome with this cDNA sequence identified a small scaffold of approximately 6 kb that contained only the 5′ region of the cDNA, excluding the fork head domain. Similarly, a FoxC orthologue has been previously identified from B. floridae via both cDNA and genomic clones (Mazet et al. 2006) but was not represented amongst the gene models. Searching the assembled genome with this cDNA sequence found the gene to reside on scaffold 6, though, as with FoxB, the fork head domain appears to be missing from the current assembly. This resulted in a total of 50 putative Fox genes in the current assembly of the B. floridae genome (Table 1).

One common issue encountered with the current assembly of the B. floridae genome is that the two haplotypes for a single locus have frequently been assembled separately. These are easily confused with genuine lineage-specific gene duplications. We therefore carefully examined our data set to distinguish haplotypes from lineage-specific gene duplications. Initial molecular phylogenetic analyses were used to derive a preliminary subclass-level classification of all 50 B. floridae Fox sequences (not shown). Sequences falling close together within the resultant trees were further compared with respect to their primary sequence, their relationship to ESTs and the similarity of neighbouring genes. Pairs of sequences were considered haplotypes if they mapped to the same EST sequences (where present) and if their neighbouring genes exhibited high levels of sequence identity. Sixteen such pairs were robustly identified (Table 1), and we conclude these are haplotypes of 16 individual loci, though we cannot formally exclude the possibility that some may represent recent segmental duplications in the genome and hence separate loci. The FoxE genes presented a particular problem with haplotype identification, and these are discussed in more detail below.
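The decision rule for calling a pair of sequences haplotypes can be summarised in a small Python sketch. No numerical cut-off for "high levels of sequence identity" is given above, so the `min_identity` value is a hypothetical placeholder, as are the function name and the EST identifiers; treating any shared EST as satisfying the EST criterion is likewise an assumption.

```python
def classify_pair(ests_a, ests_b, neighbour_identities, min_identity=0.95):
    """Classify two closely related gene models.

    ests_a, ests_b: sets of EST identifiers matching each model (may be empty).
    neighbour_identities: fractional identities between the genes flanking
    the two models. min_identity is a hypothetical threshold.
    """
    # The EST criterion applies only where ESTs are present for both models
    both_have_ests = bool(ests_a) and bool(ests_b)
    est_criterion = (not both_have_ests) or bool(ests_a & ests_b)

    # Neighbouring genes must be present and uniformly highly similar
    neighbour_criterion = bool(neighbour_identities) and all(
        identity >= min_identity for identity in neighbour_identities
    )
    if est_criterion and neighbour_criterion:
        return "haplotypes"
    return "separate loci"


print(classify_pair({"est_1"}, {"est_1"}, [0.98, 0.97]))
print(classify_pair({"est_1"}, {"est_2"}, [0.98, 0.97]))
```

A rule of this kind cannot separate haplotypes from very recent segmental duplications, which is the caveat noted in the text.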

Sequences inferred to be haplotypes have been given the suffix A or B (Table 1). We also identified several probable lineage-specific gene duplications (see below), where a Fox subclass contained more than one B. floridae member. These were given the suffix a, b, c etc. These relationships resolved well for most subclasses, with the exceptions discussed in detail below.

The B. floridae Fox gene complement

At least one member of all 21 gene subclasses from FoxA to FoxQ2 was identified, with the exception of FoxB and FoxC. Fork head domains of the FoxB and FoxC types were not found in the current assembly; however, these genes have been isolated as cDNAs, their expression has been described (Mazet et al. 2006; Mazet and Shimeld 2002), and parts of both these cDNAs are represented in the genome. We hence conclude that both genes are part of the full B. floridae Fox complement but that parts of them (including the fork head domains) are missing from the current assembly. The allocation of B. floridae Fox genes to subclasses was robustly supported by Bayesian analyses in most instances, and the genes have been named accordingly (Table 1; Fig. 1). Exceptions and ambiguous placements were subject to more focused analyses (Figs. 2, 3) and are discussed in more detail below. Both the FoxR and FoxS subclasses are absent from B. floridae. These subclasses are probably vertebrate specific, as they have not been identified in any invertebrate (Mazet et al. 2003; Tu et al. 2006; Yagi et al. 2003). We also identified a FoxAB-like gene, as described by Tu et al. (2006). This subclass appears to have been lost by vertebrates.

Four B. floridae Fox gene subclasses show evidence of lineage-specific duplication (Table 1). The two FoxA genes have been previously described from cDNA sequence (Shimeld 1997) and clearly represent distinct genes. They are closely linked in the genome, being separated by approximately 30 kb and divergently transcribed (Fig. 4). Two distinct FoxN1/4 genes on separate scaffolds were identified, one of which is orthologous to that previously described from B. lanceolatum (Schlake et al. 2000). Three putative FoxQ2 genes on separate scaffolds were identified; one, which we name BfFoxQ2a, has been previously identified via cDNA sequence and shown to be expressed in developing embryos (Yu et al. 2003). The other two, BfFoxQ2b and BfFoxQ2c, are distinct at the sequence level. The expression of BfFoxQ2c is supported by EST data, while no ESTs were identified for BfFoxQ2b. BfFoxQ2c is on the same scaffold as the FoxC–FoxL1–FoxF–FoxQ1 cluster (see below). In the molecular phylogenetic analyses of the full Fox class (Fig. 1), BfFoxQ2c appears to be most closely related to the D. melanogaster FoxQ2 gene CG11152, with BfFoxQ2a and BfFoxQ2b more distantly related. To examine this more closely, we conducted a more focused molecular phylogenetic analysis incorporating FoxQ2 genes from more species (Fig. 2), including a cnidarian, two other protostomes, a sea urchin, zebrafish and the duck-billed platypus (FoxQ2 genes are absent from eutherian mammals and are presumed to have been lost). Although BfFoxQ2a and BfFoxQ2b lie basal in the FoxQ2 group (Fig. 2), the posterior probabilities supporting this arrangement are not high (0.89 and 0.66, respectively); we hence conclude that these amphioxus genes are genuine but divergent members of the FoxQ2 subclass.

Fig. 4

Schematic illustration of putative Fox gene clusters in B. floridae. Lines connecting genes indicate linkages on the indicated scaffold. Distances between genes are shown in kilobases above the line, and the number of predicted genes lying between the Fox genes is shown below the line. Arrows indicate relative direction of Fox gene transcription

The FoxE subclass is unusually complex in B. floridae, with ten identified gene models. Two are haplotypes of one locus, which we name FoxEa (Table 1). This gene corresponds to that previously named as FoxE4 from the cDNA sequence (Yu et al. 2002a) and is linked to FoxD (Yu et al. 2002b) on scaffold 244 (Fig. 4). The remaining eight models lie on four different scaffolds and include two linked pairs of genes on scaffolds 251 and 559 and three linked genes on scaffold 455 and a single gene on scaffold 1173 (Fig. 4). All eight genes are of very similar sequence. To investigate this diversity more closely, we conducted a focused molecular phylogenetic analysis (Fig. 3), including FoxE genes and FoxD genes from a wider variety of species (FoxD is the most closely related Fox subclass to FoxE). This confirmed the putative B. floridae FoxE genes grouped with FoxE genes from other species.

We suspect the diversity of B. floridae FoxE genes shown in Table 1 represents a combination of recent lineage-specific gene duplications and assembly of separate haplotypes. Comparison of the genomic sequence surrounding all four loci suggested that the two pairs of genes on scaffolds 251 and 559 were haplotypes of one locus, while the single gene on scaffold 1173 was a haplotype of one of the genes on scaffold 455. This would imply that B. floridae has a total of six FoxE genes. However, the lack of resolution in molecular phylogenetic analyses, the similarity in gene sequences, the small scaffold size and poor sequence quality surrounding these loci preclude concluding this with certainty. We have hence taken the conservative approach of naming them all as individual genes. More comprehensive future genome assemblies will help resolve this further.

One B. floridae Fox gene did not fall robustly into any specific subclass, and we have named this BfFox1 (Table 1). BfFox1 groups weakly with the FoxH genes under Bayesian analysis, though it falls basal to the gene we have named BfFoxH (Fig. 2). Additional analyses including orphan genes from other deuterostomes did not support orthology with these genes (not shown), suggesting it is not a remnant of an orthology group that has been lost from several lineages, as is the case with FoxAB. To investigate this more closely, we conducted a focused molecular phylogenetic analysis of the FoxH gene subclass, which to date has only been reported from chordates (Fig. 2) and included FoxF and FoxQ1 genes that are relatively closely related to the FoxH subclass. This shows that the gene we have named BfFoxH groups robustly with other FoxH genes. The gene we have named BfFox1 falls basal to these. Furthermore, in BLAST analyses, it appears to be more similar to FoxQ1 sequences. Hence, we cannot conclude with certainty it is a genuine FoxH orthologue and have maintained the name as BfFox1; however, it is clearly related in sequence to the FoxH and FoxQ1 subclasses, and the sequencing of Fox genes from additional species might help resolve its placement further.

Fox gene clusters, old and new

Eight scaffolds were found to include more than one Fox gene (Fig. 4). Four (three containing FoxE genes and one containing the two FoxA genes) have been discussed above and are inferred to derive from tandem duplications specific to the amphioxus lineage. The FoxA cluster must be relatively ancient, as both genes are also found in another amphioxus species, Branchiostoma belcheri (Terazawa and Satoh 1997). The high level of sequence similarity between the FoxE genes suggests that relatively recent duplications underlie the formation of these clusters.

Three linkages between Fox genes from different subclasses were also noted. The previously described FoxC–FoxL1–FoxF–FoxQ1 cluster is located on scaffold 6 and spans approximately 471 kb (Fig. 4; Mazet et al. 2006). A fifth Fox gene, from the FoxQ2 subclass, is also linked to this cluster, though at more than 800 kb distant, it is unclear whether this has any significance with respect to Fox gene evolution. On scaffold 244, FoxEa and FoxD lie separated by 129 kb (Fig. 4). In the human genome, FoxD2 and FoxE3 lie 20 kb apart on chromosome 1p32. This suggests this linkage is of ancient evolutionary origin and has been maintained in both amphioxus and human lineages. Finally, on scaffold 241 (and scaffold 590, which we have concluded is the haplotype of scaffold 241), FoxP and FoxQ2b lie 32 kb apart.

B. floridae Fox gene expression

We used a B. floridae cDNA/EST database to find evidence of expression of Fox genes and examine their temporal expression profile. This database contains ESTs from nearly 140,000 cDNA clones derived from five developmental stages (Yu et al. 2008). Among the predicted Fox genes (32 to 35 in total), we found representative ESTs for 22 of them (Table 2). In addition, when we searched a supplementary EST data set derived from the same libraries for the B. floridae genome project (Putnam et al. 2008), we identified the corresponding ESTs and the cDNA clone for FoxQ2c. Therefore, 23 of the amphioxus Fox genes have EST data to support their expression. We did not find any ESTs for FoxI, FoxJ2/3, FoxL2, FoxP, FoxQ2b and Fox1. Our search also identified multiple cDNA clusters for certain genes (Table 2), which might represent different splice isoforms. Detailed analysis of the full-length cDNA sequences of these clones with the genomic sequence will provide more information on this issue.

Because the cDNA libraries for this EST project were not amplified or normalised (Yu et al. 2008), the EST count (or EST count/total ESTs from a specific stage) of a gene should be in proportion to its transcript abundance at the corresponding stage. Therefore, the EST count of a certain gene at different developmental stages can be used as a rough index representing its temporal expression pattern (Satou et al. 2003). Table 2 summarises the EST counts of B. floridae Fox genes at the five developmental stages, and Fig. 5 shows these normalised with respect to the number of ESTs sequenced for each stage. Our EST analysis provides important information on the existence of maternal transcripts of B. floridae Fox genes in eggs. Due to the technical difficulties of using B. floridae early embryos prior to blastula stage for in situ hybridisation, the existence and distribution of maternal transcripts from B. floridae developmental genes were not examined routinely in previous studies. From the EST data, at least five B. floridae Fox genes (FoxAa, FoxAb, FoxH, FoxK and FoxN1/4b) are expressed maternally and thus may play roles in very early embryogenesis. We also compared previously published zygotic expression data obtained by in situ hybridisation to our EST counts and found they corresponded fairly well, especially for those genes with high EST counts. For example, the FoxAa expression pattern has been reported in B. floridae (Shimeld 1997) and in another amphioxus species, B. belcheri (Terazawa and Satoh 1997). It is weakly expressed in mid-gastrula stage and then strongly expressed in mesoderm and endoderm structures during neurula stage. Later on, its expression is down-regulated and confined to more restricted areas during the larval stage. As shown in Table 2 and Fig. 5, the EST count of FoxAa appears to reflect the temporal expression trend detected by in situ hybridisation. In addition, it has been suggested that although FoxAa and FoxAb are expressed in the same spatial domains during early embryonic stages, the expression level of FoxAa seems to be higher than that of FoxAb (Shimeld 1997). The EST counts of these two genes in Fig. 5 are consistent with this quantitative difference between FoxAa and FoxAb transcripts in the neurula and larval stages. Surprisingly, in B. floridae mature adults, only FoxAb ESTs were detected and none from FoxAa, suggesting that the two genes are expressed differently and may have divergent functions in adult amphioxus.

Fig. 5

EST counts for B. floridae Fox gene EST clusters normalised relative to the total EST number sequenced from the neurula stage
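The normalisation used for Fig. 5 can be illustrated with a short Python sketch: each raw count is divided by the number of ESTs sequenced from its stage and rescaled to the neurula library size. The stage labels, library sizes and counts below are hypothetical placeholders, not values from Table 2.

```python
def normalise_counts(counts, library_sizes, reference_stage="neurula"):
    """Scale per-stage EST counts to the size of the reference library.

    counts: raw EST counts per stage for one gene (missing stages count as 0).
    library_sizes: total ESTs sequenced per stage.
    """
    reference_total = library_sizes[reference_stage]
    return {
        stage: counts.get(stage, 0) * reference_total / library_sizes[stage]
        for stage in library_sizes
    }


# Hypothetical library sizes and counts for a single gene
library_sizes = {
    "egg": 20_000,
    "gastrula": 30_000,
    "neurula": 40_000,
    "larva": 25_000,
    "adult": 25_000,
}
counts = {"gastrula": 3, "neurula": 8, "larva": 5}

normalised = normalise_counts(counts, library_sizes)
for stage, value in normalised.items():
    print(stage, value)
```

In this made-up example, five larval ESTs from a smaller library normalise to the same value as eight neurula ESTs, which is why raw counts from different stages should not be compared directly.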

It should be noted that because the current B. floridae EST/cDNA data set is not saturated (Yu et al. 2008), some rare transcripts might not be covered by this data set. For example, the B. floridae FoxD gene (AmphiFoxD) has been reported to be expressed zygotically from the gastrula stage in the mesoderm and anterior neural plate; subsequently, it is expressed in the somites, notochord, anterior neural tube and hindgut endoderm at the neurula and larval stages (Yu et al. 2002b). However, B. floridae FoxD is represented in the EST data by only one cDNA clone, from the neurula stage, suggesting that transcripts of this gene are too rare to be detected in the gastrula and larval stages by the current EST data set.

No studies have yet addressed the function of any Fox gene in B. floridae, although several studies have reported a conserved expression in tissues considered to be homologous between B. floridae and vertebrates, for example FoxA genes in the notochord and floor plate (Shimeld 1997), FoxB in the central nervous system (Mazet and Shimeld 2002) and FoxC and FoxF in the mesoderm (Mazet et al. 2006).

Conclusions

Fifty putative genes encoding fork head domains were identified in the B. floridae genome. When probable haplotypes are taken into consideration, we suggest these resolve to 32 distinct genes (and possibly up to 35 genes, with the uncertainty deriving from the multiple FoxE gene duplicates). EST and/or cDNA data support the expression of 23 of these genes. Of the 23 Fox subclasses currently named, B. floridae possesses orthologues of 21. The missing two are likely to be vertebrate specific, and hence we can conclude that the B. floridae lineage has maintained the large majority (and perhaps all) of the ancestral subclass-level diversity of Fox genes inferred to have been present in the common ancestor of the Bilateria. Several linkages were identified between B. floridae Fox genes, including three genomic loci we infer to have derived from evolutionarily ancient linkages and others that have evolved by tandem duplication in the amphioxus lineage.