Development Genes and Evolution, Volume 218, Issue 11, pp 579–590

Comprehensive survey and classification of homeobox genes in the genome of amphioxus, Branchiostoma floridae

Authors

  • Naohito Takatori
    • Department of Biological Sciences, Graduate School of Science and Engineering, Tokyo Metropolitan University
    • Department of Biological Sciences, Graduate School of Science, Osaka University
  • Thomas Butts
    • Department of Zoology, University of Oxford
  • Simona Candiani
    • Dipartimento di Biologia, Università di Genova
  • Mario Pestarino
    • Dipartimento di Biologia, Università di Genova
  • David E. K. Ferrier
    • Department of Zoology, University of Oxford
    • Gatty Marine Laboratory, University of St. Andrews
    • Department of Biological Sciences, Graduate School of Science and Engineering, Tokyo Metropolitan University
    • Department of Zoology, University of Oxford
Original Article

DOI: 10.1007/s00427-008-0245-9

Cite this article as:
Takatori, N., Butts, T., Candiani, S. et al. Dev Genes Evol (2008) 218: 579. doi:10.1007/s00427-008-0245-9

Abstract

The homeobox genes comprise a large and diverse gene superfamily, many of which encode transcription factors with pivotal roles in the embryonic development of animals. We searched the assembled draft genome sequence of an amphioxus, Branchiostoma floridae, for genes possessing homeobox sequences. Phylogenetic analysis was used to divide these into gene families and classes. The 133 amphioxus homeobox genes comprise 60 ANTP class genes, 29 PRD genes (excluding Pon and Pax1/9), nine TALE genes, seven POU genes, seven LIM genes, five ZF genes, four CUT genes, four HNF genes, three SINE genes, one CERS gene, one PROS gene, and three unclassified genes. Ten of the 11 homeobox gene classes are less diverse in amphioxus than in humans, as a result of gene duplication on the vertebrate lineage. Amphioxus possesses at least one member for all of the 96 homeobox gene families inferred to be present in the common ancestor of chordates, including representatives of the Msxlx, Bari, Abox, Nk7, Ro, and Repo gene families that have been lost from tunicates and vertebrates. We find duplication of several homeobox genes in the cephalochordate lineage (Mnx, Evx, Emx, Vent, Nk1, Nedx, Uncx, Lhx2/9, Hmbox, Pou3, and Irx) and several divergent genes that probably originated by extensive sequence divergence (Hx, Ankx, Lcx, Acut, Atale, Azfh, Ahbx, Muxa, Muxb, Aprd1–6, and Ahnf). The analysis not only reveals the repertoire of amphioxus homeobox genes but also gives insight into the evolution of chordate homeobox genes.

Keywords

Homeodomain, Lancelet, Molecular evolution, Molecular phylogeny, Chordate

Introduction

Homeobox genes are characterized by the presence of a DNA sequence, the homeobox, which encodes a protein domain termed the homeodomain. Homeodomains are usually 60 amino acids in length, although several are larger because of insertions, and all encode three α-helices. Most homeodomain-containing proteins in animals are putative transcription factors with roles in developmental processes such as cell fate determination and cell differentiation (Bürglin 1994, 1995). Homeobox genes can be classified into superclasses, classes, groups, and families, albeit with some inconsistency in the use of nomenclature between authors and studies. A recent comprehensive classification of 235 human homeobox genes provided a simple and robust scheme in which homeobox genes are classified into 11 gene classes (ANTP, PRD, LIM, POU, HNF, SINE, TALE, CUT, PROS, ZF, and CERS) and over 100 gene families, based on phylogenetic analysis of homeodomain sequences, chromosomal location, and domain composition (Holland et al. 2007). Gene families are defined explicitly to contain all genes that derive from a single gene in the most recent common ancestor of bilaterian animals, although, in some instances, gene families must be erected for lineage-specific genes whose evolutionary origins are unclear.

The phylogenetic position of Cephalochordata (amphioxus or lancelets) makes this subphylum an important reference point for studying the evolution of homeobox genes in the chordate lineage and, by extension, the evolution of chordate genomes (Bourlat et al. 2006; Delsuc et al. 2006). This is especially the case because comparison between vertebrate, amphioxus, and tunicate genomes reveals genome duplication on the vertebrate lineage (Holland et al. 1994; Dehal and Boore 2005; Putnam et al. 2008) and extensive gene loss in the tunicate lineage (Wada et al. 2003). With the availability of a near complete draft genome sequence of the Florida amphioxus, Branchiostoma floridae (Putnam et al. 2008; Holland et al. 2008), we have undertaken a comprehensive survey of homeobox genes in this animal. Comparison within and between genomes gives new insights into the diversification of homeobox genes in chordates and into patterns of genome evolution.

Materials and methods

Identification of amphioxus homeobox genes

Homeobox sequence searches were conducted according to previously described methods (Takatori and Saiga 2008). Briefly, homeodomain sequences from Drosophila, Caenorhabditis elegans, sea urchin, Ciona intestinalis, Xenopus laevis, mouse, and human were retrieved from public databases (http://www.ncbi.nlm.nih.gov/) and from the human compilation of Holland et al. (2007) and used as queries for several rounds of TBLASTN searching in version 1 of the amphioxus genome assembly (http://genome.jgi-psf.org/Brafl1/Brafl1.home.html). Version 1 has assembly errors within the Hox and ParaHox gene clusters, and a local reassembly was used for these genes (N. Putnam, personal communication). A total of over 7,000 non-independent hits were clustered according to sequence identity and genomic location to provide a set of putative genes; gene models were taken from the assembly annotation or created where necessary. No single probability cut-off was used because different gene classes and families show different extents of sequence divergence; instead, hits were analyzed for the presence of a homeodomain by alignment and secondary structure prediction. Additional domains and motifs were predicted using the Simple Modular Architecture Research Tool (http://smart.embl-heidelberg.de/; Letunic et al. 2006; Schultz et al. 1998). A high degree of polymorphism has been detected in amphioxus (Putnam et al. 2008); two haplotypes corresponding to one gene were often recovered. Closely similar sequences were considered haplotypes if sequence identity extended to intronic sequence and the flanking genes were the same. Only a single sequence from a haplotype pair was used in subsequent sequence analyses.
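The step of clustering non-independent hits by genomic location can be illustrated with a minimal Python sketch. This is not the procedure used in the survey, which also considered sequence identity and was checked by alignment; the function name `cluster_hits`, the `max_gap` parameter, and the example coordinates are hypothetical and serve only to show how overlapping hits on one scaffold collapse into a single candidate locus.

```python
from collections import defaultdict


def cluster_hits(hits, max_gap=0):
    """Merge overlapping (or near-adjacent) hits on the same scaffold.

    hits: iterable of (scaffold, start, end) tuples with start <= end.
    max_gap: hypothetical tolerance, in bases, for joining nearby hits.
    Returns a list of (scaffold, start, end, n_hits) candidate loci.
    """
    by_scaffold = defaultdict(list)
    for scaffold, start, end in hits:
        by_scaffold[scaffold].append((start, end))

    loci = []
    for scaffold in sorted(by_scaffold):
        intervals = sorted(by_scaffold[scaffold])
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start <= cur_end + max_gap:
                # Hit overlaps the current locus: extend it.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # Gap found: close the current locus and start a new one.
                loci.append((scaffold, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        loci.append((scaffold, cur_start, cur_end, count))
    return loci


# Invented coordinates, for illustration only.
example_hits = [
    ("scaffold_1", 100, 280),
    ("scaffold_1", 150, 300),
    ("scaffold_1", 5000, 5180),
    ("scaffold_2", 40, 220),
]
loci = cluster_hits(example_hits)
```

In this toy input the two overlapping hits on `scaffold_1` merge into one locus, leaving three candidate loci in total. Haplotype pairs would still need the separate check described above (matching intronic sequence and identical flanking genes).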

The status of putative chordate-specific genes was checked by comparison to published homeobox gene surveys of the sea urchin Strongylocentrotus purpuratus (Howard-Ashby et al. 2006) and the cnidarian Nematostella vectensis (Ryan et al. 2006) and by TBLASTN searching against GenBank and recently sequenced protostome genomes (Daphnia pulex, Lottia gigantea, and Capitella sp. I).

Phylogenetic analyses

Molecular phylogenetic analyses were performed with homeodomain and other conserved domain sequences using maximum likelihood and neighbor-joining methods. Sequences were aligned using the ClustalW program in the Molecular Evolutionary Genetics Analysis (MEGA) software package (Kumar et al. 2004) (http://www.megasoftware.net/). Alignments were checked, and gaps were removed manually before construction of the trees. For the construction of maximum likelihood trees, the best-fitting model of protein substitution was first selected using the ProtTest program (Abascal et al. 2005). Trees were calculated using the PHYML program (Guindon and Gascuel 2003; Guindon et al. 2005), and the resulting unrooted trees were displayed in Newick format using the MEGA package or FigTree (Rambaut 2006). Neighbor-joining trees were constructed using the MEGA or PHYLIP packages (Felsenstein 1993). Branch support was assessed using bootstrap replication.
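Two of the steps above, removal of gapped alignment columns and bootstrap replication, can be sketched in a few lines of Python. This is an illustrative sketch only: in the study, gaps were removed manually and bootstrapping was carried out within the cited phylogenetics packages. The function names and the toy sequences are hypothetical.

```python
import random


def strip_gapped_columns(alignment, gap="-"):
    """Return a copy of the alignment keeping only columns without gaps.

    alignment: dict mapping sequence name to an aligned string;
    all strings must have the same length.
    """
    columns = [col for col in zip(*alignment.values()) if gap not in col]
    return {
        name: "".join(col[i] for col in columns)
        for i, name in enumerate(alignment)
    }


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in picks)
        for name, seq in alignment.items()
    }


# Toy aligned fragments, invented for illustration; not real homeodomains.
toy = {"geneA": "RRK-QTY", "geneB": "RKKAQTF", "geneC": "RRKGQ-Y"}
clean = strip_gapped_columns(toy)
replicate = bootstrap_replicate(clean, random.Random(0))
```

Each pseudo-replicate has the same length as the gap-free alignment. A tree would be built from every replicate, and the fraction of replicate trees that recover a given clade is reported as its bootstrap support.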

Results and discussion

Number of homeobox genes in amphioxus

We identified 133 genes containing homeobox sequences in the B. floridae genome sequence. These are listed in Table 1 together with their Protein ID numbers; their diversity is displayed in Fig. 1. A further two genes, Pon and Pax1/9, are often classified with homeobox genes but actually lack a homeobox sequence; these genes are excluded from the total count of homeobox genes but are included in Table 1 for completeness. Every previously described amphioxus homeobox gene was found, suggesting that the sequence coverage and gene survey must be close to complete. The 133 genes include representatives of all 11 gene classes described in human (Holland et al. 2007) and can be classified into 111 gene families, of which 15 families (16 genes) are specific to amphioxus. Four gene families, Hopx, Isx, Ventx, and Zhx, are potentially chordate-specific, as deduced from their inferred presence in the chordate ancestor and their absence from non-chordate genomes analyzed to date. As in humans, the largest gene class is ANTP, containing 60 amphioxus genes compared to 100 genes and 19 pseudogenes in human. The second largest class is PRD, containing 29 genes compared to 50 genes and numerous pseudogenes in human. The smaller gene classes in amphioxus are TALE (nine genes), POU (seven genes), LIM (seven genes), ZF (five genes), CUT (four genes), HNF (four genes), SINE (three genes), CERS (one gene), and PROS (one gene). Three genes defy classification. In every case except for the HNF class, amphioxus has fewer genes than human. No clear homeobox pseudogenes were detected in amphioxus, in contrast to the human genome. The abundance of human pseudogenes is likely to reflect an amplification of LINE elements encoding reverse transcriptase that occurred during primate evolution (Ohshima et al. 2003), coupled with germ-line expression of certain homeobox genes in mammals (Booth and Holland 2004, 2007; Holland et al. 2008).
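As a quick consistency check on the figures reported in this section, the per-class counts can be tallied in Python. The numbers are taken directly from the text; the variable names are ours, and the family arithmetic assumes that the 111 families are the 96 families inferred for the chordate ancestor (as stated in the abstract) plus the 15 amphioxus-specific families.

```python
# Amphioxus homeobox gene counts per class, as reported in the text.
amphioxus_counts = {
    "ANTP": 60,
    "PRD": 29,
    "TALE": 9,
    "POU": 7,
    "LIM": 7,
    "ZF": 5,
    "CUT": 4,
    "HNF": 4,
    "SINE": 3,
    "CERS": 1,
    "PROS": 1,
    "unclassified": 3,
}

total_genes = sum(amphioxus_counts.values())

# Gene families: ancestral chordate families plus amphioxus-specific ones
# (assumed decomposition of the 111 families; see lead-in).
ancestral_families = 96
amphioxus_specific_families = 15
total_families = ancestral_families + amphioxus_specific_families
```

The class counts sum to the 133 genes reported, and the two family figures sum to 111.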
Fig. 1

Homeobox gene diversity in amphioxus. Phylogenetic analysis of all amphioxus homeodomains (excluding the partial homeodomain of Pax2/5/8) constructed using neighbor-joining from a JTT distance matrix. Gene classes are indicated by colors, except for CERS and PROS, which are each represented by a single gene. Muxa, Muxb, and Ahbx1, which cannot be assigned to known classes, are shown in black. Several classes are not recovered as monophyletic groups; additional characters, such as the presence of additional domains outside the homeodomain, are used to assist classification. Due to the short length of the sequence alignment and the complex modes of evolution, this tree should not be used to infer accurate evolutionary history; it is presented to demonstrate the diversity of amphioxus homeodomain sequences.

Table 1

Classification of all homeobox genes in the B. floridae genome, plus the related Pon and Pax1/9 genes that lack a homeobox

GenBank accession numbers are given for amphioxus genes cloned previously. Joint Genome Institute Protein ID numbers are given for each gene, except for Hox and ParaHox genes since these regions had errors in the version 1 assembly. When two alleles were detected, two Protein IDs are listed. Blue: gene families lost from the Olfactores lineage; red: gene families lost from the vertebrate lineage; green: chordate-specific genes

ANTP class

The ANTP class, named after the Drosophila homeotic gene Antennapedia, is unique to Metazoa and includes the Hox genes and several other well-known developmentally important genes such as ParaHox, En, Emx, Nk, Dlx, and Msx genes. Previous studies have described many amphioxus ANTP class genes, including a single Hox gene cluster (Garcia-Fernandez and Holland 2008; Ferrier et al. 2000), an intact ParaHox gene cluster (Brooke et al. 1998), and a broken NK gene cluster (Luke et al. 2003). Recent analyses of the amphioxus genome sequence have extended the Hox gene cluster to 15 genes (Holland et al. 2008), noted an absence of gene family loss (Holland et al. 2008), and drawn attention to several linked pairs or clusters of ANTP genes, notably Nkx6/Nkx7, Mnxa/Ro, and En/Nedxa/Nedxb/Dll (Butts et al. 2008; Putnam et al. 2008). The Nedx gene family (the name derived from Next to Distalless) includes the Drosophila gene CG13424.

We have used phylogenetic analysis to classify amphioxus ANTP class genes based on the homeodomain sequence (Supplementary Fig. 1). Amphioxus genes can be assigned to known gene families with almost universally high bootstrap support in both neighbor-joining and maximum likelihood analysis. The exception is the Vent family, where amphioxus or vertebrate homeodomains are divergent and do not group together, despite sharing diagnostic residues inside and outside the homeodomain. In addition to orthologues from all conserved chordate ANTP gene families, amphioxus possesses representatives of six gene families previously found only outside the chordates: Msxlx, Bari, Abox, Nk7, Ro, and Repo. The Bari gene family (the name being derived from Bar-related family found in invertebrates), which includes Drosophila CG11085, is named in this paper, as is the Abox gene family (Absent from Olfactores homeobox), which includes Drosophila CG34031.

Amphioxus also possesses three homeobox genes that can be assigned to the ANTP class, but not to previously defined gene families: Lcx, Hx, and Ankx. All three orphan genes are likely to be fast-evolving lineage-specific duplicates (Supplementary Fig. 1). For example, the Lcx or Lunchbox gene is located next to the pair of Vent genes (Luke et al. 2003) and may have derived from this gene family or another linked Nk homeobox gene. Duplication in the amphioxus lineage is not confined to these genes and has also occurred in the Evx, Mnx, Nedx, and Nk1 gene families. In the case of Emx, for which duplication has been previously reported (Minguillón et al. 2002), two duplications have occurred, producing three paralogues. Two of these (EmxB and EmxC) are tandemly arranged and are presumably the product of the second duplication event.

PRD class

The PRD class, named after the paired gene of Drosophila melanogaster, includes most of the Pax genes (genes encoding a paired domain of 128 amino acid residues) and many non-Pax gene families. Previously reported Pax genes from amphioxus are Pax3/7 (Holland et al. 1999) and Pax6 (Glardon et al. 1998), both of which possess a homeobox, Pax2/5/8 (Kozmik et al. 1999), which has a partial homeobox, and Pax1/9 (Holland et al. 1995), which lacks a homeobox. Phylogenetic analyses with the paired domain or the homeodomain confirmed the identity of each of these genes with high bootstrap support. Interestingly, an additional Pax gene was identified in the amphioxus genome, orthologous to the Drosophila pox neuro gene (pon; Supplementary Fig. 2), a gene family hitherto not identified in any chordate genome. The presence of this gene in amphioxus suggests that this gene was lost in the Olfactores lineage. Like the Pax1/9 gene family, Pon genes encode a paired domain but not a homeobox; Pax1/9 and Pon are therefore excluded from our count of 29 PRD class homeobox genes in the amphioxus genome.

We identified 26 non-Pax PRD class genes in the B. floridae genome, which are classified into 23 gene families according to phylogenetic analysis (Table 1, Supplementary Fig. 3) and the identity of the 50th amino acid residue of the homeodomain (Galliot et al. 1999). Four of the genes encode lysine at position 50 (K50) and belong to the Otx, Pitx, Gsc, and Dmbx gene families. All have been reported previously, and two (Gsc and Otx) are ancient genomic neighbors (Putnam et al. 2008). Twenty-two genes share a glutamine residue at position 50 (Q50). These comprise genes in the Alx, Arx, Drgx, Isx, Otp, Phox, Prop, Prrx, Rax, Repo, Shox, Hopx, Uncx, and Vsx families, plus a further six Q50 genes that do not form clades with any known PRD gene families. For the latter genes, novel amphioxus-specific gene families were erected, Aprd1 to Aprd6 (Table 1, Supplementary Fig. 4). The vertebrate PRD gene families Mix, Argfx, Dprx, Tprx, Dux, Leutx, Nobox, Rhox, Hesx, and Sebox were not found in amphioxus, but as these have not been found outside the vertebrates, they are likely to be vertebrate-specific novelties (Holland et al. 2007). The vertebrate PRD gene families Hopx and Isx were identified in amphioxus, but not in non-chordates, suggesting that these are chordate-specific novelties. Consistent with this, the Prd-A gene of C. intestinalis shows moderate affinity to the B. floridae Isx, suggesting that this gene is the ascidian Isx. We also identified an amphioxus orthologue of Drosophila repo, a gene involved in formation of glia, even though this gene has not been found in vertebrates or tunicates. The Uncx gene family is unusual in amphioxus; we identified three genes, two of which form clades with the previously identified Ciona Unc4-A and Unc4-B genes (Wada et al. 2003; Supplementary Fig. 3; 56% and 99%, respectively), while UncxC seems unique to amphioxus. We suggest that the ancestral chordate possessed two Uncx genes.
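The K50/Q50 distinction used above depends only on the residue at homeodomain position 50, so it can be illustrated with a short Python sketch. This assumes a canonical 60-residue homeodomain with no insertions, so that position 50 is simply the 50th character; the function name `residue_50_class` and the placeholder sequences are hypothetical and are not real homeodomains.

```python
def residue_50_class(homeodomain):
    """Label a 60-residue homeodomain by its residue at position 50.

    Assumes a canonical homeodomain without insertions; atypical
    homeodomains would first need to be aligned to the 60-residue frame.
    """
    if len(homeodomain) != 60:
        raise ValueError("expected a 60-residue homeodomain")
    residue = homeodomain[49].upper()  # position 50 is index 49
    return {"K": "K50", "Q": "Q50"}.get(residue, "other")


# Placeholder sequences (poly-alanine with one informative residue).
k50_like = "A" * 49 + "K" + "A" * 10
q50_like = "A" * 49 + "Q" + "A" * 10
labels = [residue_50_class(k50_like), residue_50_class(q50_like)]
```

In the survey this criterion was used together with phylogenetic analysis, not as a stand-alone classifier.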

LIM class

Homeobox genes of the LIM class encode proteins with two LIM domains and a homeodomain with several diagnostic amino acid residues (Bürglin 1994). The LIM class is subdivided into the Lhx1/5, Lhx2/9 (previously called the apterous family), Lhx3/4, Lhx6/8, Lmx, and Islet gene families (Hobert and Westphal 2000; Holland et al. 2007). The B. floridae members of the Islet and Lhx1/5 families have been reported previously and were termed islet and Lim1/5, respectively (Jackman et al. 2000). Our genome survey and phylogenetic analyses using homeodomain or LIM domain sequences confirm these assignments, with no additional Islet or Lhx1/5 genes found (Table 1, Supplementary Fig. 4). We also identified one Lhx3/4 gene, one Lhx6/8 gene, and one Lmx gene; in contrast, the Lhx2/9 gene family has been duplicated to give two genes in amphioxus.

POU class

The POU class proteins possess a 60 amino acid homeodomain and an N-terminal 75 amino acid POU domain. In humans, the POU class has been divided into seven gene families: Pou1 to Pou6, plus tentatively the highly divergent Hdx gene that lacks the POU domain (Holland et al. 2007). We found seven POU class genes in amphioxus; phylogenetic analyses with the homeodomain indicate that these include one member each of Pou1, Pou2, Pou4, Pou6, and Hdx families, plus two genes in the Pou3 gene family, named Pou3 and Pou3L (Supplementary Fig. 5). Among these, Pou1, Pou3 (AmphiBrn1/2/4), and Pou4 genes had been identified previously (Candiani et al. 2002, 2006, 2008; Table 1). The amphioxus Hdx gene encodes two atypical homeodomains (Supplementary Fig. 5), as also seen in members of this gene family in other taxa. One of the putative Pou3 genes is closely similar to Pou3 genes from other taxa, but the second gene (Pou3L) is more divergent from known bilaterian Pou3 genes. Since we found no member of the Pou5 gene family in amphioxus, it was necessary to test whether Pou3L was actually a cryptic, divergent Pou5 gene as opposed to a divergent Pou3 gene. We used synteny analysis, testing whether genes linked to Pou3L in amphioxus had homologues on human chromosomal regions housing either Pou5 or Pou3 genes. Six of eight neighboring genes examined had homologues in human chromosomal regions containing Pou3 family genes, with only one gene suggesting synteny to a Pou5 region (Supplementary Table 1). We suggest that Pou5 is a vertebrate-specific gene, since it has not yet been found in invertebrate genomes, whereas Pou3 duplicated independently on the cephalochordate lineage. The cnidarian N. vectensis also has two putative, but divergent, Pou3 family genes (Ryan et al. 2006), raising the possibility of the duplication being more ancient, but there is no support for this from phylogenetic analysis (Supplementary Fig. 5).

HNF class

Members of the HNF class encode proteins with an atypical homeodomain possessing a 21 or 15 amino acid insertion between the second and third helix and a separate domain of similar structure to the POU domain (Chi et al. 2002). In humans, the HNF class is divided into two families, Hnf1 and Hmbox; the former includes the HNF1A and HNF1B genes, and the latter contains HMBOX1 (Holland et al. 2007). We identified four HNF class genes in the amphioxus genome, which molecular phylogenetic analysis using the homeodomain suggests are one Hnf1 family gene and two Hmbox family genes in addition to an amphioxus-specific divergent gene (Table 1, Supplementary Fig. 6). We suggest that amphioxus Hmbox genes may have duplicated in the cephalochordate lineage, but since no Hmbox genes have been found outside the chordate lineage, their precise evolutionary history is not yet clear.

SINE class

SINE class proteins, named after the Drosophila gene sine oculis, are characterized by possession of the SIX domain: a DNA-binding domain of about 115 amino acids, situated N-terminal to the homeodomain. Three members of the class have been identified in Drosophila and six in the human genome (Dozier et al. 2001; Holland et al. 2007; Seo et al. 1999) and classified into three gene families, Six1/2, Six3/6, and Six4/5. We identified three proteins with a SIX domain and a homeodomain in the B. floridae genome (Table 1); phylogenetic analyses of the homeodomain assign one to each gene family (Supplementary Fig. 7). Our analyses indicate that duplication of the SINE class genes occurred in the vertebrate lineage.

TALE class

The TALE class is characterized by an atypical homeodomain that possesses a three amino acid insertion between the first and second α-helices of the homeodomain (Bürglin 1997) and includes the Irx, Meis, Mkx, Pbx, Pknox, and Tgif gene families (Bürglin 1998; Holland et al. 2007). Several of these gene families also have conserved domains outside the homeodomain, for example, the IRO domain, the MEIS domain, MKX domains, the PBC domain, and the TGIF box (Mukherjee and Bürglin 2007). We identified nine TALE class homeobox genes in the B. floridae genome. Domain composition plus phylogenetic analyses of the homeodomain indicate that these comprise one member each for the Meis, Mkx, Pbx, Pknox, and Tgif gene families, three genes in the Irx family, and one additional TALE class gene with no clear affinity to known families (Table 1, Supplementary Fig. 8). We name the novel gene Atale and suggest it may be a cephalochordate-specific novelty. Although Mkx genes lack the Irx motif, our phylogenetic analysis of the homeodomain suggests that this family represents a sister clade to the Irx family, as previously suggested (Mukherjee and Bürglin 2007).

The Irx genes represent an interesting case from an evolutionary perspective. These genes form a cluster of three genes in Drosophila (araucan, caupolican, and mirror) and two clusters of three genes in fish, mouse, and human genomes (Dildrop and Ruther 2004; Gomez-Skarmeta and Modolell 2002; Peters et al. 2000). The two vertebrate gene clusters are clearly the products of cluster duplication, as judged by phylogenetic analysis and comparison of transcriptional orientation. Surprisingly, however, the three Drosophila Irx genes do not group with the vertebrate Irx1/3, Irx2/5, and Irx4/6 pairs in phylogenetic analysis; hence, it was suggested that the Drosophila and vertebrate clusters emerged independently (Peters et al. 2000). The three Irx genes we have identified in the B. floridae genome are also linked into a gene cluster, although their transcriptional organization is not consistent with that expected for a pre-duplication version of the two vertebrate clusters. Furthermore, in phylogenetic analysis they clearly do not represent one gene for each of the Irx1/3, Irx2/5, and Irx4/6 families; instead, they appear to have duplicated independently on the cephalochordate lineage (Fig. 2). We also identified a pair of Irx genes in the genome of honeybee and beetle consisting of one mirror gene and a pro-orthologue of araucan and caupolican (Fig. 2 and data not shown), suggesting that evolution of the three-gene cluster in Drosophila involved an extra gene duplication in the Drosophila lineage. Together, these analyses suggest that the common ancestor of eubilaterian animals possessed a single Irx gene that duplicated to give an Irx gene pair in insects and then a three-gene cluster in Drosophila. The original single Irx gene also duplicated to give a three gene cluster in amphioxus and independently in the ancestor of vertebrates (Fig. 2). The same conclusion was reached independently by Irimia et al. (2008). 
We also noted sequence conservation in the region N-terminal to the homeodomain between vertebrate Irx4/6 genes and invertebrate Irxs, suggesting that the Irx4/6 group retained an ancestral motif (data not shown).
Fig. 2

Evolution of the Iroquois genes in Bilateria. Iroquois genes are inferred to have duplicated to yield three-gene clusters independently in at least three extant lineages: flies, cephalochordates, and vertebrates. In the vertebrate lineage, the cluster was duplicated. Paralogues shaded in the same color have higher sequence similarity

CUT class

The CUT homeobox class includes genes of the Cux, Satb, and Onecut families in vertebrates; these genes encode proteins with a homeodomain and one or more copies of the 75 amino acid DNA-binding CUT domain. All invertebrates previously analyzed lack the Satb gene family but have a Compass gene. Genes of the latter family lack the CUT domain but share a COMPASS domain with Satb genes, suggesting that Satb genes evolved from Compass genes (Bürglin and Cassata 2002). The CUT gene diversity of B. floridae has recently been analyzed (Takatori and Saiga 2008). Briefly, the amphioxus genome contains one member for each of the Cux, Onecut, and Compass families, plus one novel gene with four CUT domains and a homeodomain; there are no members of the Satb family (Table 1). Similarly, the sea urchin genome contains a Compass family member but no Satb family member. These results suggest that the Satb family arose in the vertebrate lineage, probably from domain shuffling between Compass and Onecut genes, judging from the similarity found between the homeodomains of these families.

PROS class

The PROS class, named after a Drosophila gene involved in control of neuronal identity (Chu-Lagraff et al. 1991; Doe et al. 1991), is characterized by a three amino acid insertion between the second and third α-helices of the homeodomain (Bürglin 1995) and by possession of the DNA-binding PROS domain of approximately 100 amino acids (Yousef and Matthews 2005). Two genes of this class have been identified in the mouse and human genomes (Holland et al. 2007; Nishijima and Ohtoshi 2006). We identified one amphioxus member of this class, with the characteristic amino acid insertion and the PROS domain (Table 1, Supplementary Fig. 9). Phylogenetic analysis suggests that vertebrate PROS genes duplicated after the divergence of the amphioxus and vertebrate lineages, probably as part of the whole genome duplications that occurred early in the vertebrate lineage.

ZF class

The zinc finger (ZF) class, also called the ZFH class, includes proteins with homeodomain and zinc finger motifs and contains the Adnp, Zfhx, Zeb, Zhx/Homez, and Tshz families (Holland et al. 2007). Although members of this class share the zinc finger motif, previous molecular phylogenetic analyses suggest that the class is not necessarily monophyletic (Holland et al. 2007); homeodomains and zinc finger motifs might have become associated on more than one occasion in evolution.

We identified five zinc finger-containing homeodomain proteins in the B. floridae genome, which our phylogenetic analyses, using the homeodomain, suggest are one member each of the Zfhx, Zeb, Tshz, and Zhx/Homez gene families, in addition to a single orphan gene that cannot be placed in any established family (Table 1, Supplementary Fig. 10). The Adnp gene family appears to be a vertebrate innovation. In the case of the Tshz family, each chordate gene possesses a single homeobox, while the Drosophila homologue, teashirt, and all other non-deuterostome homologues sequenced to date do not possess a homeobox. No orthologue is found in the sea urchin genome sequence. We suggest that association of a homeodomain with the teashirt zinc-finger gene occurred in the chordate stem lineage. The chordate Zeb genes also encode proteins with a single homeodomain, which form a strongly supported clade in phylogenetic analysis (Supplementary Fig. 10). The Drosophila orthologue of Zeb is zfh1, and although this relationship is not recovered from phylogenetic analysis of homeodomains (Supplementary Fig. 10), orthology is clear from protein structure and sequence conservation outside the homeodomain (Liu et al. 2006).

The evolution of the Zfhx and Zhx/Homez gene families is more complex. Both families are composed of genes with multiple homeodomains of varying numbers. Phylogenetic analysis suggests that the ancestral pre-duplication vertebrate Zfhx gene possessed four homeodomains (Supplementary Fig. 10), an organization that is retained in the amphioxus Zfhx gene. By contrast, the amphioxus Zhx gene is somewhat derived relative to its vertebrate homologues, possessing six homeodomains as opposed to the five deduced to be present in the ancestral vertebrate Zhx gene. Only the first homeodomains of the amphioxus and human Zhx homologues form a well-supported clade (Supplementary Fig. 10), suggesting that relaxation of selective pressures on the other homeodomains allowed sequence divergence.

CERS class

The ceramide synthase (CERS) class, formerly called the LASS class, is a highly unusual class of homeodomain proteins that are localized at the membrane of the endoplasmic reticulum with the homeodomain facing the cytosol (Mizutani et al. 2005). CERS proteins are involved in the synthesis of ceramide, a precursor for sphingolipids and an important second messenger for a variety of cellular events including differentiation. A recent study in which the homeodomain of a CERS protein was deleted suggested that this domain is not required for ceramide synthase activity (Mesika et al. 2007), and it is yet to be determined whether the CERS homeodomain functions as a nucleic-acid-binding domain. The human genome contains six CERS-encoding genes, CERS1 to CERS6, with CERS1 lacking the homeobox (Pewzner-Jung et al. 2006). Phylogenetic analyses with the homeodomain divide the CERS class into two distinct groups with different substrate preference: CERS2 to CERS4, which prefer long-chain acyl-CoA, and CERS5 and CERS6, which prefer short-chain acyl-CoA (Mizutani et al. 2005).

We found one amphioxus gene with a predicted CERS-type homeodomain and a TLC (TRAM/Lag1p/CLN8) domain, also characteristic of CERS proteins. Phylogenetic analysis reveals that the amphioxus protein is more similar to the CERS5/6 than the CERS2/3/4 group of vertebrate genes (Supplementary Fig. 11). This situation contrasts with the Ciona genome, where we found five CERS class genes, including representatives of both groups. It is likely that expansion of the CERS class was an Olfactores innovation, since only a single CERS gene is present in the sea urchin and arthropod genomes; it is unclear whether the origin of CERS2/3/4 homologues in Olfactores relates to a difference in sphingolipid metabolism.

Others

Three homeobox genes were identified in the course of our analyses that do not robustly group with any established class (Table 1). The first of these we have named Ahbx1 (amphioxus homeobox 1). The remaining two genes encode three and five homeodomains, respectively, and we have named them Muxa and Muxb (multiple homeobox a and b). All three genes are likely to be derived amphioxus-specific genes.

Conclusion

We identified 133 homeobox genes in the genome of the amphioxus B. floridae, substantially fewer than the 235 homeobox genes described in the human genome. This difference arises primarily because most gene families have more members in vertebrates than in amphioxus, as a result of gene duplication. There are also several entire homeobox gene families present in human but not amphioxus; these are most likely derived vertebrate-specific gene families, since in each case, they have not been found outside vertebrates. The list of putative vertebrate-specific gene families, and indeed chordate-specific gene families, may need to be refined as more genome sequences become available. There are also several cases of amphioxus having homeobox gene families not found in vertebrates. These include amphioxus-specific homeobox gene families, which presumably arose by duplication on the cephalochordate lineage followed by extensive divergence, plus several examples of ancient gene families lost from vertebrates. Overall, our analyses highlight gene duplication in vertebrates and amphioxus (far more extensive in the former) but extensive gene loss only on the lineages leading to vertebrates and to tunicates. Together with the recent phylogenetic studies that place cephalochordates as the chordate taxon most distantly related to vertebrates (Delsuc et al. 2006), our results highlight the importance of further analysis of amphioxus homeobox genes in understanding the emergence and evolution of chordate developmental mechanisms.
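
As a simple arithmetic check on these totals, the per-class counts listed in the Abstract can be summed and compared with the human figure. The short Python sketch below does only that; the dictionary layout and the names used are illustrative assumptions and are not part of the original analysis.

```python
# Per-class counts of amphioxus homeobox genes as listed in the Abstract.
# The PRD figure excludes Pon and Pax1/9, as stated there.
# The dictionary layout and all names are illustrative assumptions.
AMPHIOXUS_CLASS_COUNTS = {
    "ANTP": 60,
    "PRD": 29,
    "TALE": 9,
    "POU": 7,
    "LIM": 7,
    "ZF": 5,
    "CUT": 4,
    "HNF": 4,
    "SINE": 3,
    "CERS": 1,
    "PROS": 1,
    "unclassified": 3,
}

# Human homeobox gene count quoted in the Conclusion.
HUMAN_TOTAL = 235


def total_genes(counts):
    """Sum the per-class gene counts."""
    return sum(counts.values())


amphioxus_total = total_genes(AMPHIOXUS_CLASS_COUNTS)
human_to_amphioxus_ratio = HUMAN_TOTAL / amphioxus_total

print(f"Amphioxus total: {amphioxus_total}")
print(f"Human/amphioxus ratio: {human_to_amphioxus_ratio:.2f}")
```

The per-class counts sum to the 133 genes reported here, and the human repertoire is a little under twice that size.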

Acknowledgments

We thank Nik Putnam, Dan Rokhsar, and other members of the amphioxus team at the Joint Genome Institute, Walnut Creek, California, for their very considerable efforts in determining the B. floridae genome sequence and making it available to the research community, and Linda Holland, Nori Satoh and Jeremy Gibson-Brown for important contributions to promotion and coordination of the project. We also thank members of the Developmental Program Laboratory, Department of Biological Sciences, Graduate School of Science and Engineering, Tokyo Metropolitan University, members of the J.W. Jenkinson Laboratory of Evolution and Development, Department of Zoology, University of Oxford, and Jordi Garcia-Fernàndez for discussions and advice. This work was supported by KAKENHI (Grant-in-Aid for Scientific Research) on Priority Area ‘Comparative genomics’ from the Ministry of Education, Culture, Sports, Science and Technology of Japan, Grant-in-Aid for Scientific Research B from JSPS (H.S.), MIUR Italy, FIRB 2001 BAU01WAFY and PRIN 2006 PRIN2006058952 (S.C., M.P.), and BBSRC (T.B., P.W.H.H., D.E.K.F.).

Supplementary material

427_2008_245_MOESM1_ESM.pdf (164 kb)
Supplementary Fig. 1 Arbitrarily rooted phylogenetic tree of ANTP class genes generated by maximum likelihood based on homeodomain sequences. Gene family support nodes are shown, except in cases where these are not recovered as monophyletic (Hox families, NK2, NK4, and Vent) due to uneven rates of sequence evolution and complex histories. The two Vent genes are represented by a single branch, since they have identical homeodomains. Amphioxus homeodomains identified in this study are marked with closed circles. The two B. floridae ro alleles differ in predicted amino acid sequence; Protein ID 290436 was used. Scale bar indicates evolutionary distance of 0.5 amino acid substitutions per position. The following notes apply to all supplementary figures: numbers at nodes indicate bootstrap support values given as a percentage of 500 bootstrap pseudoreplications of the data. Names on branches indicate protein names, with species name in abbreviated form and accession ID. Species codes are: Bb Branchiostoma belcheri; Bf Branchiostoma floridae; Ce Caenorhabditis elegans; Cf Canis familiaris; Ci Ciona intestinalis; Dm Drosophila melanogaster; Dr Danio rerio; Gg Gallus gallus; He Heliocidaris erythrogramma; Hr Halocynthia roretzi; Hs Homo sapiens; Mm Mus musculus; Nv Nematostella vectensis; Od Oikopleura dioica; Pd Platynereis dumerilii; Sk Saccoglossus kowalevskii; Sp Strongylocentrotus purpuratus; Tf Takifugu rubripes; and Xl Xenopus laevis. (PDF 168 KB)
427_2008_245_Fig2_ESM.gif (107 kb)
Supplementary Fig. 2

Arbitrarily rooted phylogenetic tree of Pax genes generated by maximum likelihood based on paired domain sequences. Scale bar indicates evolutionary distance of 0.05 amino acid substitutions per position. In this and all subsequent supplementary figures, amphioxus proteins identified in this study are represented without an accession ID and are marked with closed circles; amphioxus sequences previously reported are represented with accession ID and marked with open circles. The B. floridae genome Pax6 sequence used corresponds to Protein ID 291575. (GIF 109 KB)

427_2008_245_Fig2_ESM.tif (27.3 mb)
High resolution image file (TIFF 28 MB)
427_2008_245_Fig3_ESM.gif (189 kb)
Supplementary Fig. 3

427_2008_245_Fig3_ESM.tif (30.7 mb)
High resolution image file (TIFF 32 MB)
427_2008_245_Fig4_ESM.gif (120 kb)
Supplementary Fig. 4

427_2008_245_Fig4_ESM.tif (30.4 mb)
High resolution image file (TIFF 31 MB)
427_2008_245_Fig5_ESM.gif (71 kb)
Supplementary Fig. 5

427_2008_245_Fig5_ESM.tif (28.9 mb)
High resolution image file (TIFF 30 MB)
427_2008_245_Fig6_ESM.gif (57 kb)
Supplementary Fig. 6

427_2008_245_Fig6_ESM.tif (24.8 mb)
High resolution image file (TIFF 26 MB)
427_2008_245_Fig7_ESM.gif (58 kb)
Supplementary Fig. 7

427_2008_245_Fig7_ESM.tif (26.8 mb)
High resolution image file (TIFF 28 MB)
427_2008_245_Fig8_ESM.gif (101 kb)
Supplementary Fig. 8

427_2008_245_Fig8_ESM.tif (29.8 mb)
High resolution image file (TIFF 31 MB)
427_2008_245_Fig9_ESM.gif (27 kb)
Supplementary Fig. 9

427_2008_245_Fig9_ESM.tif (19.6 mb)
High resolution image file (TIFF 20 MB)
427_2008_245_Fig10_ESM.gif (152 kb)
Supplementary Fig. 10

427_2008_245_Fig10_ESM.tif (32.3 mb)
High resolution image file (TIFF 33 MB)
427_2008_245_Fig11_ESM.gif (62 kb)
Supplementary Fig. 11

427_2008_245_Fig11_ESM.tif (22.2 mb)
High resolution image file (TIFF 23 MB)
427_2008_245_MOESM12_ESM.doc (38 kb)
Supplementary Table 1 Chromosomal location and putative synteny of putative human homologues of amphioxus genes neighboring Pou3L. (DOC 38 KB)

Copyright information

© Springer-Verlag 2008