Introduction

Cellulose, a linear homopolymer of D-glucose, accounts for the largest share of biomass on earth. This molecule is produced by a taxonomically diverse group of organisms, including plants, algae, bacteria, protists, and fungi (Richmond 1991). Remarkably, a single group of animals, the urochordates, is also able to biosynthesize cellulose (Rånby 1952; Hirose et al. 1999; Kimura et al. 2001). The urochordates are ubiquitous marine invertebrate chordates comprising three classes, Appendicularia, Thaliacea, and Ascidiacea, all of which have been shown to produce cellulose microfibrils, which are incorporated into the house in appendicularians and into the tunic in thaliaceans and ascidians. Given the phylogenetic position of urochordates as basal chordates (Cameron et al. 2000; Satoh 2003), their unique ability to produce cellulose appears to be an isolated characteristic.

According to a phylogenetic definition of homology (Rieppel 1994), this trait may result from either homology (similarity due to common descent) or homoplasy (similarity due to convergence, parallelism, or reversal) to other cellulose-producing organisms, in which cellulose is synthesized by multi-enzyme complexes at the plasma membrane (Brown 1996; Delmer 1999). These so-called terminal complexes have homologous cellulose synthases as their core catalytic subunits, although cellulose is synthesized along distinct pathways (Delmer 1999; Read and Bacic 2002; Williamson et al. 2002). Thus, cellulose synthase is necessary for cellulose biosynthesis, and its presence or absence in urochordates is key to evaluating whether animal cellulose is homologous or homoplastic.

Homology and homoplasy here describe similar traits in the context of vertical inheritance. A further explanation for isolated traits is lateral inheritance, through the mechanism known as lateral gene transfer (Bushman 2002; Brown 2003). To date, lateral gene transfer has been accepted as a major driving force in evolution for prokaryotes, but not for eukaryotes (except for organelle genes of symbiotic origin). In the draft genome sequence of the ascidian urochordate Ciona intestinalis, however, results of similarity searches using BLAST pair-wise alignments have suggested the presence of a possible gene with similarity to the cellulose synthase genes of cyanobacterial origin (Dehal et al. 2002). Here we determined the cDNA sequence of cellulose synthase to confirm its presence and resolve the shortcomings in the computational prediction of eukaryotic protein-encoding genes (Zhang 2002). We also performed a phylogenetic analysis of the cellulose synthase to overcome the shortcomings of BLAST analysis in detecting phylogenetic relationships among sequences (Koski and Golding 2001). Lastly, we exploited a gene fusion to remove the uncertainty previously present in detecting lateral gene transfers (Kurland 2000; Lawrence and Ochman 2002). We propose that the likely lateral transfer of a bacterial cellulose synthase gene has paved the way to the establishment of this unique ability in urochordates.

Materials and methods

Isolation of the cDNA for the C. intestinalis cellulose synthase gene

The cDNA clone cilv083f09 was picked from the C. intestinalis Gene Collection (Satou et al. 2002a). The nucleotide sequence of its cDNA insert was determined for both strands using a Big-Dye Terminator Cycle Sequencing Ready Reaction kit and ABI PRISM 377 DNA sequencer (Perkin Elmer).

Similarity search

Amino acid sequences of the following cellulose synthases were downloaded from GenBank through the website of the National Center for Biotechnology Information at http://www.ncbi.nlm.nih.gov/: AgtCesA1 (I39714) from the bacterium Agrobacterium tumefaciens, DdCesA (AF163835) from the protist Dictyostelium discoideum, and AtCesA1 (AAC39334) from the plant Arabidopsis thaliana (numbers in parentheses denote GenBank accessions). TBLASTN similarity searches (Altschul et al. 1997) were performed, with the downloaded sequences as queries, on the websites of the genomic and EST databases of C. intestinalis (Ciona Genome Project website at http://genome.jgi-psf.org/ciona4/ciona4.home.html and Ghost Database website at http://ghost.zool.kyoto-u.ac.jp/indexr1.html, respectively). TBLASTN similarity searches of Ci-CesA were performed on the following databases: NCBI (http://www.ncbi.nlm.nih.gov/), Joint Genome Institute (JGI, http://www.jgi.doe.gov/index.html), the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/), the Institute for Genomic Research (TIGR, http://www.tigr.org), GeneDB (http://www.genedb.org/), Center for Genome Research (CGR, http://www-genome.wi.mit.edu/), Kazusa DNA Research Institute (http://www.kazusa.or.jp/), Genome Sequencing Center (http://www.genome.wustl.edu/), NIG DNA Sequencing Center (http://dolphin.lab.nig.ac.jp/index.php), All Dictyostelium BLAST Search (http://dicty.sdsc.edu/), and Fungal Genome Resource (http://gene.genetics.uga.edu/). All BLAST searches were performed with a maximum expected value cutoff of 1e-4.
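The expected-value cutoff can be illustrated with a minimal sketch. It assumes that hits are available as 12-column tab-separated text (the layout produced by BLAST+ tabular output), which is an assumption made here for illustration; the searches described above were run on web servers. The function name `filter_hits` and the sample rows are hypothetical and are not real search results.

```python
# Sketch: keep only BLAST hits at or below an expected-value cutoff.
# Assumption: 12-column tab-separated hits, with the expect value in
# column 11. The rows below are invented for illustration only.
E_VALUE_CUTOFF = 1e-4  # cutoff stated in the text


def filter_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue) tuples with evalue <= cutoff."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 holds the expect value
        if evalue <= cutoff:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical rows, not real search results.
sample = [
    "queryA\tsubject1\t35.2\t300\t180\t5\t10\t309\t400\t1290\t2e-18\t95.1",
    "queryA\tsubject2\t24.0\t100\t70\t3\t50\t149\t10\t300\t0.5\t30.0",
]
hits = filter_hits(sample)
```

With these invented rows, only the first hit passes the 1e-4 cutoff; the second, with an expect value of 0.5, is discarded.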

Prediction of transmembrane helices and topology

Putative transmembrane domains and topology of Ci-CesA were estimated by using prediction algorithms TMHMM (Krogh et al. 2001) and HMMTOP (Tusnády and Simon 2001) on their websites at http://www.cbs.dtu.dk/services/TMHMM and http://www.enzim.hu/hmmtop/, respectively.

Molecular phylogenetics

Amino acid sequences were downloaded from CAZy (http://afmb.cnrs-mrs.fr/~cazy/CAZY/index.html), CELLWALL (http://cellwall.stanford.edu), and other websites used in similarity searches. Multiple alignments were performed by using ClustalX version 1.81 (Thompson et al. 1997) as follows. Alignment parameters were tuned so that the known conserved regions became unambiguously aligned, and less-informative regions like indels or transmembrane helices were then removed. Based on these remaining sites (272, 68, and 192 amino acid residues for datasets of GT-2, GH-6, and cellulose synthase, respectively), phylogenetic trees were reconstructed by the maximum likelihood method using TREE-PUZZLE version 5.0 (Schmidt et al. 2002) with the WAG model of amino acid substitution, two rates (1 variable and 1 invariable) model of among site rate heterogeneity, and 25,000 puzzling steps. The Neighbor joining method was also employed in tree reconstructions with TREE-PUZZLE distance matrices, by using BIONJ (Gascuel 1997) for tree inference. Bootstrap analysis (1,000 samples) was performed by using PUZZLEBOOT version 1.03 (Holder and Roger 2003) in combination with Seqboot and Consense programs in PHYLIP version 3.5c (Felsenstein 1993). For GenBank accessions of the sequences used, see the Electronic supplementary material.
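Two of the steps above, the removal of less-informative sites and the bootstrap resampling of the remaining sites, can be sketched as follows. This is a simplified illustration rather than the procedure actually used: it drops every column containing a gap, whereas the regions removed in this study were chosen after tuning the alignment, and it only imitates the column resampling that Seqboot performs. The toy alignment and function names are hypothetical.

```python
import random


def drop_gapped_columns(alignment):
    """Keep only columns in which no sequence has a gap ('-')."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [i for i in range(length)
            if all(alignment[n][i] != "-" for n in names)]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def bootstrap_replicate(alignment, rng):
    """Resample columns with replacement, keeping the original length."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][i] for i in cols) for n in names}


# Hypothetical toy alignment (equal-length, gap character '-').
toy = {"seq1": "MK-LDAW", "seq2": "MKTLD-W", "seq3": "MRTLDAW"}
trimmed = drop_gapped_columns(toy)
replicate = bootstrap_replicate(trimmed, random.Random(0))
```

Each replicate has the same number of sites as the trimmed alignment, and every column of a replicate is a column drawn from the trimmed alignment; a tree would then be inferred from each of the 1,000 replicates and the results summarized as support values.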

Whole-mount in situ hybridization

Digoxigenin-labeled RNA probes were prepared from the entire coding region of Ci-CesA using a DIG RNA Labeling Kit (Roche Japan) and used in whole-mount in situ hybridization of Ciona intestinalis embryos, as previously described (Satou and Satoh 1997).

EST counts

In previous studies, we have conducted large scale EST analyses of transcripts expressed during C. intestinalis embryogenesis (Satou et al. 2002a). The cDNA libraries examined were from fertilized eggs, cleaving embryos, gastrulae/neurulae, tailbud embryos, larvae and whole young adults. Because the libraries were not normalized or amplified, the occurrence of cDNA clones or EST counts in each library may reflect the quantity of transcripts of the corresponding genes. Thus, comparison of the EST counts of a certain gene at the six developmental stages may reflect the temporal expression pattern of the gene (Satou et al. 2003).

Results and discussion

Isolation of the cDNA for the C. intestinalis cellulose synthase gene

Through BLAST similarity searches of Ciona genomic sequences, we isolated a predicted gene with similarity to bacterial and protist cellulose synthases at expected values of 2.3e-18 and 4.8e-22, respectively. This predicted gene, ci0100130874, was annotated in the Ciona expressed sequence tag (EST) library (Satou et al. 2002b) as ESTs of cDNA cluster CLSTR00946r1. We determined the full-length nucleotide sequence of the longest EST in this cluster, cilv083f09, which possessed an open reading frame (ORF) of 3,834 base pairs. The sequence has been deposited in the DNA Data Bank of Japan under GenBank accession number AB104509. Unexpectedly, this ORF demonstrated similarity to the genomic sequences of both ci0100130874 and another predicted gene, ci0100152699. These two predicted genes were artifacts resulting from the misassembly of the whole-genome shotgun sequences; indeed, these two sequences constitute one gene (Fig. 1A).

Fig. 1A–C

Structure of Ci-CesA. A Schematic representation of the gene displays the total 14-kb region of Scaffold 458 and Scaffold 1,728 (Scaffolds are stretches of whole-genome shotgun sequences assembled in the Ciona genome sequence). Two predicted genes, ci0100130874 and ci0100152699, on the Scaffolds are indicated. The intron-exon structure of Ci-CesA was revealed by sequence comparison of the Scaffold sequences with the full-length Ci-CesA cDNA. Shaded or open boxes indicate the coding or non-coding regions of exons, respectively. Ci-CesA contains 21 exons. The Ci-CesA transcript covers the sequences of the two predicted genes. Exon 16 is included in both predicted genes. B Ci-CesA protein structure. Numerals denote the number of amino acids from the putative translation-initiating methionine. Lines denote the positions of the transmembrane helices. The yellow and pink boxed characters represent the common motifs of UDP-dependent, polymerizing GT-2s (D, DxD, D, QxxRW) and the signature sequence of GH-6 proteins ([LIVMYA]-[LIVA]-[LIVT]-[LIV]-E-P-D-[SAL]-[LI]-[PSAG], PROSITE; http://kr.expasy.org/prosite/; accession PS00656), respectively. The boxed characters represent the KAG and QTP motifs. The dotted line denotes the region demonstrating similarity to a GH-6 conserved domain (Pfam; http://www.sanger.ac.uk/Software/Pfam/index.shtml; accession PF01341) at an expect value of 4e-9. Bold characters represent the positions corresponding to the probable aspartic acid residues conserved in the catalytic core of GH-6s. C Comparison of the domain structure of cellulose synthases of various origins. Multiple alignments are shown in schematic drawings indicating the positions of the common motifs, depicted as D, DxD, D, and QxxRW. Only the cellulose synthase domain of Ci-CesA is represented. Eukaryotic sequences are so far derived from a mycetozoan protist Dictyostelium discoideum (Blanton et al. 2000), a green alga Mesotaenium caldariorum (Roberts et al. 2002), and vascular plants (Richmond and Somerville 2000). In comparison with ascidian and prokaryotic sequences, algal and plant sequences conserved an N-terminal zinc-binding domain and two insertions, the so-called plant-specific conserved region (CR-P) and a class-specific region (or earlier a hypervariable region; Delmer 1999), depicted in grey. In a mycetozoan sequence, there is an additional insertion at the position corresponding to CR-P. A portion of cyanobacterial sequence also demonstrates an insertion at the position corresponding to CR-P (Nobles et al. 2001)

Characterization of Ci-CesA

The product of this gene (size: 1,277 amino acid residues), according to prediction algorithms, is likely an integral transmembrane protein at the plasma membrane containing two N-terminal transmembrane helices (at the amino-acid positions 277–294 and 309–333, see legend of Fig. 1B), followed by the cytoplasmic region (334–622), five transmembrane helices (623–646, 655–679, 747–768, 777–801, and 818–835), and a C-terminal extracellular region (836–1,277; Fig. 1B). The cytoplasmic region featured a motif common to UDP-sugar dependent, polymerizing β-glycosyltransferases (GTs) in the GT-2 family, wherein the conserved “D, DxD” and “D, QxxRW” residues (expressed in one-letter code wherein x indicates any amino acid) form binding sites for the donor-UDP and acceptor-sugar, respectively (Charnock et al. 2001). The cytoplasmic region also conserved the “KAG” and “QTP” motifs of unknown function (Stasinopoulos et al. 1999), which are characteristic of the cellulose synthase subfamily in the GT-2 family (Saxena and Brown 2000; Richmond and Somerville 2000).
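The one-letter motif notation used above, in which x stands for any amino acid, maps directly onto a regular expression. The following minimal sketch shows how such motifs could be located in a protein sequence; the function names and the toy sequence are hypothetical, and the sequence is not that of Ci-CesA.

```python
import re


def motif_to_regex(motif):
    """Translate a one-letter motif, with 'x' as a wildcard, into a regex."""
    return re.compile("".join("[A-Z]" if ch == "x" else ch for ch in motif))


def find_motif(sequence, motif):
    """Return 1-based start positions of all matches, including overlaps."""
    # A lookahead makes each match zero-width, so overlapping hits are kept.
    pattern = re.compile("(?=" + motif_to_regex(motif).pattern + ")")
    return [m.start() + 1 for m in pattern.finditer(sequence)]


# Hypothetical 20-residue toy sequence, for illustration only.
toy_sequence = "MADGSDADKAGQTPQLLRWA"

positions = {motif: find_motif(toy_sequence, motif)
             for motif in ("DxD", "QxxRW", "KAG", "QTP")}
```

Positions are reported from the first residue, matching the numbering convention of Fig. 1B, where residues are counted from the putative initiating methionine.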

Molecular phylogenetic analysis of the N-terminal half (1–850) of the protein, together with polymerizing GT-2s whose catalytic products have been experimentally identified, demonstrated that the Ciona sequence fell in a clade containing all the cellulose-synthesizing GT-2s (Fig. 2). BLAST hits for the N-terminal half uncovered similarities primarily to the cellulose synthase subfamily. Although biochemical evidence is required to confirm the identity of this molecule as a cellulose synthase, these similarities prompted us to name the gene Ci-CesA (Ciona intestinalis cellulose synthase). In BLAST pair-wise alignments, the N-terminal half of Ci-CesA was determined to be more similar to prokaryotic than to eukaryotic sequences (Dehal et al. 2002; Table 1). The domain structure of Ci-CesA may be partly responsible for this result; Ci-CesA lacked an ancillary domain or the insertions identified in other eukaryotic sequences (Fig. 1B). The C-terminal portion of the molecule (851–1,277) demonstrated similarity in BLAST searches to cellulases in the GH-6 family of glycoside hydrolases (GH; Henrissat and Davies 1997). Ci-CesA contained a conserved GH-6 signature sequence (Fig. 1B); however, the two probable catalytically important aspartic acid residues (Koivula et al. 2002) differ from the consensus sequence (G913 and S958), suggesting that this domain may lack its original catalytic activity. Consequently, Ci-CesA may be a fusion of a cellulose synthase domain and a noncellulolytic GH-6 domain.

Fig. 2

Phylogenetic tree of GT-2s. The most likely tree (log likelihood =−10,685.08) found by TREE-PUZZLE is shown. Percent supports for clades using the maximum likelihood method (quartet puzzling support values) and Neighbour joining method (bootstrap values) are shown above and below each internal branch defining a clade, respectively. The scale bar represents 0.1 substitutions per site. GT-2s which do not conserve the “D, DxD, D, QxxRW” motif, including dolichol phosphate mannose synthase, cyclic β-1,3-glucan synthase, and others, are not included in the analysis. Catalytic products of the proteins, experimentally identified, are depicted in parentheses. The tree is rooted with a rat glucosylceramide synthase of GT-21, which conserves the “D, DxD, D, QxxRW” motif and is distantly related to GT-2s (Marks et al. 2001)

Table 1 Results of a BLAST similarity search of Ci-CesA

Expression of Ci-CesA

Ci-CesA was transcriptionally active in C. intestinalis embryos. An EST count analysis indicated that the number of Ci-CesA ESTs/total number of ESTs was 0/29,442; 0/26,796; 0/23,475; 32/31,209; 134/24,532; and 6/29,138 in the cDNA library of fertilized eggs, cleaving embryos, gastrulae/neurulae, tailbud embryos, larvae, and young adults, respectively. This revealed that the transcription of Ci-CesA began around the tailbud embryo stage and became active as development proceeded.
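Because the six libraries differ in size, the raw counts are easier to compare once expressed as a frequency. The sketch below normalizes the counts reported above to ESTs per 10,000 clones; the per-10,000 scaling is a choice made here for readability and is not a unit used in the original analysis.

```python
# Ci-CesA EST counts and library sizes, as reported in the text.
est_counts = {
    "fertilized eggs": (0, 29442),
    "cleaving embryos": (0, 26796),
    "gastrulae/neurulae": (0, 23475),
    "tailbud embryos": (32, 31209),
    "larvae": (134, 24532),
    "young adults": (6, 29138),
}


def per_10k(count, total):
    """Frequency of a gene's ESTs per 10,000 ESTs in a library."""
    return 10000 * count / total


frequencies = {stage: per_10k(count, total)
               for stage, (count, total) in est_counts.items()}
peak_stage = max(frequencies, key=frequencies.get)

for stage, freq in frequencies.items():
    print(f"{stage:20s} {freq:6.2f}")
```

On this scale the frequency is zero through the gastrula/neurula library, rises to roughly 10 per 10,000 in tailbud embryos, and is highest in the larval library.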

The Ci-CesA expression was confirmed by whole-mount in situ hybridization (Fig. 3). Expression was detectable as early as the developmental stage of neurula. Signal intensity increased as development proceeded through the subsequent tailbud stage. Expression was restricted to epidermal cells. The observed expression pattern of Ci-CesA was spatiotemporally consistent with observed cellulose synthesis in vivo (Gianguzza and Dolcemascolo 1980).

Fig. 3A–F

Expression of Ci-CesA in Ciona intestinalis embryos, as revealed by whole-mount in situ hybridization. A A middle gastrula-stage embryo, vegetal view. B A neurula-stage embryo, dorsal view. Signals for zygotic Ci-CesA expression are first detected at the neurula stage. C An initial tailbud-stage embryo, dorsal view. D–F Early tailbud-stage embryos, dorsal view. E A cross-section of the sample shown in D, exhibiting gene expression in epidermal cells. F A control using sense-probes. Scale bar represents 100 µm

Evolutionary origin of Ci-CesA

Both the cellulose synthase and GH-6 domains of this protein lacked homologs in the public databases cataloguing completed and ongoing animal genomic sequences, except in that of C. intestinalis. Such atypical genes, identified by similarity searches, are often explained as either the lineage-specific acquisition of foreign genes via lateral gene transfer or the multiple losses of orthologs in other lineages (Lawrence and Ochman 2002, and references therein). As we could not identify other cellulose synthase genes fused with a GH-6 gene in public databases, the possibility of such a fusion gene preceding the origin of Ci-CesA can be tentatively excluded; the two domains likely originate from two distinct genes. Although we lacked an estimate for the rate of lateral gene transfer or gene loss through genomic evolution, the fusion of two atypical genes independently stemming from two events, whether two lateral gene transfers, two gene losses, or a combination of lateral gene transfer and gene loss, is unlikely on grounds of parsimony. A single lateral gene transfer of an operon and the subsequent fusion of operon genes, as proposed for the glutamate synthase gene of diplomonad protists (Andersson and Roger 2002), is more plausible. Operons or gene clusters wherein a cellulose synthase gene resides in close proximity to a cellulase gene are conserved in most bacteria (except cyanobacteria; Romling 2002). While most such conserved cellulases belong to the GH-8 family, we found that three species of actinobacteria in the genus Streptomyces have gene clusters containing a GH-6 cellulase instead (see legends of Figs. 4, 5). We thus propose that such a bacterial gene cluster or operon is the likely origin of Ci-CesA. Consistent with this possibility, molecular phylogenetic analyses inferred incongruence between the gene’s and the organism’s phylogenies; in the reconstructed phylogenetic tree (Fig. 4), the GH-6 domain of Ci-CesA constituted a clade with the bacterial and not the eukaryotic (fungal) sequences.

Fig. 4

Phylogenetic tree of GH-6s. The most likely tree (log likelihood =−3,852.55) found by TREE-PUZZLE is shown. Branch support values and the scale bar are as outlined in Fig. 2. Sub-classifications of GH-6 into two groups, endoglucanase (EG) and cellobiohydrolase (CBH), according to the mode of catalytic action, are depicted in parentheses as far as is known. Possibly, two bacterial clades correspond to these sub-classifications. An asterisk indicates the three Streptomyces sequences which constitute a gene cluster with a putative cellulose synthase

Fig. 5

Phylogenetic tree of putative cellulose synthases. The distance method tree is shown. The number at each branch represents percent bootstrap support. The scale bar denotes genetic distance. The tree is rooted with hyaluronan and chitin synthases used in the GT-2 analysis. Uncharacterized, putative non-cellulose synthase sequences are also included in the analysis, and they group together outside the putative cellulose synthase subfamily. Plant cellulose synthase and cellulose synthase-like sequences (Richmond and Somerville 2000) are denoted as CesA and Csl, respectively. A star denotes the nodes where bootstrap support is less than 50. This cellulose synthase subfamily tree is multi-furcated (or star-shaped). The two cyanobacterial sequences which demonstrate an insertion at the position corresponding to the plant CR-P and are proposed to be phylogenetically in close positions to plant cellulose synthases (Nobles et al. 2001) are underlined. An asterisk indicates the three Streptomyces sequences which constitute a gene cluster with a GH-6 cellulase. The most likely tree found by TREE-PUZZLE is also star-shaped (data not shown)

The cellulose synthase phylogenetic tree, however, was star-shaped, rendering the inference of a relationship between Ci-CesA and other cellulose synthase genes unreliable (Fig. 5). According to this figure, there is a close relationship between the sequences of a fungus, Neurospora crassa, and a DNA virus, Paramecium bursaria Chlorella virus. This similarity is indicative of viral colonization, an activity of DNA viruses which enables the lateral transfer of genes into prokaryotic or eukaryotic genomes (Villarreal and DeFilippis 2000). Given the proposed prokaryotic origin of plant cellulose synthase genes as either cyanobacterial (Nobles et al. 2001) or through endosymbiotic transfer of chloroplast genes to the host nucleus (Martin et al. 2002), our observations could reflect multiple lateral acquisitions of cellulose synthase genes by eukaryotes across the domains of life. Another eukaryotic cellulose synthase gene, present as an intron-less ORF of 3.2 kb in the genome of a mycetozoan protist Dictyostelium discoideum (Blanton et al. 2000), supports this view. Current taxon sampling of eukaryotic sequences, however, is poor (see legend of Fig. 1C). For further evaluation of this scenario, more sequences from phylogenetically informative taxa, focusing on opisthokonts (animals and fungi) and poorly understood bikonts other than green plants, are necessary.

Evolutionary origin of the animal cellulose

In the fossil record, probable stem deuterostome (or stem urochordate) vetulicolians exhibit cuticle-like surfaces as a primitive trait (Lacalli 2002). In contrast, the oldest known urochordate, from lower Cambrian fauna, has a tunic (Shu et al. 2001). This outer sheath, the site of cellulose production and accumulation (Kimura and Itoh 1996), is an apomorphic character of extant urochordates. According to our working hypothesis, these observations suggest that a bacterial cellulose synthase gene may have been transferred laterally into the genome of the last common ancestor of extant urochordates more than 530 million years ago, the estimated geological time of the fauna.

Through similarity searches in the genome sequence of C. intestinalis, we discovered that this organism lacked the other constituent genes of bacterial cellulose synthase operons, such as the genes encoding cyclic diguanylic acid-binding proteins, necessary for cellulose synthesis in bacteria (Romling 2002). Nonetheless, C. intestinalis synthesizes cellulose. Furthermore, cellulose synthesis in ascidians exhibits unique features, indicating some innovation in their mechanism of cellulose synthesis. While in bacteria terminal complexes are composed of single particles, two types of particles are found in an ascidian (Kimura and Itoh 1996), implying the advent of a new agent in cellulose synthesis. Also, nearly pure monoclinic cellulose (Iβ allomorph) is produced by linearly-arranged terminal complexes (Kimura and Itoh 1996) which, in bacteria, produce triclinic cellulose (Iα allomorph). Amorphous cellulose between terminal complexes and crystalline cellulose fibers also appears to occur (Kimura and Itoh 1996), suggesting the spatial segregation of the two major steps in cellulose biosynthesis, polymerization and crystallization. In conclusion, we propose that urochordates may use a laterally acquired “homologous” gene for an analogous process of cellulose synthesis. It remains unknown how a foreign gene was assimilated into a new genome and recruited (or co-opted) to a morphological novelty, the tunic.