Distinctive architecture of the chloroplast genome in the chlorophycean green alga Stigeoclonium helveticum
- Cite this article as:
- Bélanger, A., Brouard, J., Charlebois, P. et al. Mol Genet Genomics (2006) 276: 464. doi:10.1007/s00438-006-0156-2
The chloroplast genome has experienced many architectural changes during the evolution of chlorophyte green algae, with the class Chlorophyceae retaining the fewest ancestral traits. We have previously shown that the completely sequenced chloroplast DNAs (cpDNAs) of Chlamydomonas reinhardtii (Chlamydomonadales) and Scenedesmus obliquus (Sphaeropleales) are highly scrambled in gene order relative to one another. Here, we report the complete cpDNA sequence of Stigeoclonium helveticum (Chaetophorales), a member of a third chlorophycean lineage. This genome, which encodes 97 genes and contains 21 introns (including four putatively trans-spliced group II introns inserted at novel sites), is remarkably rich in derived features and extremely rearranged relative to its chlorophycean counterparts. At 223,902 bp, Stigeoclonium cpDNA is the largest chloroplast genome sequenced thus far and, in contrast to those of Chlamydomonas and Scenedesmus, features no large inverted repeat. Interestingly, the pattern of gene distribution between the DNA strands and the bias in base composition along each strand suggest that the Stigeoclonium genome replicates bidirectionally from a single origin. Unlike most known trans-spliced group II introns, those of Stigeoclonium exhibit breaks in domains I and II. By placing our comparative genome analyses in a phylogenetic framework, we inferred an evolutionary scenario of the mutational events that led to changes in genome architecture in the Chlorophyceae.
Keywords: Chlorophyta; Plastid genome evolution; Gene order; Origin of replication; Group II introns; Repeated sequences
As revealed by the complete chloroplast DNA (cpDNA) sequences that have been reported so far for green plants, the chloroplast genome has evolved much less conservatively in the phylum Chlorophyta than in the Streptophyta. The Chlorophyta (Sluiman 1985) comprises the majority of extant green algae and is divided into four classes: the Prasinophyceae, Ulvophyceae, Trebouxiophyceae and Chlorophyceae. The Prasinophyceae represent the most basal divergence of the Chlorophyta (Friedl 1997; Lewis and McCourt 2004) and, although the branching order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae (UTC) remains uncertain (Friedl and O’Kelly 2002), analyses of chloroplast genomic features and phylogenetic data derived from mitochondrial genome sequences suggest that the Trebouxiophyceae emerged before the Ulvophyceae and Chlorophyceae (Pombert et al. 2004, 2005, 2006). Complete chloroplast genome sequences have been reported for only six chlorophytes: the prasinophyte Nephroselmis olivacea (Turmel et al. 1999b), the trebouxiophyte Chlorella vulgaris (Wakasugi et al. 1997), two green algae representing distinct basal lineages of the Ulvophyceae, Oltmannsiellopsis viridis (Pombert et al. 2006) and Pseudendoclonium akinetum (Pombert et al. 2005), and also representatives of two different lineages of the Chlorophyceae, Chlamydomonas reinhardtii (Maul et al. 2002) and Scenedesmus obliquus (de Cambiaire et al. 2006). The Streptophyta (Bremer 1985), on the other hand, unites all embryophytes (land plants) and their closest green algal relatives, the members of the class Charophyceae sensu Mattox and Stewart (1984). The currently available chloroplast genome sequences of about 35 photosynthetic land plants and seven charophycean green algae disclosed a high degree of conservation in overall structure and overall gene arrangement (Palmer 1991; Turmel et al. 2002, 2005, 2006). 
The vast majority of these genomes harbour the same quadripartite structure and gene partitioning pattern, their genes (106–137) are tightly packed, and most of them are grouped into multicistronic operons, several of which are evolutionarily related to those found in cyanobacteria, the progenitors of chloroplasts.
In the Chlorophyta, the chloroplast genome appears to have been progressively remodelled and to have gradually lost the many ancestral features observed in the Streptophyta, with the Prasinophyceae and Chlorophyceae retaining the most and the fewest of these features, respectively. The gene-rich (128 genes) and compact cpDNA of the prasinophyte Nephroselmis displays the characteristic quadripartite structure and gene partitioning pattern found in streptophyte genomes as well as the great majority of their ancestral operons (Turmel et al. 1999b). This quadripartite structure is characterized by the presence of two copies of a large inverted repeat sequence (IR) separating a small single-copy (SSC) and a large single-copy region (LSC). The chloroplast genome of the trebouxiophyte Chlorella, which encodes 112 genes, has lost the IR (Wakasugi et al. 1997), but the genes usually found in the IR and each of the single-copy regions have remained clustered together (Pombert et al. 2006). The chloroplast genomes of the two ulvophytes and of the two chlorophycean green algae feature an atypical quadripartite structure. In each ulvophyte genome, one of the single-copy regions features genes characteristic of both the ancestral SSC and LSC regions, whereas the opposite single-copy region contains exclusively genes that are characteristic of the ancestral LSC region (Pombert et al. 2005, 2006). Moreover, the rRNA genes in the IR are transcribed toward the latter region, instead of the SSC region as in the usual quadripartite architecture. From their observations, Pombert et al. (2006) concluded that a dozen genes were transferred from the LSC to the SSC region before or soon after the emergence of the Ulvophyceae and that the transcription direction of the rRNA genes changed.
In the chloroplast genomes of the chlorophycean green algae Scenedesmus and Chlamydomonas, single-copy regions of similar sizes harbour sets of genes that are very different from those seen in other green algal genomes, indicating that genes were extensively shuffled between the two ancestral single-copy regions (Maul et al. 2002; de Cambiaire et al. 2006). Although the two chlorophycean genomes differ dramatically in their gene partitioning patterns, they share nearly identical gene repertoires and 11 derived gene clusters containing a total of 32 genes (de Cambiaire et al. 2006). Some of their genes, notably rps3, clpP and rpoB, display novelties (insertion sequences or discontinuities) in their structure. Unlike all other completely sequenced UTC algal cpDNAs that are characterized by the lower density of their genes relative to their Nephroselmis and streptophyte counterparts, the Scenedesmus genome is almost as compact as the Nephroselmis genome (de Cambiaire et al. 2006). Of all the UTC algal cpDNAs examined thus far, Scenedesmus cpDNA features the lowest proportion of short dispersed repeats in intergenic regions (only 8.7%); moreover, another singularity of this genome is the strong tendency of adjacent genes to occur on the same DNA strand (de Cambiaire et al. 2006). Given that Scenedesmus and Chlamydomonas have extremely rearranged genomes and do not represent basal lineages in the phylogeny of the Chlorophyceae (Buchheim et al. 2001; Shoup and Lewis 2003), the ancestral condition of the chloroplast genome could not be inferred for this class.
Phylogenetic analyses of the nuclear-encoded small subunit and large subunit rRNA genes indicate that the Chlorophyceae comprise at least five major groups that generally correspond to currently recognized orders or families (Buchheim et al. 2001; Shoup and Lewis 2003). The Chlamydomonadales and Sphaeropleales [also designated as the clockwise (CW) and directly opposed (DO) flagellar apparatus clades], which are represented by Chlamydomonas and Scenedesmus respectively, apparently share a sister-relationship. The Chaetophorales, Oedogoniales and Chaetopeltidales are basal relative to the Chlamydomonadales and Sphaeropleales; however, the precise divergence order of these three monophyletic groups remains unknown (Buchheim et al. 2001; Shoup and Lewis 2003). To identify some of the forces and major events that shaped the chloroplast genome during the evolution of chlorophyceans, we have determined the complete cpDNA sequence of Stigeoclonium helveticum, a member of the Chaetophorales. Motile cells in this group are quadriflagellated and polymorphic for flagellar orientation (DO + CW) (Watanabe and Floyd 1989). We found that the Stigeoclonium genome is extremely rearranged relative to its Scenedesmus and Chlamydomonas homologues and harbours the fewest ancestral features among all completely sequenced cpDNAs. This IR-lacking genome, which represents the largest chloroplast genome ever sequenced, displays a number of distinctive traits, including a strong bias in gene content and base composition of the DNA strands that is consistent with bidirectional replication from a single origin.
Materials and methods
Strain and culture conditions
Stigeoclonium helveticum was obtained from the Culture Collection of Algae at the University of Texas at Austin (UTEX 441) and grown in modified Volvox medium (McCracken et al. 1980) under 12 h light/dark cycles.
Isolation and sequencing of cpDNA
A + T-rich organelle DNA was separated from nuclear DNA by CsCl-bisbenzimide isopycnic centrifugation (Turmel et al. 1999a). Both the chloroplast and mitochondrial genomes were completely sequenced as described previously (Pombert et al. 2004), using as templates plasmid clones originating from the organelle DNA fraction as well as PCR fragments spanning uncloned regions. Sequences were edited and assembled with SEQUENCHER 4.2.1 (GeneCodes, Ann Arbor, MI, USA). To ensure that the sequence assembly of each genome was correct, we ascertained that the sizes of overlapping regions encompassing the whole genome sequence matched perfectly those of the corresponding regions amplified by PCR.
Analyses of genome sequence
Gene content was determined by BLAST homology searches (Altschul et al. 1990) against the non-redundant database of the National Center for Biotechnology Information (NCBI) server. Protein-coding genes and open reading frames (ORFs) were localized precisely using ORFFINDER at NCBI, various programs of the Wisconsin package version 10.3 (Accelrys, San Diego, CA, USA) and other applications from the EMBOSS version 2.9.0 package (Rice et al. 2000). Genes coding for tRNAs were localized using tRNAscan-SE 1.23 (Lowe and Eddy 1997). Intron boundaries were determined by modelling intron secondary structures (Michel et al. 1989; Michel and Westhof 1990) and by comparing intron-containing genes with intronless homologues using FRAMEALIGN of the Wisconsin package. Homologous introns were detected by BLASTN searches (Altschul et al. 1990) against the non-redundant database of NCBI.
Repeated sequences were mapped with PipMaker (Schwartz et al. 2000). Repeats were identified with REPuter 2.74 (Kurtz et al. 2001) using the -f (forward), -p (palindromic) and -allmax options at minimum lengths (-l) of 30 and 45 bp and were classified with REPEATFINDER (Volfovsky et al. 2001). The number of copies of each repeat unit was determined with FINDPATTERNS of the Wisconsin package. Stem-loop structures and direct repeats were identified using PALINDROME and ETANDEM in EMBOSS 2.9.0 (Rice et al. 2000), respectively. Genomic regions containing non-overlapping repeated elements were identified with RepeatMasker (http://www.repeatmasker.org) running under the WU-BLAST 2.0 (http://www.blast.wustl.edu) search engine.
The sidedness index (Cs) was determined as described by Cui et al. (2006) using the formula Cs = (n − nSB)/(n − 1), where n is the total number of genes in the genome and nSB is the number of sided blocks, i.e. the number of blocks including adjacent genes on the same strand. The strand bias in base composition was calculated for the whole genome and for intergenic regions. For the entire genome sequence (GenBank accession number DQ630521), the sum of values (G − C)/(G + C), where C and G represent the number of occurrences of these two nucleotides, was calculated for windows of length 5,000, starting with nucleotides 50,000 to 55,000 and continuing by shifting 500 nucleotides downstream along the strand for each new window. For intergenic regions, the value (G − C)/(G + C) was calculated separately for each region.
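The windowed calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the program used in the study: the function names are hypothetical, the start coordinate is treated as a 0-based index, and the genome is assumed to be circular so that windows wrap past the end of the sequence.

```python
from itertools import accumulate


def gc_skew(window: str) -> float:
    """Return (G - C)/(G + C) for one window, or 0.0 if it holds no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def cumulative_gc_skew(seq: str, window: int = 5000, step: int = 500,
                       start: int = 50000):
    """Running sum of windowed GC skew around a circular sequence.

    Defaults mirror the window length, shift and starting point given in
    the text; treating `start` as 0-based is an assumption of this sketch.
    Returns (window start positions, cumulative skew values).
    """
    seq = seq.upper()
    n = len(seq)
    if not 0 < window <= n:
        raise ValueError("window must be between 1 and the sequence length")
    doubled = seq + seq  # lets a window wrap around the circular genome
    starts = [(start + offset) % n for offset in range(0, n, step)]
    skews = [gc_skew(doubled[p:p + window]) for p in starts]
    return starts, list(accumulate(skews))


# Toy circular sequence: a G-rich half followed by a C-rich half.
positions, cumulative = cumulative_gc_skew("G" * 10 + "C" * 10,
                                           window=4, step=2, start=0)
peak = positions[cumulative.index(max(cumulative))]  # where the skew sum peaks
```

On a real genome, the positions of the global minimum and maximum of the cumulative curve are the candidate origin and terminus; which extreme corresponds to which depends on the strand and direction in which the sequence is read.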
All conserved gene pairs exhibiting identical gene polarities in green algal cpDNAs were identified using a custom-built program. The GRIMM web server (Tesler 2002) was used to infer the minimal number of gene permutations by inversions in pairwise comparisons of chloroplast genomes. Because GRIMM cannot deal with duplicated genes and requires that the compared genomes have the same gene content, genes within one of the two copies of the IR were excluded and only the genes common to all the compared genomes were analysed. The data set used in the comparative analyses reported in Supplementary Table S3 contained 89 genes; pieces of rpoB and all exons of the genes containing trans-spliced introns were coded as distinct fragments (for a total of 96 gene loci).
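The custom-built program mentioned above is not described further, so the following is only a hedged sketch of one common way to find conserved gene pairs with identical polarities. It assumes each genome is given as a circular list of (gene, strand) tuples with strand coded as +1 or -1, and it treats a pair as conserved when the same two genes are adjacent with the same relative orientations, whichever strand the genome is read from. The function names and the toy gene orders are hypothetical.

```python
def adjacency_set(order, circular=True):
    """Collect strand-aware adjacent gene pairs in a canonical form.

    A pair (a, b) read on one strand is the same adjacency as (-b, -a)
    read on the other strand, so the smaller of the two tuples is stored.
    """
    pairs = set()
    n = len(order)
    last = n if circular else n - 1
    for i in range(last):
        (a, sa), (b, sb) = order[i], order[(i + 1) % n]
        forward = ((a, sa), (b, sb))
        reverse = ((b, -sb), (a, -sa))
        pairs.add(min(forward, reverse))
    return pairs


def conserved_pairs(order1, order2, circular=True):
    """Gene pairs adjacent, with matching polarities, in both genomes."""
    return adjacency_set(order1, circular) & adjacency_set(order2, circular)


# Toy gene orders (not taken from any real genome map).
genome_a = [("psbA", 1), ("rbcL", 1), ("atpB", -1), ("tufA", 1)]
genome_b = [("atpB", 1), ("rbcL", -1), ("psbA", -1), ("rps4", 1)]
shared = conserved_pairs(genome_a, genome_b)
```

In this toy case the block psbA-rbcL-atpB is inverted as a whole in the second genome, so both of its internal adjacencies are still counted as conserved.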
[Table: General features of Stigeoclonium and other UTC algal cpDNAs. Recoverable row labels: A + T (%); coding sequences (%).]
Gene content and gene structure
Relative to Scenedesmus and Chlamydomonas cpDNAs, Stigeoclonium cpDNA encodes four additional genes [rpl32, psaM, trnL(caa) and trnS(gga)] but lacks petA, a gene present in all previously sequenced chlorophyte cpDNAs (Supplementary Table S1). Like Chlamydomonas cpDNA, it is missing the infA and rpl12 genes that are present in Scenedesmus and other chlorophyte cpDNAs. All three chlorophycean cpDNAs lack six genes (accD, chlI, minD, psaI, rpl19 and ycf20) that have been retained in the genomes of the three other UTC algae examined thus far. Moreover, like their two ulvophyte homologues, they are missing four genes [cysA, cysT, trnL(gag) and trnT(ggu)] relative to the chloroplast genome of the trebouxiophyte Chlorella.
Numerous genes in the Stigeoclonium genome (cemA, clpP, ftsH, rpoA, rpoB, rpoC1, rpoC2, rps18, rps3, rps4 and ycf1) have expanded coding regions relative to their Mesostigma and Nephroselmis homologues. Most of these expanded genes have been reported previously in other UTC algae (Pombert et al. 2005, 2006; de Cambiaire et al. 2006). Three genes (clpP, rps3 and rps4) display enlarged coding regions only in members of the Chlorophyceae (Supplementary Table S2). The Stigeoclonium rps4 gene is unusual in carrying an insertion sequence that is about 12-fold larger than those present in Scenedesmus and Chlamydomonas cpDNAs. Owing to its considerable size (340 kDa), the full-length protein sequence predicted from Stigeoclonium rps4 is not likely to represent a functional ribosomal protein. On the other hand, our findings that the 5′ and 3′ termini of this gene share sequence homology with virtually the entire Escherichia coli rpsD gene and that its reading frame is maintained over more than 8 kb argue against the idea that Stigeoclonium rps4 is a pseudogene. If this green algal gene is functional, then the sequence of its large expansion element would be expected to be excised at the RNA or protein level. In the absence of evidence for a putative intron or intein element in Stigeoclonium rps4, however, no firm conclusion can be drawn regarding the functional status of this gene.
Like its Scenedesmus and Chlamydomonas counterparts, the rpoB gene in Stigeoclonium cpDNA consists of two separate ORFs that are not associated with sequences typical of group I or group II introns; however, instead of being contiguous, these ORFs are distant from one another in the Stigeoclonium genome (Fig. 1). In contrast to the Scenedesmus and Chlamydomonas rps2 genes and the Chlamydomonas rpoC1, which also consist of distinct ORFs bordered by sequences unrelated to conventional introns, the corresponding genes in Stigeoclonium display a continuous structure. In addition to rpoB, the petD, psaC and rbcL genes occur as dispersed pieces in Stigeoclonium cpDNA (Fig. 1); in all three cases, each gene piece consists of an exon bordered by the 5′ or 3′ portion of a putatively trans-spliced group II intron.
Bias in gene coding regions and base composition of the two DNA strands
Like their Scenedesmus homologues, genes in Stigeoclonium cpDNA show a remarkably strong bias in their distribution between the two DNA strands (Fig. 1). The 59 consecutive genes in the 113.6 kb segment extending from tufA to trnS(gga), with the exception of trnL(uag) and trnMf(cau), are located on one strand, whereas all the other genes reside on the other strand. The sidedness index (Cs), i.e. the propensity of adjacent genes to be located on the same strand (Cui et al. 2006), is significantly higher in Stigeoclonium cpDNA (Cs = 0.9479) than that reported for Scenedesmus cpDNA (Cs = 0.8842).
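As a worked illustration of the sidedness index Cs = (n − nSB)/(n − 1), the sketch below counts sided blocks in a string of strand symbols. It rests on several assumptions: the gene order is treated as a linear list, the function names are hypothetical, and the strand layout is invented rather than read from Fig. 1. The layout simply places two opposite-strand genes inside a run of 59 genes, followed by 38 genes on the other strand, which gives 97 genes in six blocks; this is one reading that is consistent with the reported value.

```python
def sided_blocks(strands) -> int:
    """Count maximal runs of adjacent genes located on the same strand."""
    if not strands:
        return 0
    return 1 + sum(1 for a, b in zip(strands, strands[1:]) if a != b)


def sidedness_index(strands) -> float:
    """Cs = (n - nSB) / (n - 1), with n genes and nSB sided blocks."""
    n = len(strands)
    if n < 2:
        raise ValueError("at least two genes are needed")
    return (n - sided_blocks(strands)) / (n - 1)


# Hypothetical layout: 57 '+' genes interrupted by two '-' genes,
# then 38 '-' genes (the positions of the two exceptions are invented).
layout = "+" * 30 + "-" + "+" * 20 + "-" + "+" * 7 + "-" * 38
cs = round(sidedness_index(layout), 4)  # 91/96, i.e. 0.9479
```

A genome in which every gene switches strand relative to its neighbour gives Cs = 0, whereas a genome with all genes on one strand gives Cs = 1.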
The cumulative GC skew analyses of prokaryotic genomes display the same profile as that reported here for the Stigeoclonium chloroplast genome (Grigoriev 1998). For prokaryotic genomes, it has been shown that the minimum and maximum coincide with the origin and terminus of replication (Grigoriev 1998) and that a majority of genes are encoded on the leading strand and are therefore transcribed in the same direction as the genome replication, a property termed the coorientation rule. The leading strand is richer in G than in C relative to the opposite strand, most probably because it is subject to more frequent C deaminations during the time it remains temporarily single-stranded during gene transcription and chromosome replication (Guy and Roten 2004). Given the striking similarity between the plots of cumulative GC skew obtained for the Stigeoclonium and prokaryotic genome sequences, it is likely that the Stigeoclonium genome replicates bidirectionally from a single origin situated in the trnS(gga)-rrs spacer. It should be noted that our analysis of the cumulative GC skew for the IR-containing cpDNAs of Scenedesmus and Chlamydomonas did not disclose any putative origin and terminus of replication that are consistent with a bidirectional mode of replication, although adjacent genes tend to be encoded on the same DNA strand (Cui et al. 2006; de Cambiaire et al. 2006). The high level of strandedness in the latter chlorophycean chloroplast genomes has probably been generated by selection to regulate gene expression by favouring the formation of long, multicistronic transcripts.
Disruptions of linearity, detected as local minima and maxima, are visible in the plot of cumulative GC skew of the Stigeoclonium genome (Fig. 2a). Interestingly, these distortions correspond to expanded regions in the ftsH, rpoC1, rpoC2, rps4 and ycf1 genes. As demonstrated for two E. coli strains (Grigoriev 1998), they possibly represent recent genome rearrangements such as inversions or horizontally acquired sequences.
As observed previously for Scenedesmus and Chlamydomonas cpDNAs (Maul et al. 2002; de Cambiaire et al. 2006), the chloroplast genome of Stigeoclonium does not reveal any remnant of the ancestral gene partitioning pattern displayed by Mesostigma, Nephroselmis and streptophyte cpDNAs. In Fig. 1, it can be seen that homologues of the genes residing in the SSC and LSC regions of the Mesostigma genome are widely dispersed throughout the Stigeoclonium genome. In contrast, most of these genes in Chlorella and Pseudendoclonium cpDNAs have remained clustered together despite significant changes in genome architecture (Pombert et al. 2006).
An alternative approach for comparing the degrees of similarity displayed by different genomes with respect to their gene order is to estimate the number of gene permutations that would be required to convert the gene order of a given genome to that of another genome. The data obtained with this approach corroborate the notion that the gene organization of Stigeoclonium cpDNA diverges radically from those of previously sequenced chlorophyte genomes (Supplementary Table S3). We estimated that more than 80 inversions would be required to convert the gene order of Stigeoclonium cpDNA into that of any other chlorophyte cpDNA. All the additional pairwise comparisons we carried out yielded reduced numbers of inversions, with the fewest (43 inversions) being obtained in the comparison of the Mesostigma and Nephroselmis genomes. With 58 inversions distinguishing the Scenedesmus and Chlamydomonas cpDNAs, these chlorophycean genomes are clearly more similar to one another than each of these genomes is to its Stigeoclonium homologue.
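To give a feel for inversion counts such as these, the sketch below computes a simple breakpoint-based lower bound rather than the exact distance. It is not the algorithm used by GRIMM, which computes exact inversion distances. It assumes one genome has already been recoded as a signed permutation of 1..n relative to the other, and it treats that permutation as linear. Because a single inversion can remove at most two breakpoints, at least half the number of breakpoints (rounded up) inversions are required. The function names are hypothetical.

```python
from math import ceil


def breakpoints(perm) -> int:
    """Count breakpoints in a signed permutation of 1..n.

    The permutation is framed by 0 and n + 1; consecutive elements a, b
    form an adjacency only when b - a == 1, otherwise a breakpoint.
    """
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def inversion_lower_bound(perm) -> int:
    """Each inversion removes at most two breakpoints, so d >= b / 2."""
    return ceil(breakpoints(perm) / 2)


# Toy example: genes 2 and 3 sit in an inverted block.
toy = [1, -3, -2, 4]
bound = inversion_lower_bound(toy)  # one inversion restores 1 2 3 4
```

The exact distance can exceed this bound, so the sketch is useful only for sanity-checking the scale of a reported value.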
Group I introns
Group II introns
The five group II introns of Stigeoclonium vary from 654 to 1,918 bp in size and reside within psaC, psaJ, petD and rbcL. Each of these genes is interrupted by one intron, with the exception of rbcL, which contains two. Positionally homologous introns have not been identified in other chloroplast genomes (Fig. 4); this is the first report indicating the presence of group II introns in psaC, psaJ and rbcL. All five Stigeoclonium group II introns lack an ORF ≥ 100 codons and all, except the psaJ intron, are discontinuous. The second intron in rbcL is split in domain II, whereas the sites of discontinuity of the other introns map to various locations within domain I (Supplementary Fig. S1). The second intron in rbcL and the cis-spliced psaJ intron were classified into the subgroup IIA according to the nomenclature proposed by Michel et al. (1989), whereas the petD intron was classified into the subgroup IIB. The two remaining introns could not be categorized into any of these subgroups because they exhibit characteristics of both subgroups. No close structural relationship was identified among the five group II introns.
[Table: Abundance of repeats in Stigeoclonium and other UTC algal cpDNAs. Recoverable row labels: maximal size of repeats (bp); number of repeats; non-overlapping repeats ≥ 30 bp; total size (bp); fraction of genome (%); fraction of intergenic regions (%).]
[Table: Repeat units in Stigeoclonium cpDNA.]
Distinctive features of the Stigeoclonium chloroplast genome
Although the Stigeoclonium chloroplast genome shares several derived features with Chlamydomonas and Scenedesmus cpDNAs, it displays a number of distinctive traits. Stigeoclonium cpDNA is the largest chloroplast genome yet sequenced and, in contrast to its two chlorophycean counterparts, features no IR. Genes that are usually part of ancestral clusters in green algal cpDNAs have been reshuffled to a significantly greater extent in the Stigeoclonium genome than in Scenedesmus and Chlamydomonas cpDNAs, and virtually all of the derived clusters identified in the latter algae are absent from the Stigeoclonium genome (Fig. 3, Supplementary Table S3). The distribution of the Stigeoclonium genes between the two DNA strands shows an almost perfect symmetry (Fig. 1) and, most remarkably, the gene-encoding strand on each half of the genome is richer in G than in C compared to the alternate strand (Fig. 2). Another distinctive feature of the Stigeoclonium chloroplast genome is its large set of introns (21 introns vs. 9 in Scenedesmus and 7 in Chlamydomonas), which includes four putatively trans-spliced group II introns that have no homologues in other green algal cpDNAs (Fig. 4). As each of these group II introns consists of two pieces that are far apart on the genome, two distinct precursor transcripts, each containing an intron piece, presumably assemble at the site of discontinuity of the intron via base-pairings and tertiary interactions to reconstitute the intron structure required for splicing.
Considering that the presence of an rDNA-encoding IR is a prominent feature of the chloroplast genome in diverse green algal and plant lineages and that its absence from some lineages has been attributed to independent losses (Palmer and Thompson 1981; Palmer et al. 1987; Lidholm et al. 1988; Strauss et al. 1988; Turmel et al. 2005), we infer that an IR was present in the chloroplast genome of the common ancestor of the green algae belonging to the Chlamydomonadales, Sphaeropleales, and Chaetophorales but was lost in the lineage leading to Stigeoclonium (Chaetophorales). As the IR is thought to play a major role in stabilizing gene order (Palmer and Thompson 1982; Strauss et al. 1988; Palmer 1991), it is perhaps not surprising that the Stigeoclonium chloroplast genome is extremely rearranged relative to Scenedesmus and Chlamydomonas cpDNAs. To account for the highly scrambled gene order observed in the great majority of previously documented green plant cpDNAs lacking an IR (Palmer and Thompson 1982; Strauss et al. 1988; Wakasugi et al. 1994; Turmel et al. 2005), it has been hypothesized that the loss of the IR enhances opportunities for intramolecular recombination between homologous sequence elements such as short dispersed repeats (Palmer 1991). Therefore, according to this hypothesis, both the absence of the IR and the great abundance of short dispersed repeats in the Stigeoclonium genome are important factors that influenced the order of genes and gene pieces.
The mode of DNA replication appears to be an additional factor that contributed to the unusual arrangement of genes in the Stigeoclonium genome, in particular to the strand bias in coding regions. Both the strand biases in coding regions and in GC composition displayed by this algal genome are typical of those observed in prokaryotic genomes that replicate bidirectionally from a single origin (Grigoriev 1998; Tillier and Collins 2000a, b; Guy and Roten 2004). Analysis of the cumulative GC skew has allowed us to map a putative replication origin in the trnS(gga)-rrs intergenic region and a putative terminus in the psbD-tufA intergenic region (Figs. 1, 2). Further work will be needed to determine whether the intergenic spacer upstream of the small subunit rRNA gene (rrs) functions as an origin and whether the unique direct repeats and potential stem-loop structure found at this locus are essential for replication. Evidence for bidirectional replication from a single origin based on GC skew analysis has been reported for only two other IR-lacking chloroplast genomes showing a coding strand bias, the genome of the euglenoid Euglena gracilis whose plastids were acquired by secondary endosymbiosis from a green alga (Morton 1999) and the genome of the parasitic green alga Helicosporidium sp. (Trebouxiophyceae) (de Koning and Keeling 2006). Consistent with the GC skew analysis of Euglena cpDNA, previous electron microscopic analysis of replication intermediates had suggested that this genome is replicated bidirectionally from a single origin (near the repeated rRNA genes) to a terminus on the opposite side of the circular genome (Koller and Delius 1982; Ravel-Chapuis et al. 1982). As in Stigeoclonium cpDNA, the putative origin of bidirectional replication in the reduced genome of Helicosporidium has been located just upstream of the rrs gene. 
In contrast, studies of cpDNA replication in Chlamydomonas and various land plants indicate that these genomes replicate by a mechanism different from that used by prokaryotic genomes (Heinhorst and Cannon 1993; Kunnimalaiyaan and Nielsen 1997). Except for Euglena cpDNA, all chloroplast genomes that were examined have been found to contain multiple origins whose number and locations may vary in different organisms.
Prior to our study, the only known trans-spliced group II introns in chlorophyte cpDNAs were the bipartite introns occupying the same site in the Scenedesmus and Chlamydomonas psaA genes (Kück et al. 1987; de Cambiaire et al. 2006) and the tripartite intron inserted at a distinct site in the Chlamydomonas psaA (Kück et al. 1987; Goldschmidt-Clermont et al. 1991; Turmel et al. 1995a). Most other trans-spliced group II introns are bipartite and have been documented mainly in land plant mitochondrial genomes. Interestingly, cis-spliced versions of these mitochondrial introns have been found in some land plant taxa, supporting the notion that disruption of ancestral cis-spliced introns gave rise to trans-spliced introns (Malek et al. 1997; Malek and Knoop 1998). Not only was the finding of four bipartite group II introns in Stigeoclonium cpDNA unexpected; it was also surprising that the sites of discontinuities of these introns lie within domain I or II, because the majority of reported trans-spliced group II introns are fragmented within domain III (Michel et al. 1989) or IV (Michel and Ferat 1995). Only the tripartite introns in Chlamydomonas chloroplast psaA (Goldschmidt-Clermont et al. 1991; Turmel et al. 1995a) and in Oenothera mitochondrial nad5 (Knoop et al. 1997) are known to have a break within domain I; the central fragments of these introns encompass part of domain I, the entire domain II and III, and part of domain IV. To our knowledge, no discontinuity within domain II of group II introns has been documented thus far.
Evolution of the chlorophycean chloroplast genome
When our comparative analysis of the Stigeoclonium, Scenedesmus and Chlamydomonas chloroplast genomes is placed in a phylogenetic framework, we find that a number of mutational events can be inferred during the evolution of chlorophycean green algae. Our recent phylogenetic analyses of genes and proteins derived from chloroplast genome sequences of green algae representing the four chlorophyte classes revealed that Stigeoclonium occupies a basal position relative to a clade uniting Scenedesmus and Chlamydomonas (our unpublished results). This topology, which was found to be very robust regardless of the methods of analysis used, is supported by several cpDNA features (Fig. 5). For example, the affiliation of Chlamydomonas and Scenedesmus to the same clade is supported by the five sets of traits that these algal cpDNAs have in common but that are lacking from Stigeoclonium cpDNA and other chlorophyte cpDNAs: (1) the absence of four genes, (2) the presence of a duplicated trnE(uuc) gene, (3) the presence of a trans-spliced group II intron at site 267 in psaA, (4) the absence of two ancestral gene pairs and the presence of 17 derived gene pairs (see Fig. 3) and (5) the split of rps2 into two separate ORFs. Following the split of the Chlamydomonadales and Sphaeropleales, the chloroplast genome sustained no further changes in the Scenedesmus lineage, except the acquisition of a cis-spliced group II intron in petD (Kück 1989). In the Chlamydomonas lineage, a second trans-spliced group II intron was gained by psaA (Kück et al. 1987), two genes were lost and rpoC1 was split into two separate ORFs. The distinctive traits displayed by the Stigeoclonium cpDNA probably reflect events that occurred specifically during the evolution of the Chaetophorales. These events include the insertion of five group II introns, the fragmentation of four of these introns, the loss of three genes, the loss of the IR as well as the losses of eight ancestral gene pairs (Fig. 5).
The branching order reported here for the Chaetophorales, Sphaeropleales and Chlamydomonadales is congruent with the current hypothesis for the divergence order of chlorophycean lineages as inferred from the nuclear-encoded small subunit and large subunit rRNA gene sequences (Buchheim et al. 2001; Shoup and Lewis 2003). According to this hypothesis, the polymorphic DO + CW condition of the flagellar apparatus in the basal lineage represented by Stigeoclonium (Chaetophorales) became fixed for the CW condition in the Chlamydomonadales and for the DO condition in the Sphaeropleales. Of course, to better understand how the CW and DO organizations of basal bodies found in these chlorophycean lineages originated from the counterclockwise organization observed in trebouxiophytes and ulvophytes, a robust phylogeny encompassing all identified chlorophycean lineages will be required. Sequencing of the chloroplast genome from additional chlorophycean taxa would not only be useful to unravel the branching order of the major chlorophycean lineages but would also shed light on the most ancestral condition of this organelle genome in the Chlorophyceae.
We thank Jean-François Pombert and Jean-Charles de Cambiaire for their help with the sequence analyses and for critical reading of the manuscript. This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada (to C.L. and M.T.).