The chloroplast genome from a lycophyte (microphyllophyte), Selaginella uncinata, has a unique inversion, transpositions and many gene losses
- Cite this article as:
- Tsuji, S., Ueda, K., Nishiyama, T. et al. J Plant Res (2007) 120: 281. doi:10.1007/s10265-006-0055-y
We determined the complete nucleotide sequence of the chloroplast genome of Selaginella uncinata, a lycophyte belonging to the basal lineage of the vascular plants. The circular double-stranded DNA is 144,170 bp, with an inverted repeat of 25,578 bp separated by a large single copy region (LSC) of 77,706 bp and a small single copy region (SSC) of 40,886 bp. We assigned 81 protein-coding genes including four pseudogenes, four rRNA genes and only 12 tRNA genes. Four genes, rps15, rps16, rpl32 and ycf10, found in most chloroplast genomes in land plants were not present in S. uncinata. While gene order and arrangement of the chloroplast genome of another lycophyte, Huperzia lucidula, are almost the same as those of bryophytes, those of S. uncinata differ considerably from the typical structure of bryophytes with respect to the presence of a unique 20 kb inversion within the LSC, transposition of two segments from the LSC to the SSC and many gene losses. Thus, the organization of the S. uncinata chloroplast genome provides a new insight into the evolution of lycophytes, which were separated from euphyllophytes approximately 400 million years ago.
Keywords: Chloroplast, Lycophyte, Pseudogene, Selaginella uncinata, Selaginellaceae, tRNA genes
Chloroplasts contain their own genomes mainly coding for proteins involved in the management of photosynthesis. The first complete nucleotide sequences of chloroplast genomes were reported in tobacco (Shinozaki et al. 1986) and liverwort (Ohyama et al. 1986). Since then, a variety of additional genomes have been completely sequenced (http://www.ncbi.nlm.nih.gov/genomes/ORGANELLES/plastids.html, http://www.chloroplast.cbio.psu.edu/). Genome organization is highly conserved within land plants (Palmer 1991; Sugiura 1992; Raubeson and Jansen 2005). Their size is mostly conserved at 120–160 kb with some exceptions, and they contain approximately 120 genes in general. The chloroplast genomes of most land plants are partitioned into a quadripartite architecture with a large inverted repeat (IR) separated by a large and a small single copy region (LSC and SSC, respectively). Although in most lineages gene order and arrangement remain unchanged, several lineages have highly rearranged chloroplast genomes, such as the almost-complete loss of the IR in conifers (Wakasugi et al. 1994), two inversions of the IR clarifying basal nodes in leptosporangiate ferns (Hasebe and Iwatsuki 1992; Stein et al. 1992), a 28 kb inversion restricted to the Poaceae and three other monocotyledonous families (Hiratsuka et al. 1989; Doyle et al. 1992), and so on (Raubeson and Jansen 2005). Thus, the divergence of gene arrangements provides insight into the mechanisms of genome rearrangements and is useful for inferring phylogenetic relationships.
Extant vascular plants evolved from ancestral non-vascular land plants in the early Silurian, approximately 430 million years ago (Kenrick and Crane 1997), and acquired ecological dominance resulting in great diversification. Vascular plant megafossils from the Silurian lack the roots and leaves that are shared by extant vascular plants. Instead, they had bifurcating stem-like organs in both aerial and terrestrial parts. Leaf-like lateral organs subsequently evolved in at least two morphologically different extant vascular plant lineages (Gifford and Foster 1989). One is microphyllous leaves in lycophytes (microphyllophytes) and the other megaphyllous leaves in euphyllophytes, which include monilophytes, gymnosperms, and angiosperms. Paleobotanical records and phylogenetic analyses based on both morphological and molecular data indicated that these two lineages diverged in the Silurian (Kenrick and Crane 1997). Although lycophytes were dominant in the land flora of the Carboniferous, only three lineages, including a small number of species in the Selaginellaceae, Lycopodiaceae, and Isoetaceae, are extant in the present land flora.
Compared to a relatively rich knowledge of bryophyte and euphyllophyte chloroplast DNAs, little is known about lycophyte chloroplast DNAs. The completely sequenced genome of Huperzia lucidula (Wolf et al. 2005a), a member of the Lycopodiaceae, contains mostly similar genes in a similar arrangement to bryophytes, a liverwort Marchantia polymorpha (Ohyama et al. 1986), a moss Physcomitrella patens (Sugiura et al. 2003), and a hornwort Anthoceros formosae (Kugita et al. 2003a). These genomes share a 30 kb inversion including the ycf2-rpoB region in the LSC, unlike euphyllophyte chloroplast genomes previously reported (Palmer and Stein 1986; Raubeson and Jansen 1992). This genome arrangement, together with the gene content and order shared in the bryophytes and H. lucidula, is plesiomorphic in land plants. However, chloroplast genomes in two other lycophyte lineages, which diverged in the Carboniferous approximately 300 million years ago, have not been reported and it is not clear whether the genome features observed in H. lucidula are common to the three lycophyte families.
In this study, we determined the complete nucleotide sequence of the chloroplast genome from Selaginella uncinata (Selaginellaceae). Its genome structure is quite different from that of H. lucidula with respect to the presence of a 20 kb inversion, transpositions, and many gene losses.
Materials and methods
Selaginella uncinata was collected from a population at Ishinomaki-hagihara in Toyohashi City, Aichi, Japan, in 2002. Previous karyomorphological observations revealed that S. uncinata in Japan had two non-homologous, structurally changed chromosomes in addition to eight pairs of homologous chromosomes, and that microspores were irregular in shape (Takamiya 1993). The present population is likely maintained by vegetative propagation, because no fertile microspores are produced (M. Takamiya, personal communication).
Leaves were powdered in liquid nitrogen, and DNA was extracted from chloroplasts isolated by sucrose gradient centrifugations (Saltz and Beckman 1981). The chloroplast DNA was directly shotgun sequenced by the dideoxy chain termination method (BigDye Terminator Cycle Sequencing Kit, PE Applied Biosystems) using an ABI 3100 Genetic Analyzer. Alternatively, the extracted DNA was digested with KpnI, SalI or EcoRI restriction enzymes, and the digested fragments were cloned into pUC19. Inserted DNA segments were shotgun sequenced, and were identified as chloroplast DNA based on their sequence similarity to other chloroplast genomes. The gap regions remaining after assembly of the DNA sequences obtained from both methods were amplified by PCR, and the resulting products were directly sequenced or shotgun sequenced. For the shotgun-sequenced DNA fragments, we confirmed that all DNA segments were contiguous as follows: when the overlapping region of two DNA segments was less than 100 bp in size, a bridge DNA fragment spanning both DNA segments was amplified by PCR using S. uncinata DNA as a template and was directly sequenced. Furthermore, we also verified the order of genes of the S. uncinata chloroplast genome that differed from those of H. lucidula by direct sequencing of re-amplified S. uncinata DNA. A table of primers used for genome DNA sequencing is available as supplementary Table S1.
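The overlap rule described above (bridge by PCR when two adjacent segments overlap by less than 100 bp) can be illustrated with a short sketch. This is not the authors' procedure: the function name `junctions_needing_bridge`, the coordinate convention, and the example intervals are all hypothetical, and only the 100 bp threshold comes from the text.

```python
from typing import List, Tuple

MIN_OVERLAP_BP = 100  # threshold stated in the text


def junctions_needing_bridge(
    segments: List[Tuple[int, int]], min_overlap: int = MIN_OVERLAP_BP
) -> List[Tuple[int, int]]:
    """Flag adjacent segment pairs whose overlap is below `min_overlap`.

    `segments` are hypothetical (start, end) coordinates, end-exclusive,
    on a common scaffold. Returned pairs are indices into the segments
    sorted by start position. A negative overlap means a gap.
    """
    ordered = sorted(segments)
    flagged = []
    for i in range(len(ordered) - 1):
        overlap = ordered[i][1] - ordered[i + 1][0]
        if overlap < min_overlap:
            flagged.append((i, i + 1))
    return flagged


# Made-up example: overlaps of 150 bp, 50 bp, and a 100 bp gap.
example = [(0, 1000), (850, 2000), (1950, 3000), (3100, 4000)]
print(junctions_needing_bridge(example))
```

In this toy input the first junction passes, while the second (50 bp overlap) and third (a gap) are flagged as candidates for a bridging PCR product.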
The sequences were assembled using the Phred/Phrap function (Ewing et al. 1998; Ewing and Green 1998) in DNASIS Pro (Hitachi Software Engineering). Gene annotations and comparative genome analyses were performed using the current versions of several BLAST programs (BLASTN, PHI-BLAST, BLASTX) (Altschul et al. 1997). The locations and secondary structures of tRNA genes were evaluated using the tRNAscan-SE (version 1.21) program (Lowe and Eddy 1997) and Aragorn (http://www.eddie.thep.lu.se/aragorn/aragorn.html).
Complete chloroplast genome sequences from 35 land plants and their accession numbers used for comparison with the gene content of the S. uncinata chloroplast genome are shown in the online supplementary Table S2.
The genome has an overall 55% G–C content, which is the highest among the chloroplast genomes that have been completely sequenced so far [from 20% in Eimeria tenella (alveolata) (Cai et al. 2003) to 42% in Adiantum capillus-veneris (embryophyta) (Wolf et al. 2003) and Nephroselmis olivacea (chlorophyta) (Turmel et al. 1999)]. The G–C content in the coding regions (55%) is similar to that in the non-coding intergenic regions. The G–C content of the IR regions (57%), which contain the four rRNA genes (58%), is slightly higher than that of the LSC (54%) and the SSC (54%). Thus, the unusually high G–C content is not due to the presence of a particular G–C rich region. The G–C bias of the rRNA genes is similar to that found in other land plants, in whose chloroplast genomes a slight G–C bias is characteristic of rRNA genes.
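The percentages above are base-composition ratios. As a minimal sketch of how such a value can be computed for any region, the function below counts G and C among unambiguous bases; the name `gc_content` and the decision to ignore ambiguous symbols such as N are assumptions made here for illustration, not a description of the authors' software.

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction of a DNA sequence.

    Only A, C, G and T are counted; other symbols (e.g. N) are ignored.
    This handling of ambiguous bases is an assumption of the sketch.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases)


# Toy sequence, not taken from the S. uncinata genome.
print(round(100 * gc_content("GGCCAT"), 1))  # percentage for the toy input
```

Applying the same function separately to the LSC, SSC, IR, coding and intergenic sequences would give per-region values comparable to those quoted in the text.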
Another unique feature of the overall structure is the relatively long SSC (28.4% of the total length)—caused by the transposition of some segments from the LSC to the SSC as described below—compared with other pteridophyte species, Huperzia lucidula (12.7%) (Wolf et al. 2005a), A. capillus-veneris (14.2%) (Wolf et al. 2003), and Psilotum nudum (11.7%).
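The SSC proportion quoted above follows directly from the region sizes given in the abstract. The sketch below reproduces that arithmetic; the variable names are illustrative, and reading the 25,578 bp figure as the combined length of both IR copies is an inference made here because it is the reading under which the regions sum to the reported genome size.

```python
GENOME_BP = 144_170   # total genome size reported in the abstract
LSC_BP = 77_706       # large single copy region
SSC_BP = 40_886       # small single copy region
IR_TOTAL_BP = 25_578  # assumed here to be both IR copies combined


def percent_of_genome(region_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Return a region's length as a percentage of the genome."""
    return 100 * region_bp / genome_bp


# Under the stated assumption the parts add up to the whole genome.
parts_sum = LSC_BP + SSC_BP + IR_TOTAL_BP
ssc_percent = round(percent_of_genome(SSC_BP), 1)
ir_copy_bp = IR_TOTAL_BP // 2

print(parts_sum == GENOME_BP, ssc_percent, ir_copy_bp)
```

With these inputs the SSC accounts for 28.4% of the genome, matching the value in the text, and each IR copy would be 12,789 bp under the stated assumption.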
List of genes found in the Selaginella uncinata chloroplast genome (table columns: Group of gene, Name of gene)
An internal stop codon is found in the putative rpoA gene, resulting in the coding of a smaller protein (245 amino acid residues) than RpoA proteins (about 340 amino acid residues) in other species unless the stop codon is corrected by RNA editing. While the amino acid sequence deduced from this open reading frame is apparently similar to the one conserved among other RpoA proteins, the amino acid sequence downstream of the stop codon is quite different. We doubt whether this putative RpoA protein is functional.
In addition to these pseudogenes, five genes, rpl32, rps15, rps16, ycf10 (cemA) and ycf66, which are present in another lycophyte, H. lucidula (Wolf et al. 2005a), were not found in the S. uncinata chloroplast genome. Most of the matK genes in land plants are present in the intron of trnK-UUU sequences. However, trnK-UUU is not present in S. uncinata as in A. capillus-veneris (Wolf et al. 2003).
A unique feature of the S. uncinata chloroplast genome is that more than 50 putative protein-coding genes do not have their canonical start codons and/or stop codons at the expected positions based on alignments with corresponding genes in the chloroplast genomes of other taxa. A triplet ACG, which is changed into a start codon by C to U RNA editing, exists near the position of the expected start codon. Furthermore, conserved amino acid sequences in land plants are largely disturbed in putative proteins encoded by the S. uncinata chloroplast genome. These observations suggest that a large amount of RNA editing, by which nucleotide sequences of RNA transcripts are post-transcriptionally altered, occurs in the S. uncinata chloroplast. Since cDNA analyses of putative protein-coding genes are in progress, precise RNA editing sites will be published later on.
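The start-codon observation above rests on a simple rule: C to U editing of the middle base turns ACG into AUG. The following sketch illustrates that rule on a made-up sequence. The function names, the search window of 30 nt, and the example sequence are hypothetical and are not taken from the authors' annotation pipeline.

```python
from typing import List


def candidate_edited_starts(dna: str, expected_start: int, window: int = 30) -> List[int]:
    """Return 0-based positions of ACG triplets near an expected start codon.

    `window` (in nucleotides, on either side) is an illustrative choice.
    """
    dna = dna.upper()
    lo = max(0, expected_start - window)
    hi = min(len(dna) - 3, expected_start + window)
    return [i for i in range(lo, hi + 1) if dna[i:i + 3] == "ACG"]


def edit_c_to_u(rna: str, pos: int) -> str:
    """Apply a single C to U substitution at 0-based position `pos`."""
    if rna[pos] != "C":
        raise ValueError("position does not hold a C")
    return rna[:pos] + "U" + rna[pos + 1:]


# Toy example: an ACG triplet sits where a start codon is expected.
toy_dna = "TTTACGGCTAAA"
hits = candidate_edited_starts(toy_dna, expected_start=3, window=3)
toy_rna = toy_dna.replace("T", "U")
edited = edit_c_to_u(toy_rna, hits[0] + 1)  # edit the C in the middle of ACG
print(hits, edited[hits[0]:hits[0] + 3])
```

A scan like this only proposes candidates; as the text notes, the actual editing sites have to be confirmed by cDNA analysis.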
The psbK and trnQ (UUG) genes are duplicated in the S. uncinata chloroplast genome. One is in the LSC and another in the SSC, and they are separated by 5.6 kb with the same orientation. The duplicated sequence spans 2.7 kb and contains trnQ (UUG) in the upstream region from psbK and the 5′ part of chlL. Gene duplications in a single-copy region have also been reported for psaM (Wakasugi et al. 1994) and psbA (Lidholm et al. 1991) in Pinus species.
While organization of the chloroplast genome from a lycophyte, H. lucidula, is mostly similar to that of bryophytes (Fig. 4c), the S. uncinata chloroplast genome has a unique 20-kb inversion from psbI (gene 12) to trnC (gene 11) in the LSC (Fig. 4a). The region of this inversion is included in that of the 30-kb inversion from ycf2 (gene 19) to psbM (gene 9) that commonly occurs in euphyllophytes, including higher ferns and seed plants as shown in Fig. 4d (Palmer and Stein 1986; Raubeson and Jansen 1992). The resulting gene order is well conserved in the whisk fern, P. nudum. The difference between these two inversions is that both termini [psbM (gene 9) and the region from ycf2 (gene 19) to trnK (gene 17)] of the 30-kb inversion remain in their original positions in S. uncinata as in bryophytes and H. lucidula. Furthermore, considering the gene order in the S. uncinata genome, it should be noted that petN (gene 10) is present in the SSC with trnQ-psbK (genes 14-13). These genes are located in the LSC, but petN is more than 20 kb apart from trnQ-psbK in a hornwort, A. formosae, and in H. lucidula. The direction of petN (gene 10) transcription is the same as that of psbK (gene 13), but opposite to that of trnQ (gene 14) in A. formosae and H. lucidula (Fig. 4c,d). This situation is opposite to that in S. uncinata (Fig. 4a). Therefore, a 20-kb segment inversion from trnQ, and not psbI, to rpoB-trnC (gene 11) seems likely to be the first mutational event so that petN (gene 10) becomes fused to trnQ-psbK as shown in Fig. 4b. The resultant petN-trnQ-psbK (genes 10-14-13) was followed by transposition to the SSC. This transposition might have broken the putative psbKI operon. Since the boundaries of most gene rearrangements in the chloroplast genomes of land plants lie between operons and not within them (Palmer 1991), this transposition represents a rare gene rearrangement.
Furthermore, a 2.7-kb trnQ-psbK region was probably duplicated and transferred to the position neighboring chlB (gene 15) in the LSC, since the trnQ-psbK segment in that region contains the 5′ part of chlL (gene 6) that was originally located in the SSC.
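The reasoning above depends on what an inversion does to a gene map: the order of the genes inside the segment is reversed and each gene's transcription direction flips, which can bring previously distant genes next to each other. The sketch below shows this on a toy signed gene order. The gene labels `a` to `e` and the function name `invert_segment` are placeholders, and the toy map does not reproduce the actual S. uncinata arrangement in Fig. 4.

```python
from typing import List, Tuple

Gene = Tuple[str, int]  # (label, strand) with strand +1 or -1


def invert_segment(order: List[Gene], start: int, end: int) -> List[Gene]:
    """Invert the genes at indices start..end (inclusive).

    The segment's gene order is reversed and every strand sign is flipped.
    """
    segment = [(name, -strand) for name, strand in reversed(order[start:end + 1])]
    return order[:start] + segment + order[end + 1:]


# Toy map: 'a' and 'd' start out separated by two genes.
toy_map = [("a", 1), ("b", 1), ("c", -1), ("d", 1), ("e", 1)]
inverted = invert_segment(toy_map, 1, 3)
print(inverted)
```

After the inversion, `a` and `d` are neighbours and `d` is transcribed in the opposite direction, which mirrors the kind of argument used in the text to order the inversion before the transposition. Applying the same inversion twice restores the original map.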
In a previous study (Raubeson and Jansen 1992) in which endpoints of the 30-kb inversion in various land plants were determined, single restriction fragments hybridized with both a DNA probe containing psbA (gene 18) (inside the inversion) and a probe containing psbC (between gene 20 and 21; outside the inversion) in Lycopodium and Isoetes, both lycophytes, but not in S. uncinata. Transposition of a 17-kb trnD-rps4 (genes 20-22) segment containing psbC from the LSC to the SSC would explain why such a single restriction fragment was not found in Selaginella. It would also suggest that extensive rearrangements have occurred in the Selaginella chloroplast genome after separation from the Isoetaceae, which is sister to the Selaginellaceae in the lycophytes (Pryer et al. 2001). Transposition of the 17-kb region may be autapomorphic in the Selaginellaceae among lycophytes.
We found a total of 15 tRNA-coding genes corresponding to only 12 amino acids (including fMet) (Table 1). This is in remarkable contrast with the presence of 31 tRNA genes in H. lucidula (Wolf et al. 2005a) (although 29 tRNA genes were originally documented, trnG-GCC was also found between trnR-UCU and ycf12 in addition to trnI-GAU as described above). No intron-containing tRNA genes were detected. It has been reported that some chloroplast genomes lack some tRNA genes, like trnK and trnR in A. capillus-veneris (Wolf et al. 2003) and Lotus japonicus (Kato et al. 2000), respectively. One possible explanation is that post-transcriptional editing could alter an anticodon of a different tRNA to create a missing tRNA (Wolf et al. 2003), but it would be unlikely in the case of the S. uncinata chloroplast genome lacking tRNA genes for nine kinds of amino acids. These missing tRNA genes might be located on the S. uncinata nuclear genome. Since plant mitochondrial DNA encodes only some of the tRNAs required for protein synthesis within the mitochondrion, the missing tRNAs must be imported (Maréchal-Drouard et al. 1993; Glover et al. 2001; Delage et al. 2003). A similar import system of tRNA for chloroplasts might exist in S. uncinata as was speculated for a nonphotosynthetic parasitic angiosperm, Epifagus virginiana (Morden et al. 1991), in which many tRNA genes were lost or retained as pseudogenes. An alternative explanation is that a high frequency of RNA editing occurs in the process of tRNA maturation. If indeed this is the case, present software for searching tRNA might have difficulty finding tRNA genes, especially intron-containing tRNA genes, since each exon split by an intron contains only 30–40 bp. However, we found similar sequences to some tRNA genes including introns in loci expected from the gene orders in other taxa (Fig. 2). Transfer to nuclear genomes is likely to have given rise to the presence of such “vestigial” tRNA genes.
Since a whole genome shotgun sequencing project of Selaginella moellendorffii is in progress (http://www.ucjeps.berkeley.edu/TreeofLife/data_table.php), the sequences of its nuclear and chloroplast genomes might provide us with supplementary information on tRNA genes for protein synthesis inside chloroplasts.
A comparison with H. lucidula revealed that rps15, rps16, rpl32, ycf10 and ycf66 found in H. lucidula were not present in S. uncinata. Furthermore, rps12, rpl21, accD and psaM seem to have suffered frameshift mutations (Fig. 3). These facts further support the notion that extensive rearrangements have occurred in the S. uncinata lineage. When compared with other chloroplast genomes completely sequenced so far in land plants, the following may be noted: (1) the lack of rps12 in S. uncinata is the first reported case; (2) ycf10 and rpl32 are present in land plants except for a nonphotosynthetic plant, E. virginiana; (3) rpl21 is present in bryophytes, H. lucidula, and two monilophytes (A. capillus-veneris and P. nudum), but not in gymnosperms and angiosperms; (4) accD was lost in monocots; (5) psaM has been found in bryophytes, a monilophyte, P. nudum, and pines, P. thunbergii and P. koraiensis, but is absent from another monilophyte, A. capillus-veneris, and angiosperms; (6) rps16 was lost independently in the chloroplast genomes of a moss, P. patens, a liverwort, M. polymorpha, P. nudum and pines; and (7) rps15 is absent from a hornwort, A. formosae, and E. virginiana. These facts suggest that parallel losses of particular genes have occurred during the evolution of land plant chloroplast genomes.
In conclusion, the structural changes in the S. uncinata chloroplast genome, such as inversions, transpositions, gene duplications, and losses, contrast remarkably with the genome of H. lucidula, which largely retains the conserved genome organization found in the bryophytes. From fossil data, the Lycopodiaceae (Huperzia) and the Selaginellaceae diverged in the Carboniferous period more than 300 million years ago (Kenrick and Crane 1997). The complete chloroplast genome sequence of the Isoetaceae, another lycophyte lineage, remains to be determined.
Chloroplast transcripts of land plants are subjected to RNA editing. Such RNA editing occurs at a small, limited number (25–31) of sites in the chloroplasts of seed plants as reported in black pine (Wakasugi et al. 1996), maize (Maier et al. 1995), tobacco (Hirose and Sugiura 2001), and Arabidopsis thaliana (Tillich et al. 2005). On the other hand, many editing sites have been found in a fern, Adiantum capillus-veneris (349 sites) (Wolf et al. 2005b), and in a hornwort, Anthoceros formosae (942 sites) (Kugita et al. 2003b). In all of these cases, RNA editing is of the base substitution type (C to U and U to C). In the case of S. uncinata, 54 and 111 RNA editing sites were found in rbcL and atpB transcripts, respectively, and all of them were C to U conversions (Tsuji and Ueda, unpublished data) as in the chloroplasts of seed plants, while considerably lower levels of RNA editing were suggested to occur in H. lucidula (Lycopodiaceae) (Wolf et al. 2005a) than in A. capillus-veneris or A. formosae. It appears that RNA editing in S. uncinata is much more frequent than in A. formosae (20 sites in rbcL and 29 sites in atpB) and is likely the most extensive RNA editing reported so far in land plants. Furthermore, conserved amino acid sequences including the starting amino acid, fMet, in land plants are largely replaced in putative proteins encoded by the S. uncinata chloroplast genome. Such an extraordinarily high frequency of RNA editing led us to assume that the RNA editing bias to U from C might be one of the reasons for the GC content (55%) of the genome, which is the highest among the chloroplast genomes completely sequenced to date. Genome-wide cDNA analysis is worthy of further study.
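The comparison of editing frequency between S. uncinata and A. formosae can be made explicit with the site counts quoted above. The sketch below only restates that arithmetic; the dictionary layout and the function name `fold_difference` are illustrative, and no values beyond those in the text are used.

```python
# Editing-site counts per transcript, as quoted in the text.
EDIT_SITES = {
    "rbcL": {"S. uncinata": 54, "A. formosae": 20},
    "atpB": {"S. uncinata": 111, "A. formosae": 29},
}


def fold_difference(gene: str) -> float:
    """Ratio of S. uncinata to A. formosae editing sites for one transcript."""
    counts = EDIT_SITES[gene]
    return counts["S. uncinata"] / counts["A. formosae"]


for gene in EDIT_SITES:
    print(gene, round(fold_difference(gene), 2))
```

For these two transcripts the S. uncinata counts are roughly 2.7 and 3.8 times those of A. formosae. Because the transcripts are the same in both species, the ratio of site counts is a reasonable first comparison, though it says nothing about genes that have not yet been analysed.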
We are very grateful to Prof. M. Takamiya for cytogenetical observations of S. uncinata. We also thank T. Hoshino for drawing the chromosome map and H. Mizuno for technical assistance with DNA sequencing.