Journal of Plant Research, Volume 120, Issue 2, pp 281–290

The chloroplast genome from a lycophyte (microphyllophyte), Selaginella uncinata, has a unique inversion, transpositions and many gene losses

Authors

  • Sumika Tsuji
    • Division of Functional Genomics, Advanced Science Research Center, Kanazawa University
  • Kunihiko Ueda
    • Division of Life Science, Graduate School of Natural Science and Technology, Kanazawa University
  • Tomoaki Nishiyama
    • Division of Functional Genomics, Advanced Science Research Center, Kanazawa University
  • Mitsuyasu Hasebe
    • Division of Evolutionary Biology, National Institute of Basic Biology
    • Department of Basic Biology, The Graduate University of Advanced Studies SOKENDAI
  • Sumi Yoshikawa
    • Advanced Genome Information Technology Research Group, RIKEN GSC
    • Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
  • Akihiko Konagaya
    • Advanced Genome Information Technology Research Group, RIKEN GSC
    • Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
  • Takumi Nishiuchi
    • Division of Functional Genomics, Advanced Science Research Center, Kanazawa University
    • Division of Life Science, Graduate School of Natural Science and Technology, Kanazawa University
Regular Paper

DOI: 10.1007/s10265-006-0055-y

Cite this article as:
Tsuji, S., Ueda, K., Nishiyama, T. et al. J Plant Res (2007) 120: 281. doi:10.1007/s10265-006-0055-y

Abstract

We determined the complete nucleotide sequence of the chloroplast genome of Selaginella uncinata, a lycophyte belonging to the basal lineage of the vascular plants. The circular double-stranded DNA is 144,170 bp, with an inverted repeat of 25,578 bp separated by a large single copy region (LSC) of 77,706 bp and a small single copy region (SSC) of 40,886 bp. We assigned 81 protein-coding genes including four pseudogenes, four rRNA genes and only 12 tRNA genes. Four genes, rps15, rps16, rpl32 and ycf10, found in most chloroplast genomes in land plants were not present in S. uncinata. While gene order and arrangement of the chloroplast genome of another lycophyte, Huperzia lucidula, are almost the same as those of bryophytes, those of S. uncinata differ considerably from the typical structure of bryophytes with respect to the presence of a unique 20 kb inversion within the LSC, transposition of two segments from the LSC to the SSC and many gene losses. Thus, the organization of the S. uncinata chloroplast genome provides a new insight into the evolution of lycophytes, which were separated from euphyllophytes approximately 400 million years ago.

Keywords

Chloroplast, Lycophyte, Pseudogene, Selaginella uncinata, Selaginellaceae, tRNA genes

Introduction

Chloroplasts contain their own genomes mainly coding for proteins involved in the management of photosynthesis. The first complete nucleotide sequences of chloroplast genomes were reported in tobacco (Shinozaki et al. 1986) and liverwort (Ohyama et al. 1986). Since then, a variety of additional genomes have been completely sequenced (http://www.ncbi.nlm.nih.gov/genomes/ORGANELLES/plastids.html, http://www.chloroplast.cbio.psu.edu/). Genome organization is highly conserved within land plants (Palmer 1991; Sugiura 1992; Raubeson and Jansen 2005). Their size is mostly conserved at 120–160 kb with some exceptions, and they contain approximately 120 genes in general. The chloroplast genomes of most land plants are partitioned into a quadripartite architecture with a large inverted repeat (IR) separated by a large and a small single copy region (LSC and SSC, respectively). Although in most lineages gene order and arrangement remain unchanged, several lineages have highly rearranged chloroplast genomes, such as the almost-complete loss of the IR in conifers (Wakasugi et al. 1994), two inversions of the IR clarifying basal nodes in leptosporangiate ferns (Hasebe and Iwatsuki 1992; Stein et al. 1992), a 28 kb inversion restricted to the Poaceae and three other monocotyledonous families (Hiratsuka et al. 1989; Doyle et al. 1992), and so on (Raubeson and Jansen 2005). Thus, the divergence of gene arrangements provides insight into the mechanisms of genome rearrangements and is useful for inferring phylogenetic relationships.

Extant vascular plants evolved from ancestral non-vascular land plants in the early Silurian, approximately 430 million years ago (Kenrick and Crane 1997), and acquired ecological dominance resulting in great diversification. Vascular plant megafossils from the Silurian lack the roots and leaves that are shared by extant vascular plants. Instead, they had dichotomously branched stem-like organs in both aerial and terrestrial parts. Leaf-like lateral organs subsequently evolved in at least two morphologically different extant vascular plant lineages (Gifford and Foster 1989). One is the microphyllous leaf of lycophytes (microphyllophytes) and the other the megaphyllous leaf of euphyllophytes, which include monilophytes, gymnosperms, and angiosperms. Paleobotanical records and phylogenetic analyses based on both morphological and molecular data indicated that these two lineages diverged in the Silurian (Kenrick and Crane 1997). Although lycophytes were dominant in the land flora of the Carboniferous, only three lineages, including a small number of species in the Selaginellaceae, Lycopodiaceae, and Isoetaceae, are extant in the present land flora.

Compared to a relatively rich knowledge of bryophyte and euphyllophyte chloroplast DNAs, little is known about lycophyte chloroplast DNAs. The completely sequenced genome of Huperzia lucidula (Wolf et al. 2005a), a member of the Lycopodiaceae, contains mostly similar genes in a similar arrangement to those of bryophytes: a liverwort, Marchantia polymorpha (Ohyama et al. 1986), a moss, Physcomitrella patens (Sugiura et al. 2003), and a hornwort, Anthoceros formosae (Kugita et al. 2003a). These genomes share a 30 kb inversion including the ycf2-rpoB region in the LSC, unlike euphyllophyte chloroplast genomes previously reported (Palmer and Stein 1986; Raubeson and Jansen 1992). This genome arrangement, together with the gene content and order shared in the bryophytes and H. lucidula, is plesiomorphic in land plants. However, chloroplast genomes in two other lycophyte lineages, which diverged in the Carboniferous approximately 300 million years ago, have not been reported and it is not clear whether the genome features observed in H. lucidula are common to the three lycophyte families.

In this study, we determined the complete nucleotide sequence of the chloroplast genome from Selaginella uncinata (Selaginellaceae). Its genome structure is quite different from that of H. lucidula with respect to the presence of a 20 kb inversion, transpositions, and many gene losses.

Materials and methods

Selaginella uncinata was collected from a population at Ishinomaki-hagihara in Toyohashi City, Aichi, Japan, in 2002. Previous karyomorphological observations revealed that S. uncinata in Japan had two non-homologous, structurally changed chromosomes in addition to eight pairs of homologous chromosomes, and that microspores were irregular in shape (Takamiya 1993). The present population is likely maintained by vegetative propagation, because no fertile microspores are produced (M. Takamiya, personal communication).

Leaves were powdered in liquid nitrogen, and DNA was extracted from chloroplasts isolated by sucrose gradient centrifugations (Saltz and Beckman 1981). The chloroplast DNA was directly shotgun sequenced by the dideoxy chain termination method (BigDye Terminator Cycle Sequencing Kit, PE Applied Biosystems) using an ABI 3100 Genetic Analyzer. Alternatively, the extracted DNA was digested with KpnI, SalI or EcoRI restriction enzymes, and the digested fragments were cloned into pUC19. Inserted DNA segments were shotgun sequenced, and were identified as chloroplast DNA based on their sequence similarity to other chloroplast genomes. The gap regions remaining after assembly of the DNA sequences obtained from both methods were amplified by PCR, and the resulting products were directly sequenced or shotgun sequenced. To confirm that all shotgun-sequenced DNA segments were contiguous, we proceeded as follows: when the overlapping region of two DNA segments was less than 100 bp in size, a bridge DNA fragment spanning both DNA segments was amplified by PCR using S. uncinata DNA as a template and was directly sequenced. Furthermore, we also verified the order of genes of the S. uncinata chloroplast genome that differed from those of H. lucidula by direct sequencing of re-amplified S. uncinata DNA. A table of primers used for genome DNA sequencing is available as supplementary Table S1.

The sequences were assembled using the Phred/Phrap function (Ewing et al. 1998; Ewing and Green 1998) in DNASIS Pro (Hitachi Software Engineering). Gene annotations and comparative genome analyses were performed using current versions of the BLAST programs (BLASTN, PHI-BLAST, BLASTX) (Altschul et al. 1997). The locations and secondary structures of tRNA genes were evaluated using the tRNAscan-SE (version 1.21) program (Lowe and Eddy 1997) and Aragorn (http://www.eddie.thep.lu.se/aragorn/aragorn.html).

Complete chloroplast genome sequences from 35 land plants and their accession numbers used for comparison with the gene content of the S. uncinata chloroplast genome are shown in the online supplementary Table S2.

Results

Genome structure

The chloroplast genome of S. uncinata (accession number AB197035) was represented as a circle (Fig. 1) (Bendich 2004). A large single copy (LSC) region of 77,706 bp and a small single copy (SSC) region of 40,886 bp are separated by an inverted repeat (IRa and IRb) consisting of 25,578 bp (12,789 bp each). The total genome size of 144,170 bp is average among the land plant chloroplast genomes (117–164 kb) sequenced to date. Nucleotide numbering followed previously published chloroplast genomes by starting at the beginning of the LSC adjoining IRa. The major portion (55%) of the genome consists of gene coding regions (48% protein coding and 7% RNA region), whereas non-coding regions comprise 45% (6% intron and 39% intergenic spacer). However, precise gene coding regions require transcript analyses, since the start and stop codons in many genes are likely created by RNA editing, as described below.
Fig. 1

Gene map of the chloroplast genome of Selaginella uncinata. The large single copy region (LSC) is separated from the small single copy region (SSC) by the inverted repeat (IRa and IRb) illustrated as thick arrows on the inside circle. Genes drawn inside the circle are transcribed clockwise, those outside the circle are transcribed counterclockwise. Asterisks indicate genes that contain introns. Pseudogenes are marked by ψ. Figures inside the inner circle mean nucleotide positions that start at the beginning of the LSC adjoining IRa

The genome has an overall 55% G–C content, which is the highest among the chloroplast genomes that have been completely sequenced so far [from 20% in Eimeria tenella (alveolata) (Cai et al. 2003) to 42% in Adiantum capillus-veneris (embryophyta) (Wolf et al. 2003) and Nephroselmis olivacea (chlorophyta) (Turmel et al. 1999)]. The G–C content in the coding regions (55%) is similar to that in the non-coding intergenic regions. The G–C content of the IR regions (57%) containing four rRNA (58%) genes is slightly higher than that of the LSC (54%) and the SSC (54%). Thus, an unusually high G–C content is not due to the presence of a particular G–C rich region. A G–C bias of the rRNA genes is similar to that found in other land plants since an exceptionally slight G–C bias is characteristic of rRNA genes in other land plant chloroplast genomes.
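As a minimal sketch of how a G–C content figure like those above is obtained, the following Python function counts G and C among the unambiguous bases of a sequence. The function name `gc_content` and the toy sequence are illustrative assumptions and are not taken from the original analysis.

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction of a DNA sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / total


# Toy sequence (hypothetical, not S. uncinata data): 6 of 10 bases are G or C.
toy = "GGCCATGCAT"
print(f"{gc_content(toy):.0%}")
```

Applied separately to coding regions, spacers, the IR, the LSC and the SSC, a function of this kind would give the per-region values compared in the paragraph above.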

Another unique feature of the overall structure is the relatively long SSC (28.4% of the total length)—caused by the transposition of some segments from the LSC to the SSC as described below—compared with other pteridophyte species, Huperzia lucidula (12.7%) (Wolf et al. 2005a), A. capillus-veneris (14.2%) (Wolf et al. 2003), and Psilotum nudum (11.7%).
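The region sizes quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures reported in the text; the variable names are illustrative.

```python
# Region sizes reported in the text (bp).
LSC = 77_706
SSC = 40_886
IR_TOTAL = 25_578            # IRa and IRb together
IR_EACH = IR_TOTAL // 2      # size of one IR copy
GENOME = 144_170

# The four regions should account for the whole circular genome.
assert LSC + SSC + IR_TOTAL == GENOME

# Share of the genome occupied by the SSC, as a percentage.
ssc_pct = round(100 * SSC / GENOME, 1)
print(IR_EACH, ssc_pct)
```

The computed SSC share agrees with the 28.4% stated above, and each IR copy comes out at 12,789 bp.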

Gene content

All genes tentatively identified in the S. uncinata chloroplast genome are presented in Fig. 1 and Table 1. We could annotate 4 rRNA genes in the IR and only 12 tRNA genes, of which 3 are duplicated: two genes are in the IR and another, trnQ-UUG, is present in both the LSC and the SSC. No intron-containing tRNA genes (trnA, trnG, trnI, trnK, trnL, trnT and trnV) commonly found in other chloroplast genomes were detected despite additional searches using tRNAscan-SE and Aragorn. Furthermore, trn genes coding for tRNA-Ser and tRNA-Pro were also not found. However, a Harr plot analysis of regions where these trn genes are located in other chloroplast genomes revealed that similar sequences to some trn genes, including introns and intergenic regions, were present in expected loci. Figure 2 shows a representative Harr plot analysis of intergenic regions between rrn16 and rrn23 in IRs of S. uncinata and H. lucidula. We found that the H. lucidula genome had trnI-GAU in addition to trnA-UGC, as in other land plant chloroplast genomes, following a Harr plot analysis of the same regions from H. lucidula and P. nudum (data not shown). The S. uncinata genome retains similar sequences to both the first exon and a 5′ part of the trnI intron, but the remaining part of the intron, the second exon of trnI and the whole trnA gene are missing. Since the putative first exon of trnI has a large insertion neighboring a putative anticodon, GAT, and since no sequence resembling the second exon of trnI was found in the S. uncinata chloroplast genome, this trnI would be nonfunctional. Similar “vestiges” of trn genes were found in the first exon and a 5′ part of the intron of trnV between ndhC and trnM, trnP between trnW and psaJ, trnT between trnE and psbD, trnG between psbZ and trnfM, and trnS between ycf3 and rps4.
Table 1

List of genes found in Selaginella uncinata chloroplast genome

Group of gene: Name of gene

RNA genes
 Ribosomal RNAs: rrn16 ×2, rrn23 ×2, rrn5 ×2, rrn4.5 ×2
 Transfer RNAs: trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnH-GUG, trnfM-CAU, trnM-CAU, trnN-GUU ×2, trnQ-UUG ×2, trnR-ACG ×2, trnW-CCA, trnY-GUA

Protein genes
 Photosynthesis
  Photosystem I: psaA, psaB, psaC, psaI, psaJ, psaM (ψ)
  Photosystem II: psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK ×2, psbL, psbM, psbN, psbT, psbZ
  Cytochrome: petA, petB, petD^a, petG, petL, petN
  ATP synthase: atpA, atpB, atpE, atpF^a, atpH, atpI
  Chlorophyll biosynthesis: chlB, chlL, chlN
  Rubisco: rbcL
  NADH dehydrogenase: ndhA^a, ndhB^a, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
 Ribosomal proteins
  Large subunits: rpl2^a, rpl14, rpl16^a, rpl20, rpl21 (ψ), rpl22, rpl23 ×2, ORF87 (rpl33), rpl36
  Small subunits: rps2, rps3, rps4 ×2, rps7, rps8, rps11, rps12 (ψ), rps14, rps18, rps19
 Transcription/translation
  RNA polymerase: rpoA (ORF246), rpoB, rpoC1^a, rpoC2
  Initiation factor: infA
 Miscellaneous proteins: accD (ψ), ccsA, clpP^a, matK
 Hypothetical proteins: ycf1, ycf2, ycf3^a, ycf4, ycf12

×2 denotes duplicated genes

^a Genes containing introns

ψ denotes pseudogene

Fig. 2

a Harr plot analysis and b sequence comparison of the intergenic regions between rrn16 and rrn23 in the chloroplast genomes from S. uncinata and H. lucidula. a The 3′ end of rrn16, trnI, trnA and the 5′ end of rrn23 of the H. lucidula chloroplast genome, and the 3′ end of rrn16 and 5′ end of rrn23 of the S. uncinata chloroplast genome are presented along a vertical line and a horizontal line, respectively. Closed regions mean exons of trnI and trnA. A Harr plot analysis was performed with ten bases for the check size and eight bases for the matching size. b Alignment of the sequences within the inner square. Identical nucleotides are shaded and the 3′ ends of rrn16 from both genomes and the first exon and anticodon (GAT) of trnI from the H. lucidula chloroplast genome are shown above the sequences
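A Harr plot is a windowed dot plot. The sketch below is one plausible reading of the parameters in the caption, assuming that "check size" is the window length (ten bases) and "matching size" is the minimum number of identical positions within that window (eight bases). The function name `harr_plot` is hypothetical, and the software actually used by the authors may differ in detail.

```python
def harr_plot(seq_x: str, seq_y: str, check: int = 10, match: int = 8):
    """Return (i, j) window starts where at least `match` of `check` bases agree.

    seq_x is laid along one axis and seq_y along the other; each returned
    pair marks one dot of the plot.
    """
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    dots = []
    for i in range(len(seq_x) - check + 1):
        window_x = seq_x[i:i + check]
        for j in range(len(seq_y) - check + 1):
            window_y = seq_y[j:j + check]
            identical = sum(a == b for a, b in zip(window_x, window_y))
            if identical >= match:
                dots.append((i, j))
    return dots


# Toy sequences (hypothetical): the 10-mer occurs in the second sequence at offset 2.
print(harr_plot("ACGTTGCATG", "GGACGTTGCATGCC"))
```

Runs of dots along a diagonal then indicate stretches of similarity, such as the vestigial trnI sequence described in the text. This brute-force version compares every pair of windows, which is acceptable for short intergenic regions but slow for whole genomes.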

A total of 81 potential protein-coding genes, 3 of which are duplicated, were identified on the genome. However, 4 (rps12, rpl21, psaM and accD) of these genes seem to carry frame shift mutations and internal stop codons. These putative genes would be pseudogenes even if their transcripts are corrected by a base-substitution type of RNA editing. Figure 3 presents the alignment of amino acid sequences encoded by the rps12 genes from four species, S. uncinata, H. lucidula, Adiantum capillus-veneris, and Anthoceros formosae. mRNA of rps12 is matured by a trans-splicing mechanism (Zaita et al. 1987) and its amino acid sequence is well conserved among land plants (Kim and Lee 2004). Comparison of DNA sequences revealed five relatively conserved regions in S. uncinata rps12, two regions in the 5′ part of rps12 (the first exon) and three in the 3′ part of rps12 (the second exon) (data not shown). However, amino acid sequences conserved among four Rps12 proteins are encoded by different codon frames: the first three regions are encoded by the first frame (Se-1st in Fig. 3), the fourth by the second frame (Se-2nd) and the fifth by the third frame (Se-3rd).
Fig. 3

Amino acid sequences deduced from the putative rps12 gene of S. uncinata. In order from the top, the amino acid sequences from A. capillus-veneris (Ad), A. formosae (An), H. lucidula (Hu), and the first frame (Se-1st), the second frame (Se-2nd) and the third frame (Se-3rd) of S. uncinata are presented. Amino acids identical to those in H. lucidula are denoted as dots. Hyphens and asterisks indicate gaps and stop codons, respectively. Regions numbered 1–5 are those highly conserved between DNA sequences from H. lucidula and S. uncinata. Amino acid sequences conserved among four species are shaded. The arrow indicates a trans-splicing junction of the rps12 transcript
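The three-frame comparison shown for rps12 can be reproduced in outline by translating one strand in each of its three forward reading frames. The following is a minimal sketch that assumes the standard genetic code and uses a hypothetical toy sequence rather than the actual rps12 sequence; the names `translate` and `three_frames` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA sequence starting at offset `frame` (0, 1 or 2).

    Stop codons are written as '*', unknown codons as 'X'.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(seq: str) -> list[str]:
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


# Toy sequence (hypothetical): each frame gives a different peptide.
print(three_frames("ATGGCCTAAG"))
```

When conserved peptide segments appear in different frames of the same gene, as described for rps12, a frame shift between those segments is the simplest interpretation.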

An internal stop codon is found in the putative rpoA gene resulting in the coding of a smaller protein (245 amino acid residues) than RpoA proteins (about 340 amino acid residues) in other species unless the stop codon is corrected by RNA editing. While an amino acid sequence deduced from this open reading frame is apparently similar to the one conserved among other RpoA proteins, an amino acid sequence behind the stop codon is quite different. We doubt whether this putative RpoA protein is functional.

In addition to these pseudogenes, five genes, rpl32, rps15, rps16, ycf10 (cemA) and ycf66, which are present in another lycophyte, H. lucidula (Wolf et al. 2005a), were not found in the S. uncinata chloroplast genome. Most of the matK genes in land plants are present in the intron of trnK-UUU sequences. However, trnK-UUU is not present in S. uncinata as in A. capillus-veneris (Wolf et al. 2003).

A unique feature of the S. uncinata chloroplast genome is that more than 50 putative protein-coding genes do not have their canonical start codons and/or stop codons at the expected positions based on alignments with corresponding genes in the chloroplast genomes of other taxa. A triplet ACG, which is changed into a start codon by C to U RNA editing, exists near the position of the expected start codon. Furthermore, conserved amino acid sequences in land plants are largely disturbed in putative proteins encoded by the S. uncinata chloroplast genome. These observations suggest that a large amount of RNA editing, by which nucleotide sequences of RNA transcripts are post-transcriptionally altered, occurs in the S. uncinata chloroplast. Since cDNA analyses of putative protein-coding genes are in progress, precise RNA editing sites will be published later on.
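One way to look for start codons that could be created by C to U editing is to scan for ACG triplets near the position where an alignment predicts the start codon. The sketch below is illustrative only: the function name `candidate_starts`, the 30-bp window, and the requirement that candidates lie in frame with the expected start are all assumptions, not the authors' procedure.

```python
def candidate_starts(seq: str, expected_start: int, window: int = 30):
    """List in-frame ATG and ACG triplets within `window` bp of expected_start.

    Positions are 0-based. An ACG hit is a potential start codon only if
    its C is edited to U in the transcript (ACG -> AUG).
    """
    seq = seq.upper()
    lo = max(0, expected_start - window)
    hi = min(len(seq) - 3, expected_start + window)
    hits = []
    for pos in range(lo, hi + 1):
        triplet = seq[pos:pos + 3]
        in_frame = (pos - expected_start) % 3 == 0
        if in_frame and triplet in ("ATG", "ACG"):
            hits.append((pos, triplet))
    return hits


# Toy sequence (hypothetical): ACG sits exactly where a start codon is expected.
print(candidate_starts("TTTACGGCTAAA", expected_start=3))
```

Such a scan only nominates candidates; as the text notes, cDNA analysis is needed to confirm which sites are actually edited.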

The psbK and trnQ (UUG) genes are duplicated in the S. uncinata chloroplast genome. One is in the LSC and another in the SSC, and they are separated by 5.6 kb with the same orientation. The duplicated sequence spans 2.7 kb and contains trnQ (UUG) in the upstream region from psbK and the 5′ part of chlL. Gene duplications in a single-copy region have also been reported for psaM (Wakasugi et al. 1994) and psbA (Lidholm et al. 1991) in Pinus species.

Gene order

The gene order of the chloroplast genome of H. lucidula (Wolf et al. 2005a), a lycophyte like S. uncinata, is almost the same as that of the bryophytes (Fig. 4c). However, that of the S. uncinata chloroplast genome (Fig. 4a) is considerably different from the typical structure of bryophytes. A notable difference is a 20-kb inversion of a region from psbI (gene 12 in Fig. 4) to rpoB-trnC (gene 11). Another feature of the gene order in S. uncinata is a transposition from the LSC to the SSC of a 17-kb segment spanning trnD (gene 20) to rps4 (gene 22), resulting in the expansion of the SSC. Probably after the transposition, IR extended into rps4. The expansion of IR also occurred at the junction of LSC-IRb, resulting in the presence of an orphan exon 1 of rpl2 (gene 1) at the end of IRa. A similar transposition of a 13-kb segment spanning psbD to rps4 from the LSC to the SSC was found in pines (Wakasugi et al. 1994).
Fig. 4

Model for the structural rearrangement to produce the gene order in the S. uncinata chloroplast genome. From the top, gene orders of a S. uncinata, b an ancestor of S. uncinata, c a prototype of Bryophytes and H. lucidula, d P. nudum and an ancestor of Euphyllophytes, and e tobacco are depicted. Bold lines under the gene orders mean the inverted repeat, IRa and IRb, and a dotted bold line means that IR regions are changeable in species. Numbered genes are referred to on the right of the figure. Short arrows under the genes 10, 13 and 14 indicate the direction of transcription

Discussion

Genome organization

While organization of the chloroplast genome from a lycophyte, H. lucidula, is mostly similar to that of bryophytes (Fig. 4c), the S. uncinata chloroplast genome has a unique 20-kb inversion from psbI (gene 12) to trnC (gene 11) in the LSC (Fig. 4a). The region of this inversion is included in that of the 30-kb inversion from ycf2 (gene 19) to psbM (gene 9) that commonly occurs in euphyllophytes, including higher ferns and seed plants as shown in Fig. 4d (Palmer and Stein 1986; Raubeson and Jansen 1992). The resulting gene order is well conserved in the whisk fern, P. nudum. The difference between these two inversions is that both termini [psbM (gene 9) and the region from ycf2 (gene 19) to trnK (gene 17)] of the 30-kb inversion remain in their original positions in S. uncinata as in bryophytes and H. lucidula. Furthermore, considering the gene order in the S. uncinata genome, it should be noted that petN (gene 10) is present in the SSC with trnQ-psbK (genes 14-13). In a hornwort, A. formosae, and in H. lucidula, these genes are all located in the LSC, but petN is more than 20 kb away from trnQ-psbK. The direction of petN (gene 10) transcription is the same as that of psbK (gene 13), but opposite to that of trnQ (gene 14) in A. formosae and H. lucidula (Fig. 4c,d). This situation is opposite to that in S. uncinata (Fig. 4a). Therefore, a 20-kb segment inversion from trnQ, and not psbI, to rpoB-trnC (gene 11) seems likely to be the first mutational event so that petN (gene 10) becomes fused to trnQ-psbK as shown in Fig. 4b. The resultant petN-trnQ-psbK (genes 10-14-13) was followed by transposition to the SSC. This transposition might have broken the putative psbKI operon. Since the boundaries of most gene rearrangements in the chloroplast genomes of land plants lie between operons and not within them (Palmer 1991), this transposition represents a rare gene rearrangement. Furthermore, a 2.7-kb trnQ-psbK region was probably duplicated and transferred to the position neighboring chlB (gene 15) in the LSC, since the trnQ-psbK segment in that region contains the 5′ part of chlL (gene 6) that was originally located in the SSC.
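The two kinds of event in this model, inversion and transposition, can be expressed as operations on a signed gene order, where each gene carries its transcription direction. The sketch below is a generic illustration under that assumption. The labels g1 to g5 are placeholders and do not correspond to the numbered genes in Fig. 4, and the function names are hypothetical.

```python
Gene = tuple[str, int]  # (name, strand), with strand +1 or -1


def invert(order: list[Gene], start: int, end: int) -> list[Gene]:
    """Invert order[start:end]: reverse the segment and flip each strand."""
    segment = [(name, -strand) for name, strand in reversed(order[start:end])]
    return order[:start] + segment + order[end:]


def transpose(order: list[Gene], start: int, end: int, dest: int) -> list[Gene]:
    """Cut out order[start:end] and reinsert it before index `dest`
    of the list that remains after the cut."""
    segment = order[start:end]
    rest = order[:start] + order[end:]
    return rest[:dest] + segment + rest[dest:]


# Placeholder gene order (hypothetical labels, not the Fig. 4 numbering).
ancestral = [("g1", 1), ("g2", 1), ("g3", -1), ("g4", 1), ("g5", 1)]
after_inversion = invert(ancestral, 1, 4)
after_transposition = transpose(after_inversion, 1, 3, 3)
print(after_inversion)
print(after_transposition)
```

An inversion brings genes from the two ends of the inverted segment next to new neighbors and reverses their relative orientation, which is the kind of change invoked above to place petN beside trnQ-psbK before the transposition to the SSC.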

In a previous study (Raubeson and Jansen 1992) in which endpoints of the 30-kb inversion in various land plants were determined, single restriction fragments hybridized with both a DNA probe containing psbA (gene 18) (inside the inversion) and a probe containing psbC (between gene 20 and 21; outside the inversion) in Lycopodium and Isoetes, both lycophytes, but not in S. uncinata. Transposition of a 17-kb trnD-rps4 (genes 20-22) segment containing psbC from the LSC to the SSC would explain why such a single restriction fragment was not found in Selaginella. It would also suggest that extensive rearrangements have occurred in the Selaginella chloroplast genome after separation from the Isoetaceae, which is sister to the Selaginellaceae in the lycophytes (Pryer et al. 2001). Transposition of the 17-kb region may be autapomorphic in the Selaginellaceae among lycophytes.

Gene losses

We found a total of 15 tRNA-coding genes corresponding to only 12 amino acids (including fMet) (Table 1). This is in remarkable contrast with the presence of 31 tRNA genes in H. lucidula (Wolf et al. 2005a) (although 29 tRNA genes were originally documented, trnG-GCC was also found between trnR-UCU and ycf12 in addition to trnI-GAU as described above). No intron-containing tRNA genes were detected. It has been reported that some chloroplast genomes lack some tRNA genes, like trnK and trnR in A. capillus-veneris (Wolf et al. 2003) and Lotus japonicus (Kato et al. 2000), respectively. One possible explanation is that post-transcriptional editing could alter an anticodon of a different tRNA to create a missing tRNA (Wolf et al. 2003), but it would be unlikely in the case of the S. uncinata chloroplast genome lacking tRNA genes for nine kinds of amino acids. These missing tRNA genes might be located on the S. uncinata nuclear genome. Since plant mitochondrial DNA only encodes some of the tRNAs required for protein synthesis within the mitochondrion, the missing tRNAs must be imported (Maréchal-Drouard et al. 1993; Glover et al. 2001; Delage et al. 2003). A similar import system of tRNA for chloroplasts might exist in S. uncinata as was speculated for a nonphotosynthetic parasitic angiosperm, Epifagus virginiana (Morden et al. 1991), in which many tRNA genes were lost or retained as pseudogenes. An alternative explanation is that a high frequency of RNA editing occurs in the process of tRNA maturation. If indeed this is the case, present software for searching tRNA might have difficulty finding tRNA genes, especially intron-containing tRNA genes, since each exon split by an intron contains only 30–40 bp. However, we found similar sequences to some tRNA genes including introns in loci expected from the gene orders in other taxa (Fig. 2). Transfer to nuclear genomes is likely to have given rise to the presence of such “vestigial” tRNA genes. Since a whole genome shotgun sequencing project of Selaginella moellendorffii is in progress (http://www.ucjeps.berkeley.edu/TreeofLife/data_table.php), the sequences of its nuclear and chloroplast genomes might provide us with supplementary information on tRNA genes for protein synthesis inside chloroplasts.
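The tRNA counts quoted in this section follow directly from the gene list in Table 1. The short sketch below recomputes them with set arithmetic; the one-letter amino acid codes and variable names are illustrative, and "fM" stands for the initiator tRNA.

```python
# Amino acid specificities of the tRNA genes listed in Table 1.
FOUND = {"C", "D", "E", "F", "H", "fM", "M", "N", "Q", "R", "W", "Y"}
# Genes present in two copies (trnN-GUU, trnQ-UUG, trnR-ACG).
DUPLICATED = {"N", "Q", "R"}
# The 20 standard amino acids, one-letter codes.
STANDARD_20 = set("ACDEFGHIKLMNPQRSTVWY")

# Each duplicated gene contributes one extra copy.
gene_copies = len(FOUND) + len(DUPLICATED)

# fMet and Met are both methionine, so drop "fM" before comparing.
covered = FOUND - {"fM"}
missing = sorted(STANDARD_20 - covered)

print(gene_copies, len(FOUND), missing)
```

This gives 15 gene copies for 12 tRNA species and nine amino acids without a chloroplast-encoded tRNA (A, G, I, K, L, P, S, T and V), consistent with the figures in the text and with the missing trn genes named in the Results.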

A comparison with H. lucidula revealed that rps15, rps16, rpl32, ycf10 and ycf66 found in H. lucidula were not present in S. uncinata. Furthermore, rps12, rpl21, accD and psaM seem to have suffered frame shift mutations (Fig. 3). These facts further support the notion that extensive rearrangements have occurred in the S. uncinata lineage. When compared with other chloroplast genomes completely sequenced so far in land plants, the following may be noted: (1) the lack of rps12 in S. uncinata is the first reported case; (2) ycf10 and rpl32 are present in land plants except for a nonphotosynthetic plant, E. virginiana; (3) rpl21 is present in bryophytes, H. lucidula, and two monilophytes (A. capillus-veneris and P. nudum), but not in gymnosperms and angiosperms; (4) accD was lost in monocots; (5) psaM has been found in bryophytes, a monilophyte, P. nudum, and pines, P. thunbergii and P. koraiensis, but is absent from another monilophyte, A. capillus-veneris, and angiosperms; (6) rps16 was lost independently in the chloroplast genomes of a moss, P. patens, a liverwort, M. polymorpha, P. nudum and pines; and (7) rps15 is absent from a hornwort, A. formosae, and E. virginiana. These facts suggest that parallel losses of particular genes have occurred during the evolution of land plant chloroplast genomes.

In conclusion, structural changes such as inversions, transpositions, gene duplications, and losses in the S. uncinata chloroplast genome contrast remarkably with the genome of H. lucidula, whose organization is largely conserved with that of the bryophytes. From fossil data, the Lycopodiaceae (Huperzia) and the Selaginellaceae diverged in the Carboniferous period more than 300 million years ago (Kenrick and Crane 1997). The complete chloroplast genome sequence of the Isoetaceae, another lycophyte lineage, remains to be determined.

RNA editing

Chloroplast transcripts of land plants are subjected to RNA editing. Such RNA editing occurs at a small, limited number (25–31) of sites in the chloroplasts of seed plants as reported in black pine (Wakasugi et al. 1996), maize (Maier et al. 1995), tobacco (Hirose and Sugiura 2001), and Arabidopsis thaliana (Tillich et al. 2005). On the other hand, many editing sites have been found in a fern, Adiantum capillus-veneris (349 sites) (Wolf et al. 2005b), and in a hornwort, Anthoceros formosae (942 sites) (Kugita et al. 2003b). In all of these cases, RNA editing is of the base substitution type (C to U and U to C). In the case of S. uncinata, 54 and 111 RNA editing sites were found in rbcL and atpB transcripts, respectively, and all of them were C to U conversions (Tsuji and Ueda, unpublished data), as in the chloroplasts of seed plants. In contrast, considerably lower levels of RNA editing were suggested to occur in H. lucidula (Lycopodiaceae) (Wolf et al. 2005a) than in A. capillus-veneris or A. formosae. It appears that RNA editing in S. uncinata is much more frequent than in A. formosae (20 sites in rbcL and 29 sites in atpB) and is likely the most extensive RNA editing ever reported in land plants. Furthermore, conserved amino acid sequences including the starting amino acid, fMet, in land plants are largely replaced in putative proteins encoded by the S. uncinata chloroplast genome. Such an extraordinarily high frequency of RNA editing led us to assume that the RNA editing bias to U from C might be one of the reasons for the GC content (55%) of the genome, which is the highest among the chloroplast genomes completely sequenced to date. Genome-wide cDNA analysis is worthy of further study.

Acknowledgments

We are very grateful to Prof. M. Takamiya for cytogenetical observations of S. uncinata. We also thank T. Hoshino for drawing the chromosome map and H. Mizuno for technical assistance with DNA sequencing.

Supplementary material

Copyright information

© The Botanical Society of Japan and Springer 2007