Introduction

Plastids are one of the main distinguishing characteristics of the plant cell. The central function of the plastid is to carry out photosynthesis, but other major cellular functions also take place in plastids, including synthesis of starch, fatty acids, pigments and amino acids (reviewed by Neuhaus and Emes 2010). As early as 1905, Konstantin S. Mereschkowski hypothesized that plant “chromatophores” are the result of the uptake of a cyanobacterium by a eukaryotic organism (English translation by Martin and Kowallik 1999). It is now generally accepted that the plastid originated via incorporation of a free-living cyanobacterial-like prokaryote into a eukaryotic cell (primary endosymbiosis), thereby enabling the transition from heterotrophy to autotrophy by conferring the ability to use light energy. Recent phylogenetic analyses of plastid genes from major plant lineages have converged on the hypothesis that plastids of the plant kingdom, i.e. the clade including Glaucophytes, Rhodophytes, Chlorophytes, and Streptophytes (Fig. 1; Keeling 2004), are derived from a single origin (Palmer 2000; McFadden and van Dooren 2004; Keeling 2010). This is also supported by several biochemical features, such as the composition of light harvesting complexes and their components, structural RNAs, membrane structure, and the protein import/targeting machinery (Weeden 1981; Bölter et al. 1998; Keeling 2004; Yang and Cheng 2004; Koziol et al. 2007; Vesteg et al. 2009).

Fig. 1

Evolution of plastid gene content in land plants. Events of gene loss in Embryophytes, as well as gains and duplications of protein coding genes in green plant lineages, are depicted along the branches/nodes of the Plant Tree of Life (Palmer et al. 2004; Qiu et al. 2006; Zhong et al. 2010). The putatively ancestral gene content, as reflected in Marchantia and derived from parsimony analysis after Maul et al. (2002), is given at the first land plant node. Gene losses during the evolution of land plants are indicated by red arrows (those occurring before the emergence of Embryophytes are not considered here); a green arrow indicates the evolution of a novel gene prior to the transition to land; blue arrows refer to gene duplications. Changes in the content of transfer RNAs are not considered here (refer to Gao et al. 2010 for review). A detailed summary of gene losses during the evolution of angiosperms is provided by Jansen et al. (2007) and Magee et al. (2010). Although chl subunit genes are still present in some gymnosperm plastomes, multiple losses and pseudogenizations indicate a functional transfer to the nuclear genome. As chl genes have been lost entirely from angiosperm plastomes, functional chl-gene transfer might have already occurred in a common ancestor.

Over evolutionary time, genetic information was transferred, functionally or (more often) non-functionally, from the endosymbiont’s genetic system to the host nuclear genome, genetically intertwining the two genomes. Except for genes involved in photometabolic processes, most other genes have been incorporated into the nuclear genome. This has resulted in a highly reduced plastid genome in Streptophytes (land plants plus their closest algal relatives), comprising less than 5–10% of the genes hypothesized for the ancestral cyanobacterial genome (ca. 2000 to 3000 genes; Martin et al. 2002). A corollary of this process is that the plastid genome (plastome) became subject to nuclear regulation (Timmis et al. 2004), locking in the symbiotic relationship. The transfer of sequences and of both functional and non-functional genes from the plastid genome to the nuclear and mitochondrial genomes remains an ongoing process (Stern and Lonsdale 1982; Stern and Astwood 1986; Nakazono and Hirai 1993; Albus et al. 2010, 1998; Shahmuradov et al. 2003; Matsuo et al. 2005; Guo et al. 2008; Sheppard and Timmis 2009). This intracellular gene transfer is considered “frequent and [to occur] in big chunks” (Martin 2003:1; Stegemann et al. 2003; Noutsos et al. 2005). The question of how many genes can eventually be transferred to the nuclear genome (and whether the plastome could eventually be lost) has been discussed for some time (Barbrook et al. 2006). Massive gene loss has been observed in several parasitic plants (e.g. Orobanchaceae: Wolfe et al. 1992; Cuscuta: Funk et al. 2007, McNeal et al. 2007). In these plants, gene loss is not restricted to genes that are primarily involved in photosynthesis and related pathways (Wolfe et al. 1992; Krause 2008); additional losses or pseudogenizations are seen in genes encoding subunits of the genetic apparatus (e.g., plastid-encoded RNA polymerase, some tRNAs, some ribosomal proteins; dePamphilis and Palmer 1990; Wolfe et al. 1992; Lohan and Wolfe 1998).

Four decades of genetic, genomic and physiological research have contributed substantially to assigning functions to the proteins encoded by land plant plastid genes. Plastid genes have been grouped into functionally defined classes, including (i) those involved in primary and secondary photosynthesis pathways (photosynthetic light and dark reactions), (ii) genes not involved in photosynthetic pathways, such as those for sulfate transport and fatty acid synthesis, (iii) genes involved in transcription and translation, and (iv) a number of structural RNA genes (Palmer 1991; Sugiura 1992; Bock 2007). Subsequent studies have identified the roles of additional genes not falling into any of these gene classes, including genes involved in post-transcriptional modification (matK, Liere and Link 1995), protein turnover or protein complex assembly (Peltier et al. 2004). Currently, only two genes, ycf1 and ycf2, remain whose metabolic or genetic roles have not yet been unambiguously defined (Bock 2007).
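The four classes above map fairly directly onto the conventional gene-name prefixes (psa/psb for the photosystems, rpo for RNA polymerase, rps/rpl for ribosomal proteins, trn/rrn for structural RNAs). The following Python sketch is a hypothetical helper, not part of any cited study: the name `classify_gene` and the prefix lists are our own illustrative assumptions, and the table is far from a complete plastid gene catalogue (for instance, ycf genes other than ycf1 and ycf2 simply fall through to "unknown").

```python
# Hypothetical lookup table: gene-name prefix -> functional class (i)-(iv).
# The prefixes are illustrative and do not cover every plastid gene.
GENE_CLASSES = {
    "photosynthesis": ("psa", "psb", "pet", "atp", "ndh", "rbcL"),
    "non-photosynthetic metabolism": ("accD", "cysA", "cysT"),
    "transcription/translation": ("rpo", "rps", "rpl", "infA"),
    "structural RNA": ("rrn", "trn"),
    "other characterized roles": ("matK", "clpP"),
}

# Matched by exact name, since their roles are not yet unambiguously defined.
UNASSIGNED = {"ycf1", "ycf2"}


def classify_gene(name: str) -> str:
    """Return the functional class of a plastid gene name, or 'unknown'."""
    if name in UNASSIGNED:
        return "unassigned"
    for cls, prefixes in GENE_CLASSES.items():
        if name.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            return cls
    return "unknown"


if __name__ == "__main__":
    for gene in ["psbA", "rpoC1", "trnK-UUU", "accD", "matK", "ycf2"]:
        print(gene, "->", classify_gene(gene))
```

A prefix table keeps the sketch short; a real annotation pipeline would instead rely on curated gene product descriptions.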

In this review, we will discuss functional and evolutionary insights from research on land plant plastid chromosomes, providing a synthesis of our knowledge of their evolution and conservation. Accordingly, particular emphasis will be placed on the genetics of plastomes in the context of land plant diversification, with special attention to the roles of plastid-encoded proteins in photosynthesis and other principal genetic pathways.

Plastid genetics and synteny of land plant plastid chromosomes

Plastid inheritance

The transmission (inheritance) of plastids has long been debated. For seed plants, mechanisms and occurrences of plastid inheritance have been studied in a great number of species (reviewed in Hagemann 2004; Bock 2007; Zhang and Sodmergen 2010). However, little is known about plastid transmission in earlier land plant lineages, probably due to methodological difficulties. Ultrastructural studies of functional sperm cells of bryophytes, lycophytes, horsetails and water ferns (heterosporous ferns) reported the presence of proplastids (reviewed in Sears 1980). In liverworts and mosses, the sperm cell’s proplastids are “discarded” before fertilization (Sears 1980, and references therein). Maternal plastid transmission was subsequently demonstrated for the liverwort Pellia (Pacak and Szweykowska-Kulińska 2002) and several moss representatives (Rhizomnium: Jankowiak et al. 2005; Sphagnum: Natcheva and Cronberg 2007; Plagiomnium: Jankowiak-Siuda et al. 2008). Maternal inheritance of plastids was shown for the horsetail Equisetum variegatum (Guillon and Raquin 2000), but nothing is known about the fate of the sperm cell’s proplastid. Most, though probably not all, plastid-like structures are lost from the spermatozoids of lycophytes, and there seems to be a strong bias towards predominantly maternal plastid transmission caused by degradation prior to or immediately after fertilization (Sears 1980). The absence of a plastid-like structure in sperm cells was shown in representatives of leptosporangiate ferns (Pteridium: Bell et al. 1966; Thelypteris: Sears 1980). This suggested maternal plastid transmission, which was later confirmed using molecular biological methods for Cheilanthes (Gastony and Yatskievych 1992) and Asplenium (Vogel et al. 1998). In gymnosperms and angiosperms, uniparental inheritance is more frequent than biparental transmission (Hagemann 2004). 
Maternal inheritance is typical for angiosperms and for the gymnosperm groups cycads and gnetophytes. In conifers, which make up the majority of gymnosperms, paternal transmission is the dominant mode (Hagemann 2004; Zhang and Sodmergen 2010). However, biparental inheritance has evolved multiple times in seed plants, in particular in eudicot angiosperms such as Geraniaceae (e.g. Tilney-Bassett and Almouslem 1989), Campanulaceae (Corriveau and Coleman 1988) and Fabaceae (Corriveau and Coleman 1988). In gymnosperms, biparental inheritance is much less frequent (Hagemann 2004).

Architecture of plastid chromosomes

The in vivo structure and molecular conformation of the plastid chromosome have long been thought to be exclusively circular. However, several studies employing in situ hybridization techniques demonstrated that often only a minor proportion of the molecules occur in a circular and covalently closed form. Instead, the majority of plastid chromosomes are arranged in concatemers of two or more molecules in either circularized or linear form (Deng et al. 1989; Bendich and Smith 1990; Bendich 1991, 2004; Harada et al. 1997; Lilly et al. 2001). It is still unknown how these concatemeric molecules are formed, and how linkage and breakage are carried out in vivo. It has been speculated that the formation of these supermolecules might facilitate maintenance of gene organization and genome integrity (Day and Madesis 2007; Maréchal and Brisson 2010). However, the role of supermolecule formation as a primary stabilizing factor needs to be evaluated carefully: mitochondrial DNA forms concatemeric molecules as well, but exhibits a great variety of genome sizes and structures among land plants (Palmer and Herbon 1988; Bendich 2007).

The size of photosynthetic land plant plastid chromosomes ranges from 120 kb to 160 kb. The plastome in photosynthetic plants comprises 70 (gymnosperms) to 88 (liverworts) protein coding genes and 33 (most eudicots) to 35 (liverworts) structural RNA genes (Wakasugi et al. 1994; Ohyama 1996; Bock 2007), totaling 100–120 unique genes (Fig. 1). The vast majority of these genes are arranged in operons (or operon-like structures) and transcribed as polycistronic precursor molecules that undergo splicing and nucleolytic cleavage to produce mature, translatable mRNAs (Stern et al. 2010). Genes of the same functional class (translation/transcription, electron transfer, and photosystems) are often located close to one another (Fig. 2; Cui et al. 2006). Using a parametric bootstrap approach, Cui et al. (2006) showed that the genomic rearrangements of some chlorophytic algae (e.g. Chlamydomonas) relative to others are not random. Their results indicated that the physical clustering of genes belonging to the same functional class is positively selected. Furthermore, expression analysis showed that some of these newly formed clusters are co-transcribed, which led the authors to speculate that they could represent new regulons (Cui et al. 2006).
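Whether functional clustering exceeds chance can be illustrated with a much simpler test than the parametric bootstrap of Cui et al. (2006). The minimal sketch below is a plain permutation test, swapped in for illustration only. It counts neighbouring genes that share a functional class and compares that count with randomly shuffled gene orders of the same composition. The toy gene order is invented, and treating the chromosome as linear (no wrap-around of the circle) is a simplifying assumption.

```python
import random


def adjacent_same_class(labels):
    """Count neighbouring gene pairs that belong to the same functional class."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a == b)


def clustering_p_value(labels, n_perm=10_000, seed=1):
    """One-sided permutation test: is the observed order more clustered than random?

    Returns the observed statistic and an add-one corrected p-value.
    The chromosome is treated as linear (no wrap-around), a simplification.
    """
    rng = random.Random(seed)
    observed = adjacent_same_class(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps class composition, randomizes order
        if adjacent_same_class(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Invented toy order: P = photosystem, T = transcription/translation, E = electron transfer.
toy_order = list("PPPPTTTTTEEEPPTTT")
observed, p = clustering_p_value(toy_order)
print(f"same-class neighbours: {observed}, permutation p = {p:.4f}")
```

Shuffling preserves how many genes fall into each class, so only their arrangement is tested. A small p-value says the order is more clustered than random, not why; selection is one explanation among several.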

Fig. 2

Synteny of land plant plastid chromosomes. The plastid chromosomes are shown in linearized form illustrating relative gene synteny. Genes are depicted by boxes colored according to their relevant functional class (see legend). Genes encoded by the leading strand (+ strand) or by the lagging strand (- strand) are shown above or below the grey chromosome bar, respectively. Lengths of boxes do not reflect lengths of genes, but are artificially increased to aid legibility (consequently, overlapping genes on ± strand do not indicate overlapping reading frames). Lines from selected genes/gene regions mentioned above the first chromosome bar roughly indicate gene clusters that have been reorganized during land plant evolution. Not all regions that underwent genomic relocations prior to or during land plant evolution are depicted here. The chromosome bars are colored gray to highlight the positions of the two large Inverted Repeat regions (IRA/IRB) and are connected by gray lines between the different lineages. Gray lines are interrupted once to indicate loss of the large inverted repeat in Pinus. Drawn with GenomePixelizer (Kozik et al. 2002) using genome annotations deposited in public sequence databases. Refer to the text for genome references and original publications.

The plastid chromosome displays a quadripartite structure, i.e. it is divided into four major segments (Fig. 2). Two of those contain only single copy (SC) genes and are referred to as Single Copy regions. The Large Single Copy region (LSC) harbors the majority of plastid genes; its smaller counterpart is known as the Small Single Copy region (SSC). The third segment is duplicated and exists in two nearly identical copies separating the SC regions (Kolodner and Tewari 1979). These copies are inverted relative to each other and are therefore termed large Inverted Repeats A and B (IRA, IRB). An IR is between 20 and 30 kb in size in angiosperms, compared to only 10–15 kb in most non-seed plant lineages (Kolodner and Tewari 1979; Palmer 1991; Raubeson and Jansen 2005; Wu et al. 2009; Wolf et al. 2010a). However, several lineages deviate strongly from these averages, such as Cycas (25 kb, Wu et al. 2007), the cypress Cryptomeria (114 bp, Hirao et al. 2008) or the eudicot Geraniaceae (Monsonia: 7 kb, Guisinger et al. 2010; Pelargonium: 76 kb, Chumley et al. 2006). Because the two IRs are essentially identical and share molecular evolutionary patterns that clearly differ from those observed in the SC regions, the plastid genome structure may also be described as tripartite (as in Bock 2007). This quadripartite (or tripartite) architecture is already present in algal lineages including the closest relatives of land plants (e.g. Chaetosphaeridium, Chara; Turmel et al. 2002, 2006), implying a pre-land plant origin for this important conserved structural feature.
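The quadripartite layout follows from a single sequence property: one segment reappears later as its own reverse complement. The minimal sketch below assumes a linearized circular sequence and brute-forces the longest non-overlapping inverted repeat pair, then reads off the four regions. The function names and the deliberately low-complexity toy sequence are hypothetical. At real plastome sizes (IRs of tens of kb) this quadratic-plus search is impractical, and self-alignment of the genome against its reverse complement would be the usual route.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def partition_quadripartite(seq: str, min_ir: int = 4) -> dict:
    """Split a linearized circular plastome into LSC, IR1, SSC and IR2.

    Brute-force search for the longest pair of non-overlapping inverted
    repeats, trying the longest lengths first. Only practical for toy input.
    """
    n = len(seq)
    for length in range(n // 2, min_ir - 1, -1):
        for i in range(n - 2 * length + 1):
            # First downstream occurrence of the reverse complement, if any.
            j = seq.find(revcomp(seq[i:i + length]), i + length)
            if j != -1:
                return {
                    # Whatever follows IR2 wraps around the circle into the LSC.
                    "LSC": seq[j + length:] + seq[:i],
                    "IR1": seq[i:i + length],
                    "SSC": seq[i + length:j],
                    "IR2": seq[j:j + length],
                }
    raise ValueError(f"no inverted repeat of at least {min_ir} bp found")


# Low-complexity toy: the A and C runs stand in for the single-copy regions.
toy = "A" * 12 + "GGTAT" + "C" * 6 + revcomp("GGTAT")
parts = partition_quadripartite(toy)
print({region: len(s) for region, s in parts.items()})
```

The larger of the two regions between the repeats is, by definition, the LSC. In this toy the sequence starts at the LSC, so the wrap-around piece is empty.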

The plastid chromosomes of charophyte algae, the closest relatives of land plants (Qiu et al. 2006), are larger than those of land plants. They contain several genes that have either been lost or functionally transferred to the nuclear genome in Embryophytes (Turmel et al. 1999; 2006). Parsimony analyses reconstructing unambiguous changes in gene content among plants revealed that the gene ycf1 was gained in a common ancestor of several green algae and land plants (Maul et al. 2002). The gain of an intron in the trnK-UUU gene, containing an intact open reading frame (ORF; matK), is shared by Charophytes and Embryophytes (Maul et al. 2002; Lewis and McCourt 2004; McNeal et al. 2009). Comparative analysis revealed that the plastome structure and gene content of Chaetosphaeridium, a unicellular freshwater charophyte alga, are most similar to those of early land plants (Turmel et al. 2002): large blocks of co-linear genes are already present in this genus. Yet, to arrive at the structural organization of early land plant plastomes, several functional gene transfers to the nuclear genome (e.g. tufA, ftsH, odpB, rpl5), one gene gain (ycf2), and a minimum of eight inversions are necessary (Turmel et al. 2006; Gao et al. 2010). One of those inversions involves a region of the LSC approximately 30 kb in length (Raubeson and Jansen 1992). A large inversion of the complete matK–atpA-I–rpoB-C1/2 region is shared between ferns and seed plants (Fig. 2), whereas liverworts (Ohyama et al. 1988; Wickett et al. 2008a), mosses (Sugiura et al. 2003; Oliver et al. 2010), hornworts (Kugita et al. 2003), and lycophytes (Wolf et al. 2005; Tsuji et al. 2007; Karol et al. 2010) show a more ancestral organization similar to that of Chaetosphaeridium (Quandt et al. 2003; Turmel et al. 2002). 
Generally, the presence of such rearrangements implies that additional transitional forms probably existed and might still be observable in lineages that have remained unstudied so far.

Synteny and structural rearrangements

Plastome rearrangements

Hotspots for structural rearrangements within plastid genomes include the IRs, which are frequently subject to expansion, contraction or even complete loss. Such changes occurred several times independently during the evolution of land plants and are often specific to single orders and families, sometimes even to just one or a few species within a genus (Downie and Bewley 1992; Goulding et al. 1996; Plunkett and Downie 2000; Daniell et al. 2006; Guisinger et al. 2010; Wolf et al. 2010a). Furthermore, extensive changes within the IRs appear to affect the structural integrity of the entire plastid chromosome, beyond the IRs and their immediate neighborhood. This is likely due to the putative role of the IRs in stabilizing the plastid chromosome via homologous recombination-induced repair mechanisms (Maréchal et al. 2009; Rowan et al. 2010; reviewed in detail by Maréchal and Brisson 2010).

Early branching gymnosperms (McCoy et al. 2008; Wu et al. 2009), angiosperms (Goremykin et al. 2003; Cai et al. 2006) and derived leptosporangiate ferns possess much larger IRs than the remaining land plant lineages (Wakasugi et al. 1998; Roper et al. 2007; Karol et al. 2010). Thus, large-scale expansions of the IRs most likely occurred at least twice independently during the evolution of major land plant groups, including once in the common ancestor of seed plants. Additional large- (Guisinger et al. 2010) and small-scale (Goulding et al. 1996) expansions have occurred within angiosperms. As a result of relocation into the IR, several previously single-copy genes became duplicated, including the largest plastid gene, ycf2 (Wolf et al. 2010a). A duplication of the ycf2 gene occurred independently in derived leptosporangiate ferns (tree and polypod ferns) and might be functionally relevant for plant development. In angiosperms, ycf2 expression is highest in fruits (Drescher et al. 2000), but comparable data for leptosporangiate ferns (or other land plant lineages) are lacking so far. Interestingly, plastome restructuring in ferns is correlated with an expansion of the IR (Thompson et al. 1986; Stein et al. 1992; Raubeson and Stein 1995; Wolf et al. 2010a).

Contractions of the large inverted repeats range from only a few (tens to hundreds of) base pairs up to the complete loss of one IR copy. The positions of the LSC-IR junctions vary slightly within groups, but this usually has only negligible effects on plastome size (Goulding et al. 1996; Daniell et al. 2006; Wang et al. 2008). It has been suggested that such positional shifts of IR junctions among species result from gene conversion (Goulding et al. 1996). In several groups, one copy of the IR has been completely lost, for instance in several legumes (Palmer et al. 1987b; Cai et al. 2008; Jansen et al. 2008; Tangphatsornruang et al. 2010), members of Geraniaceae (Guisinger et al. 2010), and some representatives of Orobanchaceae (Downie and Bewley 1992; S. Wicke, C. W. dePamphilis, D. Quandt and G. M. Schneeweiss, unpublished data). So far, no properties have been identified that are shared between these rather distantly related angiosperms and might explain these IR losses. In legumes, the loss apparently affects overall structural stability, leading to mutational hotspots (Palmer et al. 1987b; Milligan et al. 1989; Cai et al. 2008; Magee et al. 2010) and an overall increase of nucleotide substitution rates (Perry and Wolfe 2002). The changes in gene order of a Vigna angularis cultivar relative to other members of Fabaceae have been proposed to be caused either by a large inversion or by a two-step process involving IR expansion and contraction (Perry et al. 2002).

Small dispersed repeats

Reorganizations are in many cases associated with small dispersed repeats (SDRs), which are hypothesized to contribute to the double-strand break induced repair mechanism (Milligan et al. 1989; Maul et al. 2002; Odom et al. 2008). SDRs often make up a substantial part of the repetitive DNA in genomes with highly rearranged gene order and add to structural polymorphism even among closely related lineages (Maul et al. 2002). SDRs mainly occur in non-coding DNA (spacers, introns; Raubeson et al. 2007), where they are often associated with small hairpin structures (Quandt et al. 2003; Kim and Lee 2005). The greatest concentrations of SDRs have so far been reported in green algal plastid genomes (ca. 20% of the Chlamydomonas plastome), although this seems to be highly lineage specific (Maul et al. 2002). Large repeats are assumed to be suppressed (or selectively eliminated) in plastid DNA because of their ability to cause recombination that may destabilize genome structure (Gray et al. 2009; Maréchal and Brisson 2010). Among angiosperms, most SDRs are shorter than 50 bp, with direct repeats being more frequent than inverted repeats (Raubeson et al. 2007). A significant increase of repeats larger than the average has been reported in highly rearranged genomes such as Geraniaceae (Guisinger et al. 2010), Campanulaceae (Haberle et al. 2008), and Fabaceae (Cai et al. 2008), supporting the notion that repeats and genomic rearrangements are causally related. tRNA genes might also be recognized as repeated elements causing rearrangements by intramolecular or non-homologous recombination (Ogihara et al. 1988; Hiratsuka et al. 1989). In many cases, breakpoints of inversions are flanked by tRNA genes and short repetitive sequences (Hiratsuka et al. 1989; Haberle et al. 2008; Guisinger et al. 2010).
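The distinction between direct and inverted SDRs can be made concrete with a k-mer index. The minimal sketch below reports every pair of non-overlapping k-mer copies in the same orientation ("direct") or as reverse complements ("inverted"). The word size k = 8, the function name and the toy sequence are assumptions for illustration. The published surveys used dedicated repeat-finding software rather than this naive scan.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_short_repeats(seq: str, k: int = 8):
    """Return sorted (pos1, pos2, kind) tuples for repeated k-mers.

    kind is 'direct' (same orientation) or 'inverted' (reverse complement).
    Occurrences closer than k bp are skipped so the two copies do not overlap.
    Palindromic k-mers (equal to their own reverse complement) are reported
    as both kinds.
    """
    index = defaultdict(list)
    for p in range(len(seq) - k + 1):
        index[seq[p:p + k]].append(p)
    hits = []
    for kmer, positions in index.items():
        # .get() avoids inserting new keys while iterating over the dict.
        rc_positions = index.get(revcomp(kmer), [])
        for p in positions:
            hits += [(p, q, "direct") for q in positions if q - p >= k]
            hits += [(p, q, "inverted") for q in rc_positions if q - p >= k]
    return sorted(hits)


unit = "GATTACAG"  # invented 8-bp repeat unit
toy = unit + "C" * 6 + unit + "T" * 6 + revcomp(unit)
for hit in find_short_repeats(toy):
    print(hit)
```

A repeat longer than k shows up as a run of hits whose positions both advance by one base; merging such runs would recover the full repeat length needed for size distributions like the one cited above.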

A unique switch in IR orientation (inversion) occurred along the branch separating early diverging fern lineages (Psilotum, Angiopteris: Wakasugi et al. 1998; Roper et al. 2007; Karol et al. 2010) from derived leptosporangiate ferns (Adiantum, Alsophila: Wolf et al. 2003; Gao et al. 2009). This might be an outcome of the flip-flop recombination process proposed by Palmer (1983). Two smaller rearrangements occur at the breakpoint of the large inversion that is synapomorphic for all vascular plants except lycophytes (Raubeson and Jansen 1992; Wolf et al. 2003). The inversions reported in derived leptosporangiates were probably caused by two overlapping inversions during the evolution of leptosporangiate ferns (Wolf et al. 2003, 2010).
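Flip-flop recombination can be illustrated with a simple string model. A crossover between two inverted copies on one circular molecule inverts the stretch between them. Because the second IR copy is the reverse complement of the first, inverting the whole IR-SSC-IR stretch leaves the repeats looking unchanged and flips only the SSC relative to the LSC. The minimal sketch below assumes a single-circle representation (ignoring the concatemeric and linear forms discussed earlier) and uses invented toy segments and a hypothetical function name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_flop_isomers(lsc: str, ir: str, ssc: str):
    """Return the two plastome isomers linked by recombination between the IRs.

    Simplified string model of a circular genome read as LSC-IR-SSC-IR'.
    A crossover between the two inverted copies inverts the stretch between
    them; since IR' is the reverse complement of IR, the repeats look
    unchanged and only the SSC switches orientation relative to the LSC.
    """
    ir_prime = revcomp(ir)
    isomer_a = lsc + ir + ssc + ir_prime
    isomer_b = lsc + revcomp(ir + ssc + ir_prime)
    return isomer_a, isomer_b


lsc, ir, ssc = "AAACCCGGG", "GACAGT", "ATGCCTT"  # invented toy segments
a, b = flip_flop_isomers(lsc, ir, ssc)
print(a)
print(b)
```

Both isomers carry identical gene content and differ only in relative SSC orientation, which is why flip-flop alone does not change gene order within any single region.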

Several small and large inversions that are not accompanied by expansion or contraction of an IR have been reported for diverse angiosperm lineages (Asteraceae: Jansen and Palmer 1987; Kim et al. 2005; Spinacia: Schmitz-Linneweber et al. 2001; some Oleaceae: Lee et al. 2007; Mariotti et al. 2010; grasses: Hiratsuka et al. 1989; Bortiri et al. 2008), but they seem to be less frequent in early land plant lineages. However, one large inversion (71 kb), affecting nearly the entire LSC, is found in the model moss Physcomitrella patens (Sugiura et al. 2003). This inversion was shown to be restricted to Physcomitrella and other Funariales, and is absent from other mosses (Goffinet et al. 2007). Due to the small number of plastid genomes sequenced from early land plant lineages, little is known about other structural rearrangements in bryophytes. As of this writing, no structural changes (inversions) have been identified in liverworts (L. L. Forrest and B. Goffinet, Ecology and Evolutionary Biology, University of Connecticut/USA, personal communication). Some of the largest inversions observed may be attributable to flip-flop recombination due to the existence of the large inverted repeats (Palmer 1983). In the flowering plants studied so far, flip-flop recombination and inversions have been shown to occur predominantly around the origin of replication (ori). In some angiosperms, oriB maps to the rDNA-ycf1 region within the IR, which lies closer to the IR-SSC boundary than to the IR-LSC junction (Thompson et al. 1986; Lu et al. 1996; Kunnimalaiyaan and Nielsen 1997; Eisen et al. 2000; Mackiewicz et al. 2001).

Genome size reduction, gene transfer, and gene gains

Genome size reduction is another major aspect of non-canonical structural evolution. The most dramatic changes in genome size and gene content have been reported for non-photosynthetic parasitic plants. The plastome of Epifagus (Wolfe et al. 1992) is only about half the size of an average eudicot plastome (Bock 2007). This is mainly due to non-functionalization of most photosynthesis-related genes (dePamphilis and Palmer 1990) and some genes for transcription and translation (Morden et al. 1991). Although there is a general trend of (functional) plastid genome reduction in parasitic plants, size and gene content vary widely among lineages, in part because some largely heterotrophic species retain photosynthetic ability (Revill et al. 2005; Funk et al. 2007; McNeal et al. 2007; Nickrent and García 2009). Independent of parasitism, genome reduction was observed in Pinaceae and Gnetophytes (McCoy et al. 2008; Wu et al. 2009), due in large part to the loss of ndh genes. The plastomes of Gnetum and Welwitschia are also more compact than those of other seed plant lineages due to the reduction of intron and spacer regions (McCoy et al. 2008; Wu et al. 2009). This genome reduction has been speculated to result from a low-cost strategy that could facilitate rapid genome replication under unfavorable environmental conditions (McCoy et al. 2008; Wu et al. 2009).

Translocation of single genes is rare in plastid genomes, and this is likely a reflection of the overall rarity of inserted (vs. lost or rearranged) sequences in plastid genomes. Reports of foreign DNA being naturally inserted into plastid DNA are rare (Maul et al. 2002; Haberle et al. 2008; Guisinger et al. 2010), perhaps in part because insertions are difficult to detect in poorly conserved intergenic regions. Many of the repetitive elements found in highly rearranged genomes seem to be derived from plastid sequences (Cai et al. 2008; Haberle et al. 2008; Guisinger et al. 2010). However, some are unique, which might suggest either rapid divergence or a non-plastid origin (Guisinger et al. 2010). As already mentioned by Park et al. (2007), the putatively horizontally acquired rbcL gene copies found in several Phelipanche species (Orobanchaceae) are most likely located in the nuclear or mitochondrial genome, and are not plastid encoded. rbcL appears to be generally absent from Phelipanche plastid genomes (S. Wicke, D. Quandt, C. W. dePamphilis, G. M. Schneeweiss, unpublished data).

Gene gains, too, are exceptional during land plant evolution (e.g. matK, ycf1/2; Fig. 1). The organization and regulation of genes in operons might be one stabilizing factor. Most often, localized changes of gene order are caused by the loss of single genes to the nuclear genome, or by non-functionalization in parasitic or mycotrophic plants.

Functional transfer of genes and subsequent loss of the plastid gene copy has been reported for some rosids (Jansen et al. 2010), some monocots (e.g. Hiratsuka et al. 1989; Masood et al. 2004; Saski et al. 2007) and the spikemoss Selaginella uncinata (Tsuji et al. 2007).

In contrast to the overall high degree of conservation of plastome structure and gene content in land plants, massive structural changes are occasionally found in several unrelated lineages. These include derived angiosperm families such as Geraniaceae (Palmer et al. 1987a; Chumley et al. 2006; Guisinger et al. 2010), Fabaceae (Palmer et al. 1987b; Milligan et al. 1989; Cai et al. 2008; Tangphatsornruang et al. 2010), members of Onagraceae (Oenothera: Hupfer et al. 2000; Greiner et al. 2008) and Campanulaceae (Knox and Palmer 1999; Cosner et al. 1997, 2004; Haberle et al. 2008), but also leptosporangiate ferns (Wolf et al. 2003, 2010; Gao et al. 2009). Because some of the extensively re-shuffled angiosperm plastomes occur in lineages with biparental plastid inheritance (Corriveau and Coleman 1988), it is tempting to speculate that the mode of plastid inheritance may affect plastid genome stability. Biparental inheritance combined with fusion of paternal and maternal plastids (although rare; Wellburn and Wellburn 1979) would likely result in homologous recombination between putatively divergent plastome copies (experimentally shown by Fejes et al. 1990), eventually leading to alteration of the genome structure. In other plants, major rearrangements, in particular gene losses, are clearly connected to a change in lifestyle from autotrophy to parasitism or myco-heterotrophy (Aneura: Wickett et al. 2008a; Orobanchaceae: dePamphilis and Palmer 1990; Wolfe et al. 1992; Convolvulaceae: Funk et al. 2007; McNeal et al. 2007, 2009; Viscaceae: Nickrent and García 2009; and Lennoaceae: Y. Zhang and C.W. dePamphilis, unpublished data).

The precise mechanisms underlying structural changes are as yet unknown, but they are often associated with the presence of nearby repeat sequences, including small repeated sequences that are dispersed through the genome (Maul et al. 2002; Cui et al. 2006; Omar et al. 2008; Cai et al. 2008; Gray et al. 2009; Maréchal and Brisson 2010). As in the plastid genome, structural reorganizations in both the nuclear and mitochondrial genomes are often observed in proximity to structural RNA genes and short repetitive flanking sequence motifs (Grewe et al. 2009). In the nuclear genome, the latter are often associated with transposon activity (Woodhouse et al. 2010). In mitochondrial genomes, transposons are restricted to angiosperms (Knoop et al. 1996; Kubo et al. 2000; Notsu et al. 2002) and are absent from early land plant lineages (Ohyama 1996; Knoop 2004; Grewe et al. 2009). No (retro-) transposons, or traces thereof, have ever been reported from land plant plastomes. Yet, the plastid chromosome of the model green alga Chlamydomonas harbors two copies of the non-functional transposable element Wendy (Fan et al. 1995, Maul et al. 2002). Consequently, the transposon-mediated mechanisms proposed for nuclear and mitochondrial genomes are less likely to apply to plastid genomes, given current knowledge of their evolution (reviewed in Palmer 1991; Raubeson and Jansen 2005; Bock 2007).

Other possible causes of plastid genome restructuring are relaxed repair mechanisms and/or altered recombination processes. Recently, several nuclear encoded genes and gene families have been identified that mediate stabilization, repair and maintenance of the plastid chromosome (Day and Madesis 2007; Maréchal and Brisson 2010). Mutations in these proteins could conceivably impair maintenance of plastid genome structure (Guisinger et al. 2010).

Gene content and function of the plastid genome

The central function of the chloroplast is to carry out photosynthesis and carbon fixation. Besides genes encoding components of the genetic apparatus, such as ribosomal and transfer RNAs, the plastome encodes numerous proteins for photometabolic pathways (Palmer 1991; Sugiura 1992; Raubeson and Jansen 2005; Bock 2007). The following functional protein categories can be distinguished (Table 1): proteins for the genetic apparatus, for non-photosynthesis related metabolic pathways, for primary (light-dependent) photosynthetic reactions, and for secondary (light-independent) photosynthesis pathways. In most cases, fully functional protein complexes are assembled from plastid encoded gene products and nuclear encoded subunits that are imported into the plastid.

Table 1 Summary of plastid encoded genes in land plants. Genes are divided primarily according to their principal function (light-independent pathways, light-dependent pathways, genetic apparatus) and secondarily according to the function of their respective subunits in a given protein.

Plastid encoded elements for the plastid genetic apparatus

Many genes encoding components of the plastid genetic apparatus have been transferred to the nucleus, and their products are now imported into the plastid. However, some genes for transcription and protein biosynthesis are retained in the plastome. These comprise structural RNAs (rRNA, tRNA), some ribosomal proteins, genes for a DNA-dependent RNA polymerase, and a few genes coding for DNA- and protein-processing enzymes.

Genes for DNA/RNA processing enzymes

Plastid genetics is sometimes described as “chimeric” in that eukaryotic cytosolic components (e.g. poly(A)-binding proteins) and eubacterial components (e.g. Shine-Dalgarno interactions) are combined with novelties such as regulatory stem-loops in the 5′ and 3′ untranslated regions of plastid mRNAs (Zerges 2000). Plastid genes are transcribed by two types of DNA-dependent RNA polymerase: a nuclear-encoded phage-type polymerase (NEP) and a plastid-encoded eubacterial-type polymerase (PEP). The two transcribe distinct groups of genes (Hajdukiewicz et al. 1997; Cahoon and Stern 2001; Shiina et al. 2005) and recognize different promoter signals (Weihe and Börner 1999). Promoters of PEP-transcribed genes are highly similar to eubacterial σ70-type promoters, with AT-rich sequences in the -35 element (consensus 5′-TTGACA-3′) and the -10 TATA-box (consensus 5′-TATAAT-3′) upstream of the transcription initiation site (Briat et al. 1986). Promoters of NEP-transcribed genes are less conserved and share only short elements (Weihe and Börner 1999). Three types are known. Two of them are characterized by a common core promoter YRT element (i.e. a pyrimidine-purine-thymidine motif) that is highly conserved among flowering plants. This motif is located close to the transcription initiation site (less than 10 bp away) and can be preceded by a GAA-box. The different promoter classes are recognized by two phage-type polymerases. In Arabidopsis, the existence of at least two plastid-targeted NEPs has been experimentally corroborated (Swiatecka-Hagenbruch et al. 2008), but evidence for differential usage of, or affinity to, particular promoters is currently lacking. In eudicots, one of these NEPs is targeted to both mitochondria and plastids (Kobayashi et al. 2001), which is reflected in partially shared promoter architectures between the two organelles (Kühn et al. 2005). However, this dual-targeted phage-type polymerase appears to be absent from other land plants, including monocots and early diverging angiosperms (Yin et al. 2010).
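To make the σ70-like architecture of PEP promoters more concrete, the following minimal Python sketch scans a DNA sequence for a -35 hexamer (TTGACA) followed by a -10 hexamer (TATAAT). Only the two consensus hexamers come from the text; the spacer range of 15-19 bp, the tolerance of one mismatch per hexamer, the function names and the toy sequence are illustrative assumptions. Actual promoter prediction would rely on position weight matrices and experimental mapping of transcription initiation sites rather than on exact consensus matching.

```python
from typing import List, Tuple

# Consensus hexamers for eubacterial sigma70-type promoters (as given in the text)
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_like(seq: str, max_mm: int = 1,
                      spacers: range = range(15, 20)) -> List[Tuple[int, int, int]]:
    """Return (start of -35 box, start of -10 box, total mismatches) for candidate pairs.

    Assumptions (illustrative only): at most `max_mm` mismatches per hexamer and
    a spacer of 15-19 bp between the end of the -35 box and the -10 box.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(CONSENSUS_35) + 1):
        mm35 = mismatches(seq[i:i + 6], CONSENSUS_35)
        if mm35 > max_mm:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                break  # longer spacers would run past the sequence end
            mm10 = mismatches(seq[j:j + 6], CONSENSUS_10)
            if mm10 <= max_mm:
                hits.append((i, j, mm35 + mm10))
    return hits


# Hypothetical toy sequence: -35 box at position 2, 17-bp spacer, -10 box at position 25
toy = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
print(find_sigma70_like(toy))  # [(2, 25, 0)]
```

Exact consensus matching of this kind ignores the short, less conserved elements of NEP promoters (YRT motif, GAA-box), which would require separate, more permissive patterns.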

Genes for PEP subunits (rpo genes) are lost or pseudogenized in some parasitic plants with minimal or no photosynthetic activity, such as Cuscuta (Funk et al. 2007; McNeal et al. 2007) and Orobanchaceae (Wolfe et al. 1992; Delavault et al. 1996). The loss of PEP renders its promoters dispensable, potentially allowing them to be lost from the plastome (Krause et al. 2003). However, NEP seems to be able to take over at least some of PEP’s transcriptional functions, as suggested by the frequent presence of both NEP and PEP promoters upstream of several plastid transcription units, for instance in the rrn16-trnV region (Krause et al. 2003). In both Cuscuta (Berg et al. 2004) and Lathraea (Lusson et al. 1998), rbcL is transcribed by NEP after the loss of PEP.

MatK—a general group IIA intron maturase?

Protein-coding genes related to (post-)transcriptional activity include matK. The matK gene product is thought to act as a splicing factor for plastid group IIA (gIIA) introns (Liere and Link 1995) and is commonly referred to as a ‘general’ maturase associated with several different intron-containing plastid mRNAs (Zoschke et al. 2010). MatK is encoded by the only intact plastid gII intron ORF, which is located in the intron separating the two exons of the lysine tRNA gene (trnKUUU). In contrast to other gII intron ORFs, matK has lost the domains assigned to reverse transcriptase and endonuclease functions; similarity to typical gII intron maturases is retained only in the DNA-binding domain (Mohr et al. 1993; San Filippo and Lambowitz 2002; Mohr and Lambowitz 2003; Lambowitz and Zimmerly 2004; Pyle and Lambowitz 2006; Hausner et al. 2006). The molecular evolution of the matK coding region is unusual among plastid genes in that all three codon positions evolve at nearly equal rates (Hilu and Liang 1997), a feature that makes it particularly useful for phylogenetic reconstruction (Müller et al. 2006; Wicke and Quandt 2009). Equal substitution rates at all codon positions, however, are indicative of relaxed purifying selection (Müller et al. 2006; Duffy et al. 2009), which led several authors to question whether matK is functional in land plants (Hausner et al. 2006). Substitution rate analyses nevertheless demonstrated purifying selection on matK in parasitic lineages, including Orobanchaceae (Young and dePamphilis 2000) and some Cuscuta species (McNeal et al. 2009), providing evidence for sustained functionality. Moreover, in Cuscuta, matK is absent from those species (Funk et al. 2007; McNeal et al. 2007) that have also lost all seven gIIA introns that likely depend on the matK maturase for splicing (McNeal et al. 2009; Zoschke et al. 2010), which further supports its role as a general maturase required for the splicing of several group IIA introns.
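The codon-position pattern described for matK can be illustrated with a simple per-position divergence count. In most protein-coding genes, third codon positions diverge fastest; for matK, the three values would be expected to be of similar magnitude. The minimal Python sketch below computes the uncorrected proportion of differing sites (p-distance) at each codon position for two aligned, in-frame sequences. It does not correct for multiple substitutions and is not the model-based rate estimation used in the cited studies; the function name and the toy sequences are hypothetical.

```python
from typing import List


def per_codon_position_divergence(seq_a: str, seq_b: str) -> List[float]:
    """Uncorrected proportion of differing sites at codon positions 1, 2 and 3.

    Both sequences must be aligned, in frame and of equal length; alignment
    columns containing a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    diffs = [0, 0, 0]
    sites = [0, 0, 0]
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a == "-" or b == "-":
            continue
        pos = i % 3  # 0, 1, 2 correspond to codon positions 1, 2, 3
        sites[pos] += 1
        if a != b:
            diffs[pos] += 1
    # NaN marks a codon position without any ungapped site
    return [d / s if s else float("nan") for d, s in zip(diffs, sites)]


# Hypothetical toy alignment in which all differences fall on third codon positions
print(per_codon_position_divergence("ATGAAACCC", "ATCAAGCCG"))  # [0.0, 0.0, 1.0]
```

A typical gene would show a strongly skewed profile like this toy example, whereas a matK-like profile would show roughly equal values at all three positions.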

Structural RNAs

Reflecting their location within the IR region, two sets of ribosomal RNA genes (rrn23, rrn16, rrn5, rrn4.5) are encoded in most green plant plastid genomes studied so far. The few exceptions with only one set occur in lineages that have lost one copy of the IR. The ancient duplication of the plastid ribosomal DNA operon and its conservation throughout plant evolution might be attributed to the generally high quantities of rRNA required for ribosome synthesis during early developmental stages (Bendich 1987). The gene for the large-subunit rRNA (rrn23, cpLSU) is arranged upstream of the genes for the smallest rRNAs, 4.5S (rrn4.5) and 5S (rrn5), which might facilitate their expression and delivery at equal ratios. Moreover, the existence of two copies facilitates the maintenance of these genes by, e.g., gene conversion (Lemieux and Lee 1987). The small-subunit rRNA gene (rrn16, cpSSU) is separated from the remaining rRNA genes by two tRNA genes. Functional domains of all rRNA species are highly conserved and show 65–80% similarity to eubacterial (cyanobacterial) ribosomal RNAs (Palmer 1985; Harris et al. 1994; Stoebe and Kowallik 1999; Zerges 2000).

Thirty different tRNAs are encoded in a typical angiosperm plastid genome. Recognition of all 61 sense codons is nevertheless possible through superwobbling (the “two out of three” mechanism; Lagerkvist 1978; Pfitzinger et al. 1990; Rogalski et al. 2008). Together with conventional wobble pairing, superwobbling allows all codons of an amino acid to be read even if only one tRNA is encoded, as in the case of alanine, arginine, asparagine, aspartic acid, cysteine, glutamic acid, histidine, lysine, phenylalanine, proline, tryptophan, and tyrosine (Palmer 1991; Sugiura 1992; Bock 2007). In addition to its role in protein biosynthesis, glutamyl-tRNA (encoded by the plastid trnE gene) plays a key role in initiating heme biosynthesis (Smith 1988; Howe and Smith 1991; Jahn et al. 1992). This and the low rates of tRNA import into organelles (Dietrich et al. 1992, 1996; Lohan and Wolfe 1998) led Barbrook et al. (2006) to suggest that a minimal plastid genome would contain at least the trnE gene. However, experimental data concerning the import machinery for small structural RNAs are scarce, and evidence for general tRNA import into plastids is lacking. It therefore remains speculative to what extent the plastid genome could eventually be reduced.
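The decoding advantage of superwobbling can be illustrated with a deliberately simplified model of base pairing at the wobble position (anticodon position 34). The sketch below assumes the standard genetic code, treats conventional (Crick) wobble as G34 reading C or U and U34 reading A or G, and lets an unmodified U34 read all four bases under superwobbling. Modified nucleosides (e.g. inosine) are ignored, and all names are hypothetical; the code is not a model of the actual plastid tRNA set.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set

BASES = "UCAG"
# Standard genetic code, amino acids listed in UCAG x UCAG x UCAG order ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Third codon bases read by each wobble base (anticodon position 34); simplified rules
STANDARD_WOBBLE: Dict[str, Set[str]] = {
    "G": set("CU"), "U": set("AG"), "C": set("G"), "A": set("U"),
}
# Superwobbling: an unmodified U34 pairs with any base at the third codon position
SUPERWOBBLE: Dict[str, Set[str]] = dict(STANDARD_WOBBLE, U=set("UCAG"))


def four_fold_boxes() -> List[str]:
    """Codon boxes (first two bases) in which all four codons encode one amino acid."""
    boxes = []
    for b1 in BASES:
        for b2 in BASES:
            amino_acids = {CODE[b1 + b2 + b3] for b3 in BASES}
            if len(amino_acids) == 1 and "*" not in amino_acids:
                boxes.append(b1 + b2)
    return boxes


def min_trnas_per_box(rules: Dict[str, Set[str]]) -> Optional[int]:
    """Smallest number of distinct wobble bases whose readable sets cover U, C, A and G."""
    for n in range(1, len(rules) + 1):
        for combo in combinations(rules, n):
            if set().union(*(rules[w] for w in combo)) >= set(BASES):
                return n
    return None


sense_codons = [c for c, aa in CODE.items() if aa != "*"]
print(len(sense_codons))                   # 61
print(four_fold_boxes())                   # ['UC', 'CU', 'CC', 'CG', 'AC', 'GU', 'GC', 'GG']
print(min_trnas_per_box(STANDARD_WOBBLE))  # 2
print(min_trnas_per_box(SUPERWOBBLE))      # 1
```

Under these assumptions, each four-fold degenerate box (including the alanine box GCN and the proline box CCN) requires two tRNA species with conventional wobble pairing but only one with superwobbling.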

Non-photosynthetic and minimally photosynthetic angiosperms typically retain only a fraction of the tRNA genes (Morden et al. 1991; Lohan and Wolfe 1998; Funk et al. 2007; McNeal et al. 2007, 2009; Nickrent and García 2009). In Orobanchaceae, the loss of some tRNA genes, e.g. trnC, seems to be correlated with the loss of photosynthesis (Taylor et al. 1991). Because expression analyses of the genes retained in the highly reduced plastomes of Epifagus (Wolfe et al. 1992) and Conopholis (Wimpee et al. 1991, 1992) suggest an intact translation apparatus, the loss of tRNA genes from these genomes might indicate tRNA import into the plastid. Pseudogenization of tRNA genes has been reported for the mistletoe Arceuthobium (Nickrent and García 2009) and for Cuscuta (Funk et al. 2007; McNeal et al. 2007). In non-parasitic plants, trnKUUU, for example, has been lost independently multiple times (Selaginella: Tsuji et al. 2007; leptosporangiate ferns: Duffy et al. 2009; Wolf et al. 2010; Gao et al. 2010; Geraniaceae: Guisinger et al. 2010).

Plastid ribosomal proteins and ribosomes

Plastid protein biosynthesis is carried out by eubacterial-type 70S ribosomes (Zerges 2000), which are assembled from a small 30S and a large 50S subunit. The 16S ribosomal RNA forms the backbone of the 30S subunit, which additionally includes 25 ribosomal proteins (Yamaguchi et al. 2000). The remaining three plastid rRNA species, together with 33 ribosomal proteins, constitute the 50S subunit (Yamaguchi and Subramanian 2000). Most genes coding for ribosomal proteins have been transferred to the nuclear genome. However, land plant plastomes commonly encode twelve proteins of the small ribosomal subunit (rps genes) and nine proteins of the large ribosomal subunit (rpl genes). Loss of rps and rpl genes from plastomes is rare but has been detected in rosids (e.g. rpl22, rpl23, rps16; see Jansen et al. 2007; Jansen et al. 2010; Magee et al. 2010 for an overview) and in a variety of non-photosynthetic or minimally photosynthetic angiosperms (Epifagus: dePamphilis and Palmer 1990; Conopholis: Y. Zhang and C. W. dePamphilis, unpublished data; Cuscuta: Funk et al. 2007; McNeal et al. 2007; Arceuthobium: Nickrent and García 2009). Whether parasitic angiosperms are able to translate proteins with a reduced set of ribosomal proteins or instead import the missing components is still unknown.

Other proteins associated with plastid ribosomes are a nuclear-encoded ribosome recycling factor and several plastid-specific ribosomal proteins (PSRPs), which are unique to plants and have no bacterial homologs (Yamaguchi et al. 2000, 2003; Yamaguchi and Subramanian 2000; Sharma et al. 2010). The assembly of eubacterial ribosomes has been studied intensively (reviewed in Moore 1998), but no comparable studies are yet available for plastid ribosomes. Given the high similarity of ribosomal RNA and most ribosomal proteins between eubacteria and plastids, plastid ribosome assembly can be assumed to resemble that of eubacteria. Most ribosomal proteins of the 30S subunit bind to the so-called S7 branch or depend on other (plastid-encoded) proteins for binding (Grondek and Culver 2004). Thus, by analogy with eubacterial ribosomal proteins, the plastid-encoded ones might be divided into primary, secondary and tertiary binding components of the 30S and 50S subunits (Table 1) according to their rRNA binding features.

Four proteins bound to the 30S subunit have no homologs in the eubacterial (i.e. E. coli-type) ribosome and are nuclear-encoded PSRPs. Two additional PSRPs are bound to the 50S subunit (Yamaguchi et al. 2000; Yamaguchi and Subramanian 2000). It remains unknown how PSRPs are assembled into functional ribosome complexes. Recent analyses suggest that PSRPs play a role in light-dependent regulation of transcription/translation processes (Sharma et al. 2010).

The plastid gene infA encodes translation initiation factor 1 (IF1), one of the three initiation factors known from eubacterial translation, which assists in the assembly of the translation initiation complex. The infA gene has been lost multiple times independently during land plant evolution. Although present in all bryophyte and fern lineages, it is pseudogenized in the lycophyte Isoëtes (Karol et al. 2010) but appears to be functional in other lycophytes (Selaginella: Tsuji et al. 2007; Huperzia: Wolf et al. 2005). In angiosperms, multiple losses have been reported (summarized in Jansen et al. 2007; Magee et al. 2010), concentrated in lineages known for non-canonical plastid genome evolution (e.g. legumes; Millen et al. 2001).

clpP—a protein-modifying enzyme

High levels of photosynthetic gene expression coincide with an enormous protein turnover in plastids. Both protein maturation and degradation involve ATP-dependent chaperone/protease complexes that either restore or degrade damaged proteins, depending on the severity of denaturation (Wawrzynow et al. 1996; Adam et al. 2001; Adam and Clarke 2002). In plastids, three different protease complexes have been identified: Fts (filamentation temperature sensitive protease), DegP/HtrA (high temperature requirement protease A) and Clp (caseinolytic protease). Whereas all subunits of the first two complexes are encoded by the nuclear genome, the proteolytic Clp subunit ClpP is plastid encoded (clpP).

Plastid genes coding for protein subunits involved in photosynthetic dark reactions and biogenesis

Genes for protochlorophyllide reductase subunits, proteins for CO2 uptake and cytochrome c biogenesis

Bryophytes, lycophytes, ferns and most gymnosperms harbor genes for three subunits of a light-independent protochlorophyllide reductase (chlB, chlL, chlN) in their plastomes. This enzyme is involved in porphyrin and chlorophyll metabolism (Reinbothe and Reinbothe 1996; Karpinska et al. 1997). In gnetophytes, a divergent gymnosperm group whose phylogenetic position is still controversial (e.g. Zhong et al. 2010), chlB, chlL and chlN have been lost to different extents (McCoy et al. 2008; Wu et al. 2009). In Ephedra, the sister group to the remaining Gnetales (Zhong et al. 2010), all three genes are present and intact, whereas Gnetum and Welwitschia possess pseudogenes of two subunits and have lost the third (McCoy et al. 2008; Wu et al. 2009). The different patterns of pseudogenization and chl gene loss in the two genera might indicate relaxed evolutionary constraints on maintaining functional copies, perhaps owing to the import of as yet unidentified nuclear substitutes.

The gene ccsA (ycf5) encodes a protein mediating the attachment of heme to c-type cytochromes during cytochrome biogenesis (Xie and Merchant 1996; Saint-Marcoux et al. 2009). The gene is located in the plastid SSC region and is widely conserved among photosynthetic plants. However, ccsA has been lost from Epifagus (Wolfe et al. 1992) and is pseudogenized in Aneura mirabilis (Wickett et al. 2008a). The reading frame is nevertheless retained in all Cuscuta species sequenced so far (McNeal et al. 2007; Funk et al. 2007).

Land plant plastomes also encode a protein located in the inner envelope membrane (inner-envelope protein, cemA/ycf10; Sasaki et al. 1993b). Knockouts of cemA in Chlamydomonas severely impaired CO2 uptake without affecting the photosynthetic reactions (Rolland et al. 1997). The cemA gene has been lost from the plastid genome of Epifagus (Wolfe et al. 1992) and other Orobanchaceae (S. Wicke et al., unpublished data), but is present in Cuscuta (Funk et al. 2007; McNeal et al. 2007) and Aneura (Wickett et al. 2008a).

rbcL

The rbcL gene encodes the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), which is estimated to be the most abundant protein on earth (Ellis 1979). With the assistance of chaperones, RuBisCO is assembled from eight large subunits (RbcL) and eight small subunits (RbcS). Unlike red algae and Glaucophytes, Chlorophytes and Streptophytes do not possess a functional copy of the small-subunit gene (rbcS) in the plastid genome; instead, RbcS is encoded by a nuclear gene family and targeted to the plastid (Clegg et al. 1997). In contrast to many other photosynthesis-related genes, rbcL is often retained in non-photosynthetic plants. Putatively functional copies of rbcL are retained in several representatives of Orobanchaceae, such as Lathraea (Delavault et al. 1996; Lusson et al. 1998), Orobanche corymbosa and O. fasciculata (Wolfe and dePamphilis 1997; Leebens-Mack and dePamphilis 2002) and most species of Harveya (Leebens-Mack and dePamphilis 2002; Randle and Wolfe 2005), as well as in the non-photosynthetic liverwort Aneura mirabilis (Wickett et al. 2008a). In other Orobanchaceae, rbcL is found only as a pseudogene (as in Epifagus: Wolfe et al. 1992; O. cernua: Wolfe and dePamphilis 1997; Hyobanche: Randle and Wolfe 2005) or has been lost completely (S. Wicke et al., unpublished data). Retention, expression, and evidence for strong purifying selection in hemiparasitic and some holoparasitic plants have led to the speculation that rbcL is involved in another pathway unrelated to photosynthesis (Leebens-Mack and dePamphilis 2002; Randle and Wolfe 2005; McNeal et al. 2007; see section “Plastid encoded genes for photosynthesis unrelated pathways”).

Plastid genes for thylakoid complexes involved in photosynthetic light reactions

Oxygenic photosynthesis requires efficient light-harvesting systems as well as an electron transport chain. The internal (thylakoid) membrane of the plastid contains at least five major protein complexes: photosystem I (PSI), photosystem II (PSII), the cytochrome b6/f complex, ATP synthase and an NAD(P)H-plastoquinone oxidoreductase complex (summarized in Table 1; Gounaris et al. 1986; Nixon et al. 1989).

Photosystem I and II (psa and psb genes)

In plants, light is harvested by two photosynthetic reaction centers, PSI and PSII. These are located in the thylakoid membrane and form supercomplexes, each with its own light-harvesting complex that absorbs light via antenna molecules (chlorophyll a/b and carotenoids). The exact number of proteins and cofactors associated with the PSI and PSII supercomplexes is not known. PSII contains at least 17 subunits, 15 of which are encoded by the plastid genome (psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z); these genes are scattered across the LSC region. All plastid psb gene products form transmembrane helices (Nelson and Yocum 2006) and bind to the subunits PsbA (syn. D1), B, C, and D (syn. D2; Eckardt 2001). The gene products of psbN and psbZ (syn. ycf9) are thought to interact with the chlorophyll-binding subunit PsbC, which reaches into the thylakoid lumen (Nelson and Yocum 2006). The structure of PSI is less complex than that of PSII because its reaction center contains fewer polypeptides. The genes for its plastid-encoded subunits (psa genes) are located in the LSC region, with the exception of psaC, which is embedded in an operon of ndh genes in the SSC region. PSI has five plastid-encoded subunits (PsaA, B, C, I, J), all of which except PsaC are transmembrane proteins. The structurally highly similar apoproteins PsaA and PsaB bind the iron-sulfur reaction center that mediates the light-driven transfer of electrons from plastocyanin to ferredoxin (Nelson and Yocum 2006). The psaC gene encodes a peripheral subunit on the stromal side of PSI that is directly involved in ferredoxin reduction, binding the terminal electron acceptors and linking them to the PSI iron-sulfur center (Fischer et al. 1998). Subunits I and J are not essential for PSI function (Bock 2007).

Photosystem assembly factors (ycf3, ycf4)

Both photosystems have been shown to be assembled with the help of chaperones (Nelson and Yocum 2006). The products of two plastid genes, ycf3 (orf62) and ycf4 (orf184), function as assembly factors for the PSI complex (Boudreau et al. 1997a; Ruf et al. 1997; Naver et al. 2001; Ozawa et al. 2009). Mutations of certain amino acid residues that mediate protein–protein interactions reduced the accumulation of PSI in the thylakoid membrane, as did gene disruption experiments (Boudreau et al. 1997a). Recently, Ycf3 has been shown to interact with at least one nuclear-encoded protein during PSI assembly (Albus et al. 2010). The naming of both genes is somewhat misleading, as it implies that their function is still unknown. However, both ORFs are evidently translated, and the resulting polypeptides assist in the assembly of PSI. We therefore suggest renaming the two genes PSI assembly factor I (pafI, the former ycf3) and II (pafII, the former ycf4). The designations I and II refer to the timing at which they are thought to interact with PSI, following the model proposed by Ozawa et al. (2009).

Cytochrome b6f complex (pet-genes) and ATP-Synthase complex (atp-genes)

PSII and PSI are electrochemically connected in series by the cytochrome b6/f complex, a functional complex composed of nine different subunits plus several cofactors, such as chlorophyll a, heme, β-carotene and an iron-sulfur cluster (Baniulis et al. 2008).

Six subunits are plastid encoded (petA, B, D, G, L, N). These participate in electron transfer, generating a proton gradient across the thylakoid membrane (Stroebel et al. 2003). Together with the nuclear-encoded Rieske protein (PetC), the gene products of petA (cytochrome f), petB (cytochrome b6) and petD (subunit IV) form the core complex that acts in linear electron transport (Kurisu et al. 2003). The remaining subunits (PetN, PetG, PetL plus the nuclear-encoded PetM and PetH) are hydrophobic and are arranged peripherally around the core (Cramer et al. 2006).

Plastid ATP synthase is a multi-subunit complex composed of nine different proteins that uses the proton gradient to generate ATP. These constitute an integral membrane domain (F0 domain) and an extrinsic catalytic domain (F1 domain) reaching into the stroma (McCarty 1992). The F1 domain consists of five different polypeptides (α–ε), three of which are encoded by the plastome (atpA, B, E). The F0 domain, which is involved in proton translocation, is built from four different polypeptides (a, b, b′, c); three of these are plastid encoded (atpI, F, H; Vollmar et al. 2009), whereas subunit b′ is nuclear encoded (atpG).

All plastid-encoded genes for the photosynthetic apparatus are highly conserved in land plant plastomes (with the exception of ndhA–K, see below). Loss or pseudogenization has been reported only in non-photosynthetic parasitic (Krause 2008) or myco-heterotrophic (Wickett et al. 2008a, b) plants.

Plastid NAD(P)H-complex (ndh-genes)

Electrons are recycled around PSI via different pathways, one of which involves a plastid NAD(P)H dehydrogenase complex (Ndh1 complex) incorporated in the thylakoid membrane (Casano et al. 2000; Nixon 2000). This complex might also be involved in chlororespiration, i.e. respiratory electron transport in addition to and/or in interaction with photosynthetic electron transport. The plastid Ndh1 complex non-photochemically reduces and oxidizes plastoquinones. It may also mediate the transport of electrons from PSI-reduced ferredoxin back to PSII (reverse electron transport; Peltier and Cournac 2002). Subunit composition appears to be highly divergent between cyanobacteria and derived land plants (reviewed in Suorsa et al. 2009). Ndh1 consists of distinct subcomplexes, including several partly uncharacterized subunits, ranging from ca. 500 to over 1,000 kDa (Suorsa et al. 2009).

Eleven subunits of the Ndh1 complex are encoded by the plastid genome (ndhA, B, C, D, E, F, G, H, I, J, K). Plastid subunits A–D and H–K are homologous to subunits of the eubacterial (mitochondrial) proton-pumping complex I (Friedrich et al. 1995). Experimental studies in tobacco have shown that plastid-encoded Ndh1 subunits might not be essential for plant survival, although ndh gene knockouts did cause phenotypic alterations (Peltier and Cournac 2002 and references therein).

The plastid-encoded Ndh1 genes have been pseudogenized or entirely lost several times during land plant evolution. Given current data, these losses seem to be predominantly connected to a heterotrophic lifestyle (parasitism, some forms of myco-heterotrophy). Examples include the myco-heterotrophic, non-photosynthetic liverwort Aneura mirabilis (Wickett et al. 2008a), the photosynthetic or partially non-photosynthetic parasite Cuscuta (McNeal et al. 2007; Funk et al. 2007), the non-photosynthetic parasite Epifagus (dePamphilis and Palmer 1990), orchid species (Chang et al. 2006; Wu et al. 2010), some gymnosperms (Wu et al. 2009), some representatives of the carnivorous Lentibulariaceae (B. Schäferhoff, S. Wicke, C. W. dePamphilis and K. Müller, unpublished data), and some species of Geraniaceae (Blazier et al. 2011). The ndh genes are also absent from several chlorophyte algal genomes (incl. Chlamydomonas), but they are present in the plastomes of the closest relatives of land plants (Turmel et al. 2006; see also Martín and Sabater 2010).

The Ndh1 complex may also be associated with other pathways and might play an important role in adaptation to environmental stress (reviewed in Suorsa et al. 2009). Abiotic stress, such as nutrient starvation (in particular nitrogen starvation), up-regulated ndh gene expression, suggesting that Ndh1 may regulate photosynthetic electron flow (Peltier and Schmidt 1991). Because Arabidopsis possesses nuclear genes with strong similarity to ndh genes and with plastid transit peptide sequences (Peltier and Cournac 2002), a second, nuclear-encoded plastid Ndh complex (Nda2) has long been suspected. Recently, an alternative form of a plastid-localized Ndh complex involved in non-photochemical plastoquinone reduction was identified (Sirpiö et al. 2009; Takabayashi et al. 2009; Ishida et al. 2009; Suorsa et al. 2009, 2010). The existence of a second form might explain the multiple losses of ndh genes from land plant plastomes. The function of an alternative Ndh complex, or of fewer or incompletely assembled Ndh1 subcomplexes, may be sufficient for photosynthetic and related pathways in some, but not all, plants, in particular those with a certain degree of heterotrophy (e.g. myco-heterotrophy, parasitism, carnivory). Nutrient supply might also affect the activity of the Ndh1 complex in a way that renders it dispensable. In light of expression analyses under nitrogen starvation (Peltier and Schmidt 1991), relevant factors may include the type of nitrogen source (nitrate vs. ammonium) or an excess of nitrogen (and/or other nutrients or even assimilates) obtained from a host plant. It is unclear whether this also accounts for the loss of ndh genes from the plastomes of Pinaceae, Gnetophytes and some Geraniaceae. Like many land plants, gymnosperms live in close association with mycorrhizal fungi (Wang and Qiu 2006). Thus, fungal associations, or the fungal symbiont itself, may contribute to the fate of ndh genes. On the other hand, across land plants the presence of mycorrhizae and ndh loss appear to be only imperfectly correlated; more data are evidently needed before sound conclusions can be drawn, since other explanations, such as multiple independent functional gene transfers, must be considered as well (see also Blazier et al. 2011).

Plastid encoded genes for photosynthesis unrelated pathways

Plastid genes for metabolic pathways unrelated to photosynthesis include genes for fatty acid synthesis and sulfur metabolism.

AccD and the RuBisCO “shunt”

Acetyl-CoA carboxylase is another key plastid enzyme, catalyzing the irreversible conversion of acetyl-CoA to malonyl-CoA during fatty acid biosynthesis (Neuhaus and Emes 2010). The beta subunit of this multimeric enzyme is encoded by accD in the LSC region of the streptophyte plastome (Sasaki et al. 1993a) and is considered crucial for leaf development (Kode et al. 2005). The accD gene has been lost from the plastid genome several times in angiosperms (Jansen et al. 2007), where its function is fulfilled by nuclear copies (Nakkaew et al. 2008).

Recently, RuBisCO has been found to be involved in a previously unrecognized reaction sequence that bypasses part of glycolysis and converts carbohydrates to fatty acids at low carbon cost in developing green seeds of oilseed rape (Brassica napus; Schwender et al. 2004). This RuBisCO “shunt” has been proposed as a likely reason for the retention of a photosynthetic pathway in fully heterotrophic Cuscuta species, which might nonetheless benefit from it for rapid and efficient lipid synthesis (McNeal et al. 2009).

Genes related to sulfur metabolism

Liverwort plastomes contain at least two additional protein-coding genes that are absent from most other land plants, cysA and cysT. CysA (designated mbpX in the Marchantia polymorpha plastome) shows functional domains similar to those of inner-membrane sulfate ABC (ATP-binding cassette) transporters. Although amino acid conservation drops sharply toward the N-terminus, similarity searches suggest that both genes might belong to sulfate transport complexes or sulfate permeases and thus may function in sulfate metabolism (Laudenbach and Grossman 1991). However, both genes are lacking from most other land plant plastid genomes (mosses, ferns, seed plants) and from the green alga Chlamydomonas (Sugiura 1992; Maul et al. 2002; Melis and Chen 2005; Lindberg and Melis 2008). In hornworts, a cysA-like gene is present in the plastid genome but appears to be non-functional (Kugita et al. 2003).

Plastid genes of unknown function

ycf1 and ycf2

Green algae, including the closest relatives of Embryophytes, possess an ftsH reading frame, which encodes a metalloprotease. Predominantly at its carboxyl terminus, ftsH exhibits similarity to the largest, yet functionally uncharacterized, ORF in land plants (ycf2; Wolfe 1994). Nucleotide sequence similarity among land plant ycf2 sequences is extraordinarily low compared with other plastid-encoded genes, being less than 50% across bryophytes, ferns and seed plants. Ycf2 harbors nucleotide-binding sites typical of green algal and eubacterial ftsH and of the CDC48 gene family, which are involved in cell division, proteolysis and protein transport (Wolfe 1994). Furthermore, a “DPAL” motif shared by CDC48 and ycf2 is highly conserved. In several angiosperm plastomes, a smaller ORF, ycf15, is present directly downstream of ycf2 (Raubeson et al. 2007 and references therein). No exact function has yet been assigned to the ycf15 gene product. Expression studies in spinach suggested that ycf15 might act as a regulator of ycf2 at the RNA level but might not function at the protein level (Schmitz-Linneweber et al. 2001). Consistent with an RNA-level function, Raubeson et al. (2007) showed that ycf15 is not under the purifying selection expected for most protein-coding sequences. A non-protein function might also account for the conservation of the cryptic reading frame ycf68 found in the IRs of several angiosperms (Raubeson et al. 2007) and Aneura mirabilis (Wickett et al. 2008a). The persistence of both the ycf15 and ycf68 ORFs might be attributable to their location in the slowly evolving IR region.

The ycf1 gene, the second largest gene in plastid genomes, codes for a protein of approximately 1,800 amino acids, yet its precise function remains to be determined. Experimental data and comparisons of Chlamydomonas and angiosperm ycf1 homologs revealed conserved nucleotide-binding sites (Boudreau et al. 1997b). Based on these data, the functions of ycf1 and ycf2 have been hypothesized to involve ATPase-related activities, chaperone function, a role in cell division (inferred from similarities with ftsH), and structural remodeling and/or linkage of plastid chromosomes to protein and/or membrane structures (Wolfe 1994; Boudreau et al. 1997b). Available gene expression data from tobacco show that, like ycf2, ycf1 is expressed in fruits (Drescher et al. 2000). The products of both genes are essential for plant cell survival (Drescher et al. 2000; Boudreau et al. 1997b). In most land plant lineages, ycf1 and ycf2 have elevated substitution rates and may have undergone pseudogenization (Oliver et al. 2010; Wolf et al. 2010a). For the most part, however, the 5′ ends of both genes are relatively conserved, whereas other parts seem to evolve more freely. In the case of ycf1, this might be due to the co-localization of a replication origin (oriB) in this region (Kunnimalaiyaan and Nielsen 1997). This implies that both genes are subject to at least weak selective constraints. Analyses of variation in dN/dS ratios and of mutational hotspots within the coding regions might help to assign a function to both genes. The losses observed in several photosynthetic lineages, however, raise the question of whether they really carry out essential functions in all plants. The complete loss of both ycf1 and ycf2 from the plastomes of some (but not all) derived monocot lineages and their putative pseudogenization in other plants (Downie et al. 1994) contrast with their high structural conservation in parasites (dePamphilis and Palmer 1990; Wolfe et al. 1992; McNeal et al. 2007). This might in fact point toward a function decoupled from photosynthesis. Nuclear-encoded, plastid-targeted proteins similar to Ycf1/Ycf2 were not found in lineages in which both genes have been lost from the plastid genome, such as Poaceae (Downie et al. 1994).

Conclusions

In terms of structure, land plant plastid chromosomes evolve much more slowly than their mitochondrial or nuclear counterparts. This structural conservatism might be a result of the organization of genes in operons, a feature conserved among cyanobacteria, green algae and land plants. Other relevant factors include the mode of plastid transmission, highly effective repair mechanisms, and the rarity of plastid fusion and fission. The latter property is one of the major differences from mitochondria, which have been shown to fuse frequently and, in doing so, provide opportunities for exchange between divergent genome copies. Most plastome rearrangements appear to be restricted to lineages that show one or more of the following characteristics: (i) aberrant behavior of the inverted repeat region (expansion, contraction, loss); (ii) biparental plastid transmission; (iii) a high frequency of small dispersed repeat sequences; (iv) a heterotrophic lifestyle (parasites, myco-heterotrophs). Among land plants, angiosperms show the greatest variation in plastome structure, although distortion of gene synteny by rearrangements and gene loss is still rare compared with the genomes of other cellular compartments. Interestingly, plastid chromosome restructuring appears to occur most commonly in the more derived clades of a given lineage (leptosporangiate ferns, Funariales within mosses, Pinaceae and Gnetophytes within gymnosperms, eudicots and Poales within angiosperms). It will be interesting to see whether similar patterns occur in liverwort plastome evolution. The plastid gene content of land plants does not appear to have changed dramatically, and only a few gene losses or putative functional transfers to the nucleus (chl, cys) might have taken place in the course of land plant evolution. The retention of photosynthetically relevant genes might be attributable to several factors.
On the one hand, functional gene transfer is a complex process because it involves both the transfer itself and the evolution of transit peptides; it is therefore expected to be rare. On the other hand, most protein subunits encoded by the plastome (in particular photosynthesis-related proteins) are transmembrane proteins and might therefore be difficult to import (as is known from mitochondria). Finally, many gene products (e.g. components of the transcription/translation apparatus and of the photosynthetic machinery) are required at high expression levels and at early developmental stages, so their retention in the plastome might be favored by selection.