The sequencing of both organelle and nuclear genomes from phylogenetically diverse species will help us to infer how these genomes have evolved and the forces that have shaped them. Recent findings of high rates of transfer of organelle DNA to the nucleus [1], and of high rates of functional gene transfer from organelles to the nucleus [2-5], demonstrate that the endosymbiotic origin of organelles was a major determinant in defining eukaryotic nuclear genomes and was probably a defining event for the formation of the eukaryotic cell [1, 6]. Clearly there is an evolutionary pressure to centralize genetic information in the nucleus, but the forces behind this transfer are not obvious. Muller's ratchet - the unidirectional process of building up mutations in an asexually reproducing population - is one commonly suggested hypothesis to account for this centralization, but is limited in its ability to explain more 'recent' gene transfer events (reviewed in [5]). But despite a wealth of information, it is still not clear from genome sequencing why some genes remain encoded in organelles such as mitochondria and chloroplasts.

A recent study in yeast [7] indicated that an astonishing 25% of the mitochondrial proteome (around 185 proteins) is required for the maintenance and expression of the eight polypeptides encoded by the mitochondrial genome. Analysis of the Arabidopsis mitochondrial and chloroplast proteomes indicates that a similar amount of cellular effort is required to maintain and express organelle genomes in plants [8, 9]. In this article we address the perplexing question of why some genes in the small organelle genomes have been maintained when the majority have been relocated to the nucleus. Figure 1 shows the steps needed for a gene to transfer from an organelle to the nucleus. Historical arguments to explain the retention of a core set of organellar genes fall into two broad categories: either the genes have been 'trapped' in the organelle, or they have been 'preferentially maintained' there. We discuss the merits of each argument.

Figure 1

The steps required for a gene to be transferred from an organelle to the nucleus. (a) The gene must be transferred from the organelle, either as a fragment of organellar DNA or as a cDNA, and (b) integrated into a nuclear chromosome. (c) The gene must then acquire the signals for expression, including promoter, terminator, and polyadenylation signals, and also a signal to target the protein back to the organelle. These events may occur together or separately. (d) The expressed gene may be translated on free polysomes to produce a protein that is targeted to mitochondria, or alternatively the mRNA may be targeted to mitochondria to be translated on the surface. (e) The targeting signal must be removed and (f) the protein has to be assembled in order for it to function. Assembly may require re-sorting to the correct location within the organelle and additional processing of sorting signals.

Too hot to handle - have organellar genes been 'trapped'?

The idea that some genes have been trapped in organellar genomes stems largely from the idea that the proteins encoded by these genes are difficult to transport back to the organelle for assembly when synthesized in the cytosol. Intrinsic to this idea is that there has been a hierarchical loss of organellar genes, whereby those that were first to be successfully relocated were those encoding proteins that are easiest to transport back, while those that were last to be transferred encode proteins that are difficult to transport back. Many bacterial proteins have or are predicted to have mitochondrial targeting properties, or can be targeted to mitochondria without the acquisition of a targeting presequence, and these are predicted to be the first organellar genes to be successfully relocated to the nucleus [10, 11]. As there seem to be no limitations on the transfer of genetic material, organellar gene loss should, according to this theory, have continued until the cell solved the targeting and assembly problems for all proteins, and organellar genomes would then no longer exist [12]. This may yet occur in plants, where transfer of organellar genes to the nucleus continues to erode the organelle genomes [4].

Why, then, has the transfer of genes to the nucleus not gone to completion? The difference in genetic code between mitochondria and the nucleus in eukaryotic organisms other than plants is one plausible explanation for the apparent 'freeze' on gene relocation [13]. Regardless of the reason, organellar genomes linger, and they are enriched in genes that encode hydrophobic, membrane-embedded proteins that are difficult to transfer from cytosol to organelle. Notably, even the reduced plastid genome of dinoflagellates is enriched in genes encoding hydrophobic proteins [14]. These initial observations instigated the 'hydrophobicity hypothesis', which was first proposed by von Heijne in 1986 [15] and was later expanded by others [16, 17] (see Box 1). Since then there has been a steady accumulation of both bioinformatic observation and experimental evidence suggesting that hydrophobic regions in proteins are a major obstacle to both targeting and import, and that proteins must overcome this obstacle if their genes are to relocate.
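The kind of bioinformatic observation referred to above can be illustrated with a sliding-window hydropathy scan. The following is a minimal sketch only: the Kyte-Doolittle scale, the 19-residue window and the two toy sequences are assumptions made for illustration and are not taken from the studies cited here, which may have used other scales and window lengths.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the text names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Return the mean hydropathy of every full-length window along seq."""
    seq = seq.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    total = sum(scores[:window])
    means = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        means.append(total / window)
    return means


def max_window_hydropathy(seq, window=19):
    """Return the highest windowed mean, a crude proxy for the most
    hydrophobic stretch of the protein."""
    return max(window_hydropathy(seq, window))


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    toy_membrane = "MKR" + "LIVAFLLIVAGLLIVAFLL" + "KRD"
    toy_soluble = "MKRDEESGKTDNQERKSGDEKRTD"
    for name, seq in [("membrane-like", toy_membrane),
                      ("soluble-like", toy_soluble)]:
        print(name, round(max_window_hydropathy(seq), 2))
```

A score of this kind only ranks sequences by their most hydrophobic stretch; it says nothing about flanking charges or about whether a given stretch would actually block import.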


Box 1

Following the original hydrophobicity hypothesis [15], several studies have shown that targeting to mitochondria and the endoplasmic reticulum are competing pathways and that subcellular location is determined by a combination of the length of the transmembrane region, the degree of hydrophobicity and the number of positive residues flanking the transmembrane region [18, 19]. Specifically, it was concluded that moderate transmembrane region length and charge distribution resulted in mitochondrial targeting for some proteins, whereas increasing the length of the transmembrane region resulted in mis-localization to other membrane systems [20, 21]. Although these proteins were not organelle-encoded, the findings demonstrate that hydrophobic transmembrane regions can cause mis-targeting of cytosolically synthesized proteins.

Evidence that organelles are unable to import certain hydrophobic proteins has also accumulated since the initial observations that there was a limit on the number of transmembrane regions that could be imported [17]. Direct experimental evidence indicates that a reduction in hydrophobicity was essential for the rare transfer event that occurred for the cytochrome c oxidase subunit 2 (Cox2) gene in legumes [22]. Also, the other rare gene transfer events of Cox2, Cox3 and ATP6 from the mitochondrion in green algae have been accompanied by a reduction in hydrophobicity of the encoded protein [23-25]. These events highlight the fact that there are hydrophobicity limits on import, and that many mitochondrially encoded proteins lie naturally outside this limit. But they also indicate that in those organisms for which the location of the gene has not been 'frozen' by a change in genetic code, organellar genomes will continue to be eroded.

Another observation that suggests that gene location is affected by hydrophobicity is the finding that cytochrome f and subunit IV of the cytochrome bf complex of Euglena gracilis are encoded in the nucleus [26]. Euglena is somewhat unusual in having three chloroplast-envelope membranes, the result of an additional endosymbiosis; the outer envelope membrane - the perichloroplast membrane - is therefore closely related to the endoplasmic reticulum membrane. The cytochrome f and subunit IV polypeptides are less hydrophobic than their chloroplast-encoded counterparts, but it is also tempting to speculate that this transfer was only feasible because of the additional outer membrane, as proteins destined for the chloroplast in Euglena are first targeted co-translationally to the endoplasmic reticulum and subsequently sorted to the chloroplast [27].

The solutions that nature has found for overcoming the hydrophobicity problem associated with relocating some genes have been both original and instructive. Similar efforts by researchers to express organellar genes allotopically have proven difficult, which also lends credence to the hydrophobicity hypothesis. Although the coding location could be experimentally moved from the mitochondrion to the nucleus for ATP6 and ATP8 [28, 29], this could not be achieved for apocytochrome b or ND4 [29]. Additionally, overexpression of these two proteins resulted in depolarization of the mitochondrial membrane potential. Thus these proteins seem to have a toxic effect on cells when expressed in the cytosol, and this may be linked to their hydrophobic nature [29].

Hard-wired - have organellar genes been 'preferentially maintained'?

The preferential 'maintenance' of a core set of organellar genes is encompassed by the CORR theory (co-location for redox regulation; see Box 1), which was first proposed in 1992 [30, 31]. This hypothesis proposes that there is a direct link between coding location and regulation, either transcriptional or post-transcriptional, which gives a fitness advantage compared with nuclear-encoded genes. In this way, expression of a gene within the organelle gives an advantage, thus preventing transfer of the gene to the nucleus. An example often quoted to support the hypothesis concerns a chloroplast that needs more copies of one subunit of a protein complex, for example the D1 subunit of photosystem II after photo-oxidative damage: it is more efficient for the gene to be encoded within the organelle, as nuclear encoding would mean that the protein would be sent even to those chloroplasts that did not require it [32]. Although the CORR hypothesis is rich in predictions, there is no direct experimental evidence for it in mitochondria [5], in contrast to the several experimental investigations that support the validity of the hydrophobicity hypothesis [5, 16-26, 28, 29].

The flaws of each hypothesis

On the surface, the hydrophobicity hypothesis does not appear to adequately explain the retention of all organelle-encoded genes. Two flaws are notable. Firstly, not all protein-coding genes in organelles encode hydrophobic proteins, the most obvious example being the large subunit of ribulose 1,5-bisphosphate carboxylase/oxygenase (Rubisco-LSU) in chloroplasts. Secondly, both mitochondria and chloroplasts already import hydrophobic proteins. The mitochondrial carrier family and the light-harvesting protein of the light-harvesting chlorophyll-protein complex are the cited examples for mitochondria and chloroplasts, respectively.

Similarly, the CORR hypothesis also has some deficiencies. Firstly, redox control has as yet been demonstrated for only a handful of plastid-encoded genes. Secondly, the expression of many nuclear-encoded mitochondrial and chloroplast proteins is under redox control and yet these proteins are not organelle-encoded [33, 34]; thus, why only some redox-controlled genes must be organelle-encoded is not explained by CORR. And thirdly, even for the redox-regulated components encoded by chloroplasts, the products are functional only when combined with additional nuclear-encoded subunits; thus, being organelle-encoded does not offer any immediate advantage in terms of protein function.

Thus it might be possible that there are a variety of reasons why genes are organellar and the reason for each gene might differ, or even be a combination of a number of different factors. It is worth examining the exceptions to each hypothesis to see whether there is evidence to validate or invalidate the objection. Also, there is a need to question how being chloroplast-encoded and under redox control is an advantage in evolutionary terms.

Exceptions to the rule

The hydrophobicity hypothesis centers on the problem of targeting and importing a protein following its synthesis in the cytosol. Although hydrophobic proteins clearly present targeting problems, many organellar genes do not encode hydrophobic proteins. Rubisco-LSU is the obvious and most often cited counterexample to the hydrophobicity hypothesis: can it be synthesized in the cytosol and imported to produce a functional protein in chloroplasts? The answer is yes, as this has been successfully achieved in some dinoflagellates [14]. In plants, where Rubisco-LSU is normally plastid-encoded, the reported attempts to express the gene allotopically have been successful qualitatively but not quantitatively. Although it is possible to express and import the protein, only a small proportion of the wild-type activity can be achieved [35]. One reason for this failure might relate to the assembly of the complex. As the holoenzyme of Rubisco is the most abundant protein in a cell, efficient assembly is critical, and specific chaperone systems are involved in this process (for review, see [36]). Thus, although it is possible to import the protein, reduced assembly efficiency may reduce fitness and so prevent successful gene transfer. Moreover, unfolded or unassembled proteins may be degraded and/or may induce stress pathways, such as the unfolded protein response that has been described for mitochondria [36, 37]. It should be noted that as the small subunit of Rubisco is nuclear-encoded, the reason for the failure to express Rubisco-LSU adequately from a nuclear location is unlikely to be due to a gene dosage effect of plastid genome versus nuclear genome.

This assembly concept could be extended further to explain the organellar coding location of other proteins that are not encompassed by the hydrophobicity hypothesis. One common feature of almost all organelle-encoded genes is that the products are assembled into multisubunit complexes that contain at least one other protein - for example the Rubisco holoenzyme - but usually many others, as is evident for the electron-carrying components of the photosynthetic and respiratory chains. Studies on how such complexes assemble indicate that there are ordered sequential assembly pathways. The order of assembly is critical to producing a functional complex, and importantly organelles have specific protease and chaperone systems for degrading proteins that have not assembled correctly [38, 39].

The extensively studied photosystem II from chloroplasts indicates "a hierarchy in the protein components that allows a stepwise building of the complex" [40]. An excellent example is the D1 protein, which is encoded by the chloroplast gene psbA. This protein is inserted into the thylakoid membrane in a co-translational manner with the aid of the chloroplast signal-recognition particle, and requires the presence of several other subunits of photosystem II [40, 41]. Studies of the unicellular green alga Chlamydomonas reinhardtii indicate that specific sequences in the 5'-untranslated region of the mRNA bind specific proteins that might define thylakoid membrane targeting [42]. Allotopic expression of a genetically altered, herbicide-resistant psbA demonstrated that the protein product could be imported into chloroplasts, but the plants were still sensitive to herbicide (albeit less so than wild-type plants) [43], possibly as a result of ineffective or inefficient assembly of the cytosolically synthesized protein. This example indicates that assembly may define an organelle-encoded location. An excellent review containing more details of this process is available [44].

The second apparent 'flaw' in the hydrophobicity hypothesis is that both mitochondria and chloroplasts import many hydrophobic proteins. Why then should some hydrophobic proteins be resistant to this process? The answer may lie in the protein itself; many nuclear-encoded mitochondrial proteins are imported across the organellar membranes to the matrix, and then rerouted via conserved sorting pathways. This import pathway requires that all but the last transmembrane stretch must pass through the import machinery. If, however, a transmembrane stretch is recognized as a 'stop-transfer' sequence, the import process stops [22], the offending stretch of amino acids is moved laterally into the membrane, and the protein is unable to fold to its active conformation [45]. Clearly, all nuclear-encoded organellar proteins have evolved so that their transmembrane stretches do not resemble stop-transfer sequences, enabling polytopic proteins to be easily imported and assembled. Thus it is the subtle signals contained in a transmembrane stretch that can prevent import, something we are not yet able to predict from gene sequence alone.

The often-cited examples of mitochondria and chloroplasts importing hydrophobic proteins do not contradict the principle that assembly can define organelle-coding location. Members of the mitochondrial carrier family, present in the inner membrane, may be hydrophobic but function in homodimeric complexes; thus there is no sequential assembly required [46]. The hydrophobic light-harvesting chlorophyll proteins were derived from simpler forms in cyanobacteria that are single-membrane-spanning [47, 48]. These two sets of proteins represent 'eukaryotic' proteins and thus import and assembly pathways were invented de novo by eukaryotic cells. But, when multisubunit protein complexes were derived from the endosymbiotic ancestor, the assembly pathways were dictated.

One, two or more reasons not to move?

As outlined above, there is compelling experimental evidence that the targeting and import of some proteins might be the major determinant for their organelle-coding location. But not all organelle-encoded proteins pose targeting and import problems, and it is becoming increasingly clear that assembly should be added to the list of difficulties. We therefore feel that the term 'importability' better encompasses the difficulties experienced by some proteins when expressed in the cytosol, and therefore the retention of organellar genomes. The importability concept does not ignore the observations of the CORR hypothesis: rather, the elegant redox regulation of some chloroplast-encoded genes [34] may be a mechanism for ensuring that these organelle-encoded subunits are synthesized in the correct sequential manner, so as to ensure correct assembly. Redox regulation may have specialized to a stage at which it facilitates the ordered assembly of multisubunit complexes and may now represent a barrier to gene relocation.

The importability hypothesis also encompasses the proposal that some products may be toxic if synthesized in a cytosolic location, as has been demonstrated for the apocytochrome b and ND4 proteins [29]. An important point to note is that even if, under experimental conditions, allotopic expression of some organelle genes can be achieved and can rescue mutant phenotypes, in evolutionary terms it is the efficiency of import and assembly that can be a selective factor. Thus, reducing the growth rate by achieving allotopic expression may reduce fitness and result in an organellar location for a gene even though a nuclear location can be achieved in the laboratory.

Importability may not be sufficient to explain an organellar coding location for all genes in all organisms. There might be alternative reasons why some genes are retained in organelles. The evolution of organellar and nuclear genomes must be a complementary process in cells. Effective cross-talk takes place between nuclear and organellar genomes to coordinate function, as is evident with retrograde regulation for nuclear-encoded genes for mitochondrial and chloroplast proteins [49-51]. There is also evidence for an additional form of regulation, termed intergenomic communication [52, 53]; this is based on the physical presence and expression of a gene within an organelle genome independent of the function of the encoded protein [54]. Furthermore, it appears that mutations in the mitochondrial genome in yeast increase the rate of nuclear mutation [55]. Mutations in mitochondrial genomes cause defects or alterations in development in mammalian and plant systems [56, 57]. Thus, perhaps an additional reason that genes are encoded in organelles is that some genes must be encoded there in order for expression of organelle and nuclear genomes to be coordinated. Genes encoding protein products that present additional barriers for successful gene transfer will also most often be observed in organelle genomes. With all the genome information that is now available, care needs to be taken to look at genomes rather than focusing solely on individual genes. This approach may yield insights that would not be possible with single-gene analysis and may provide more inclusive hypotheses for explaining organelle genome maintenance.