With the completion of the genome sequences of yeast, human and Arabidopsis, which contain approximately 6,000, 35,000 and 28,000 genes, respectively [1-3], the world's attention is now shifting to elucidation of gene function, and major proteomic studies are currently under way on a variety of organisms [4-6]. As a step towards assembling a list of the total complement of proteins in any one cell type (its proteome), proteomic studies of subcellular compartments and organelles have become a major focus, because smaller and more manageable subsets of proteins are involved. Given that compartmentation is a hallmark of the eukaryotic cell, and because the functions of organelles are biochemically well defined, such studies have an immediate functional impact, in contrast to the relatively limited insights that can be gained from the complete, unstructured cell proteome.

Mitochondria are attractive targets for subcellular proteomics because they play vital roles in energy production, anabolic and catabolic metabolism and in programmed cell death pathways, they can be purified readily from model organisms, and defects in mitochondrial proteins can have dramatic effects on the functions of cells and organs. Defining mitochondrial proteomes in a number of model organisms across the divisions of eukaryotes facilitates cross-species comparisons, thus greatly aiding validation of conclusions from each species and providing insights into both function and evolution [5].

The recent identification of 615 proteins from the mitochondrial proteome of the human heart [7] represents the first comprehensive analysis of a mitochondrial proteome and the highest number of proteins identified to date from any subcellular compartment. This is likely to change soon, as concerted efforts towards defining other subcellular proteomes are currently in progress [6, 8]. We now have glimpses of the mitochondrial proteomes from the yeast Saccharomyces cerevisiae and Arabidopsis, as well as humans (Table 1), although these are far from complete. Various approaches have predicted that approximately 10% of the coding capacity of the nuclear genome is devoted to proteins destined for the mitochondrion [9-11]. For yeast, predictions of the total number of proteins in a mitochondrion, made using a combination of sequence homology and gene tagging or knockouts, vary between 423 and 630 proteins, which is close to the number predicted by a variety of bioinformatic analyses of protein targeting [9-11]. Direct protein sequencing using mass spectrometry has so far yielded only 179 mitochondrial proteins, however, and gene-tagging and knockout analysis have given 332 and 466 proteins, respectively [12, 13]. Thus, even in yeast, the experimentally confirmed proteome is less than 50% complete, according to current predictions. In plants, the experimentally determined set so far contains only 135 mitochondrial proteins for Arabidopsis [14, 15] and 136 for rice [16]; these numbers are significantly lower than the 10% of the nuclear genome that is predicted by bioinformatic approaches to encode mitochondrial proteins [3, 9]. Even the 615 proteins directly identified in human mitochondria represent only about 25-35% of the proteins predicted to be mitochondrial by targeting analyses and by extrapolations from yeast studies [8, 9]. In reality, the true number of mitochondrial proteins will probably lie somewhere between the current experimentally determined numbers and the predictions.
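The coverage figures quoted above can be checked with some simple arithmetic. The following is a minimal sketch in Python that uses only the counts given in the text; the function name `coverage` and the variable names are illustrative choices. It also assumes, for illustration only, that "10% of the nuclear genome" can be applied directly to the approximate gene count of 28,000 quoted for Arabidopsis.

```python
# Illustrative arithmetic only: every count below is a figure quoted in the text.

def coverage(identified: int, predicted: int) -> float:
    """Return identified proteins as a percentage of a predicted proteome size."""
    return 100.0 * identified / predicted

# Yeast: 179 proteins found by direct mass spectrometry, compared with
# predictions of between 423 and 630 mitochondrial proteins.
yeast_low = coverage(179, 630)    # against the upper prediction
yeast_high = coverage(179, 423)   # against the lower prediction

# Arabidopsis: 135 proteins identified, compared with roughly 10% of
# about 28,000 nuclear genes (assumption: the 10% estimate is applied
# directly to the approximate gene count).
arabidopsis_predicted = 28_000 // 10
arabidopsis = coverage(135, arabidopsis_predicted)

# Human: 615 proteins are stated to be about 25-35% of the predicted set,
# which implies a predicted set within the following range.
human_predicted_low = round(615 / 0.35)
human_predicted_high = round(615 / 0.25)

print(f"Yeast (mass spectrometry only): {yeast_low:.1f}-{yeast_high:.1f}% of predicted")
print(f"Arabidopsis: {arabidopsis:.1f}% of {arabidopsis_predicted} predicted")
print(f"Human: implied predicted set of {human_predicted_low}-{human_predicted_high} proteins")
```

On these numbers, the mass-spectrometry set for yeast covers roughly 28-42% of the predicted proteome, consistent with the statement that it is less than 50% complete, and the Arabidopsis set covers about 5% of the bioinformatic prediction.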

Table 1 Predicted and experimentally determined numbers of proteins present in mitochondria

Sorting the identified sets of proteins (either predicted or known) by their functions reveals both expected and unexpected outcomes (Figure 1). Such comparisons vary slightly depending on the lists used, but those shown here are based on the functional analyses reported for Arabidopsis [14, 15], human [7] and yeast [17]. The yeast protein set is derived from both genetic and mass-spectrometric data, whereas the human and Arabidopsis sets are derived only from mass spectrometry; this means that more low-abundance DNA-, RNA- and protein-synthesis components have been identified in yeast than in the other two species.

Figure 1 Functional classification of the proteins from the experimentally determined proteomes of yeast, Arabidopsis and human. (Ox phos, oxidative phosphorylation; TCA, tricarboxylic acid cycle).

As expected, the predominant mitochondrial proteins found are oxidative-phosphorylation complexes, enzymes of the tricarboxylic acid cycle, components of the protein-import and protein-synthesis machinery, and transport proteins; these represent one third to one half of the identified sets in each species. The large number of proteins of unknown function (10-20%) and the large number of enzymes of the carbohydrate, amino-acid and lipid metabolism pathways have come as more of a surprise, however. In particular, the presence of glycolytic enzymes in purified mitochondrial preparations, and the diverse kinds of predicted signaling components such as kinases and receptors, were largely unexpected, as their presence in mitochondria has not been documented in earlier studies. These findings need further substantiation, and this has become an area of active research, as has the search for protein-protein associations within the proteome [7, 8, 18-20]. The absence of some proteins is also perplexing. For example, despite the presence of many genes from the mitochondrial carrier superfamily in all of the genomes so far examined, only a handful of carrier proteins have been experimentally identified in mitochondria to date [7, 18].

Mitochondrial proteomes also need to be defined in terms of their evolutionary origins. Mitochondria almost certainly evolved from an α-proteobacterium that was engulfed by an early eukaryotic cell and entered into symbiosis with it. Surprisingly, conservative estimates indicate that, in yeast, only 25-50% of mitochondrial proteins can be identified as most closely related to α-proteobacterial proteins [21, 22]. This suggests that approaches to defining subcellular proteomes that rely on homology to prokaryotic 'ancestors' are useful but have limitations. Divergence of the mitochondrial proteomes between different major eukaryotic lineages may mean that, even in identical pathways, components in one organism may have different phylogenetic origins from the equivalent components in another [21]. A glimpse of this is seen with the mitochondrial ribosome of Arabidopsis, which has proteins from three distinct genetic origins: the mitochondrion, the plastid and the nucleus of the host eukaryotic cell [23].

It is evident that mitochondrial proteomes have undergone expansion in function during evolution, in addition to the loss of bacterial metabolic pathways such as glycolysis [21]. The evolutionary expansion of mitochondrial proteomes means that proteins of eukaryotic origin are also represented in the mitochondrial proteome, complicating comparisons with α-proteobacterial ancestors [24]. In plants the situation is further complicated by proteins of cyanobacterial origin, presumably gained from chloroplasts via gene transfer from the plastid to the nucleus and subsequent duplication and re-targeting to mitochondria [23]. It has been observed that proteins derived from α-proteobacteria that are found in mitochondria but encoded in the nucleus appear to be preferentially synthesized on ribosomes attached to the mitochondria [25]; this may provide an experimental avenue for investigating the different genetic origins of mitochondrial proteins.

From an evolutionary point of view, it is tempting to estimate the numbers of mitochondrial proteins by comparison with modern-day obligate intracellular parasites, such as Rickettsia prowazekii, which contains 834 proteins [26]. Many common functions found in mitochondria, such as amino-acid biosynthetic pathways, are absent from these parasites, however. Obligate intracellular parasites provide examples of genome reduction, and the mitochondrial ancestor almost certainly had a larger genome and protein-coding capability than Rickettsia.

Defining the complete mitochondrial proteome will require a variety of experimental approaches, including the direct proteomic-identification and protein-tagging strategies that are presently underway [6]. Defining a static mitochondrial proteome will certainly be an achievement, but this is only the beginning. Determining how the proteome changes under certain conditions, such as during oxidative stress [27, 28], between tissues and through development, will use this basic set of proteins as a platform. Identifying new functions and interactions of proteins, and of signal-transduction pathways, will require knockouts, overexpression experiments and analysis of the phosphorylated components of the proteome [5]. Finally, comparative mitochondrial proteomics between organisms will give insights into how proteins have diverged in function through evolution and may well help answer the still vexing question of the ancestral origins of the eukaryotic cell.