Functional genomics and proteomics

Yeast is widely used as a eukaryotic model system to study protein function because of its relative simplicity and the availability of powerful genetic tools. The completion of the genome sequence of the yeast Saccharomyces cerevisiae in 1996 [1] allowed researchers to analyze a eukaryotic organism on a genomic scale for the first time, and this has greatly accelerated the development of technologies for large-scale proteomic and functional-genomic studies. Many of the initial studies in yeast used DNA microarrays to measure the expression profiles of large sets of genes in mutant strains or under varying growth conditions [2], but more recent studies have focused mainly on large-scale proteomic experiments, including genome-wide two-hybrid screens for protein-protein interactions [3-5], high-throughput affinity purification of protein complexes [6, 7], large-scale protein localization experiments [8] and even proteome chips [9]. Another recent study examined the growth phenotypes of a collection of yeast gene-deletion strains that covers approximately 96% of the annotated open reading frames (ORFs) [10]. Most recently, Peng et al. [11] have used a large collection of mutant yeast strains and microarray technology to screen for proteins involved in the synthesis and processing of ribosomal and other non-coding RNAs.

Synthesis and processing of rRNA and small non-coding RNAs

Strikingly, over 95% of the nucleic acid in yeast cells is non-coding RNA [12]. Most of these RNAs are ribosomal RNAs (mostly cytoplasmic rRNAs but including some mitochondrial rRNAs); indeed, a large portion of the cell's energy is devoted to the synthesis of ribosomes and rRNA, a process that requires hundreds of trans-acting factors [13].

Ribosome biogenesis takes place in a subnuclear compartment, the nucleolus. Here, three of the four rRNAs are transcribed by RNA polymerase I as a single precursor, the pre-rRNA. The nascent pre-rRNA is processed in a series of cleavage reactions to produce the mature 18S, 5.8S and 25S/28S rRNAs. Interestingly, processing of the nascent pre-rRNA in yeast has recently been shown to require the assembly of a pre-rRNA ribonucleoprotein (RNP) complex (the small subunit (SSU) processome, or 90S complex) that is about the size of a ribosome itself [14-16], underscoring the complexity of ribosome biogenesis. Using affinity-tag purification procedures, several laboratories have isolated a number of other large pre-rRNA RNP complexes [6, 7, 14-20]. In addition, an organelle-scale proteomic analysis of the human nucleolus has revealed the human homologs of many of these proteins as well as new ones [21]. Nevertheless, much remains to be discovered about the exact functions of the proteins involved in ribosome biogenesis in the nucleolus. Moreover, the precise mechanism of the endonucleolytic steps in pre-rRNA processing is not yet clear; in most cases it is not even known whether cleavage involves the activity of (as yet unidentified) endonucleases.

Apart from rRNAs, the non-coding RNAs comprise a long list of abundant, small RNAs, including small nucleolar RNAs (snoRNAs), small nuclear RNAs (snRNAs), transfer RNAs (tRNAs), telomerase RNA, signal-recognition-particle RNAs and the RNA components of the RNase P and RNase MRP endonucleases. Most snoRNAs are involved in cotranscriptional chemical modification of pre-rRNA, particularly 2'-O-ribose methylation (in the case of 'box-C/D' snoRNAs) and base pseudouridylation (for 'box-H/ACA' snoRNAs; reviewed in [22]). The snRNAs are probably the catalysts for pre-mRNA splicing, and their association with each other and with the pre-mRNA leads to the formation of the spliceosome [23]. As is the case for rRNAs, the mechanism by which many small non-coding RNAs are matured is not yet completely understood. Interestingly, several components of the machinery responsible for the cleavage and polyadenylation of mRNAs appear to be involved in the maturation of snRNAs and snoRNAs as well [24-26]. This is one of many examples of processing machineries being shared by different biogenesis pathways for non-coding RNA.

Using microarrays to probe the yeast RNA-processing proteome

Comparative bioinformatic analyses [27, 28] of protein-interaction data from several studies have revealed hundreds of uncharacterized protein-coding genes that are predicted to have a role in RNA processing and/or RNP biogenesis; many of these have not been detected or validated in large-scale proteomic studies. To test these predictions experimentally, Peng and colleagues [11] set out to measure defects in the biogenesis of non-coding RNA using oligonucleotide microarrays. The microarrays contained 212 different oligonucleotides that recognized unprocessed mRNAs as well as partially processed and mature products of a wide range of non-coding RNA species. These arrays were hybridized to steady-state RNA harvested from a set of strains, in each of which a single protein was depleted or otherwise mutated. The mutant strains were taken from the yeast deletion collection [10] or from strains previously collected by others, or were constructed by the authors [11] using the tetO7 system, which allows expression of the protein of interest to be regulated by tetracycline. The microarray showed which particular RNAs were depleted or overrepresented in each strain; strains with aberrant patterns were taken to carry mutations in genes involved in RNA biogenesis. To their credit, the authors sought to validate their microarray findings individually by northern blotting, which greatly strengthens their conclusions.
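The basic logic of the screen, comparing each mutant's hybridization profile with that of a reference strain and flagging strains with aberrant patterns, can be sketched as follows. This is a deliberately simplified illustration, not the classification procedure used by Peng et al. [11]: the probe names, intensities, the 2-fold cutoff and the two-probe minimum are all hypothetical.

```python
import math
from typing import Dict, List


def log2_ratios(mutant: Dict[str, float], reference: Dict[str, float]) -> Dict[str, float]:
    """Per-probe log2(mutant / reference) hybridization ratio."""
    return {probe: math.log2(mutant[probe] / reference[probe]) for probe in reference}


def aberrant_probes(ratios: Dict[str, float], cutoff: float = 1.0) -> List[str]:
    """Probes whose signal changed at least 2**cutoff-fold in either direction."""
    return sorted(probe for probe, r in ratios.items() if abs(r) >= cutoff)


def is_aberrant_strain(mutant: Dict[str, float], reference: Dict[str, float],
                       cutoff: float = 1.0, min_probes: int = 2) -> bool:
    """Hypothetical rule: call a strain aberrant if >= min_probes probes pass the cutoff."""
    return len(aberrant_probes(log2_ratios(mutant, reference), cutoff)) >= min_probes


# Hypothetical probe names and intensities, for illustration only.
reference = {"35S_pre": 100.0, "20S_pre": 80.0, "18S_mature": 1000.0, "U3_mature": 200.0}
mutant = {"35S_pre": 400.0, "20S_pre": 10.0, "18S_mature": 900.0, "U3_mature": 210.0}

print(aberrant_probes(log2_ratios(mutant, reference)))  # ['20S_pre', '35S_pre']
print(is_aberrant_strain(mutant, reference))            # True
```

In this toy example the precursor probes change 4-fold and 8-fold while the mature-RNA probes barely move, which is the kind of precursor-specific pattern that would point to a processing defect rather than a general loss of RNA.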

The authors drew on a variety of sources to choose candidate ORFs for testing with their new methodology. A total of 413 ORFs (7% of the yeast genome) had previously been characterized as having a role in non-coding-RNA biogenesis (Table 1). From comparative analyses of other genome-wide studies (such as [4-8, 21]), the authors [11] predicted that an additional 919 ORFs are involved in non-coding-RNA biogenesis, bringing the total to 1,332 ORFs. Of these 919 additional ORFs, 578 were annotated in the databases as 'biological process unknown' and 341 were annotated with unrelated functions (Table 1). An unexpectedly high proportion of the 413 previously characterized ORFs are encoded by essential genes (253/413, or 61%; these represent nearly one quarter of all the essential genes in the genome; Table 1). Of all 1,332 ORFs implicated in non-coding-RNA biogenesis, 39% are encoded by essential genes (Table 1), again a higher proportion than a random sample of the yeast genome would be expected to contain.

Table 1 Generating ORFs to test for their involvement in non-coding-RNA metabolism
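The bookkeeping behind Table 1 can be checked, and the enrichment for essential genes quantified, with a hypergeometric test. The sketch below uses the counts given in the text but assumes round genome-wide totals of 6,000 ORFs and 1,100 essential genes; these totals are illustrative assumptions, not figures from the article, so the fold enrichment and p-value it prints are only indicative.

```python
from math import comb

# Counts reported in the text (Table 1).
KNOWN = 413                # ORFs previously characterized in ncRNA biogenesis
KNOWN_ESSENTIAL = 253      # of these, encoded by essential genes
PREDICTED_UNKNOWN = 578    # predicted ORFs annotated 'biological process unknown'
PREDICTED_UNRELATED = 341  # predicted ORFs annotated with unrelated functions
PREDICTED = PREDICTED_UNKNOWN + PREDICTED_UNRELATED
TOTAL = KNOWN + PREDICTED

# ASSUMPTION: round genome-wide totals chosen for illustration only.
GENOME_ORFS = 6000
GENOME_ESSENTIAL = 1100


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n ORFs are drawn without replacement from N, of which K are essential."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)  # exact integer arithmetic, one final division


observed = KNOWN_ESSENTIAL / KNOWN
background = GENOME_ESSENTIAL / GENOME_ORFS
fold = observed / background
p = hypergeom_upper_tail(KNOWN_ESSENTIAL, KNOWN, GENOME_ESSENTIAL, GENOME_ORFS)

print(f"predicted={PREDICTED}, total={TOTAL}")
print(f"{observed:.0%} essential vs {background:.0%} expected; fold={fold:.1f}, p={p:.1e}")
```

Exact integer binomial coefficients are used so that the very small tail probability is not lost to floating-point cancellation before the final division.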

Of the pool of proteins implicated in non-coding-RNA biogenesis, 468 were selected (41% of them essential), and the effects of their deletion or conditional depletion were analyzed by microarray (Table 2). These included 169 strains (36% of those tested) in which the protein could be conditionally depleted using the tetO7 system. A computational classification technique was then used to assign each protein a score from 1 to 5 on the basis of its microarray results; a score of 5 was considered 'positive' (that is, indicating that the protein functions in the processing of non-coding RNA). Surprisingly, under this classification only 53% of the proteins known to be involved in non-coding-RNA processing, 74% of those known to be involved in ribosome biogenesis, and 36% of those involved in snRNA/snoRNA/mRNA biogenesis were scored as positive (Table 2). This probably reflects the very stringent criteria used to designate a positive; perusal of the supplementary data to the article [11] suggests that many proteins with lower scores are indeed true positives. Among the proteins not previously implicated in non-coding-RNA biogenesis, 32% of the ORFs annotated as 'biological process unknown' were positive, as were 21% of the ORFs annotated with unrelated functions (Table 2).

Table 2 Numerical overview of the microarray results from Peng et al. [11]
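Why a stringent cutoff lowers the apparent positive rate among proteins already known to act in a pathway can be illustrated with a toy example. The sketch below assumes invented scores on the 1-5 scale for two hypothetical groups of known proteins and compares calling only a score of 5 positive with also accepting a score of 4; the scores are not taken from the supplementary data of [11].

```python
from typing import Dict, List


def positive_rate(scores: List[int], threshold: int) -> float:
    """Fraction of proteins whose score meets or exceeds the positive threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


# HYPOTHETICAL scores (1-5) for proteins already known to act in each pathway;
# illustrative values only, not data from Peng et al. [11].
known_scores: Dict[str, List[int]] = {
    "ribosome biogenesis": [5, 5, 5, 4, 5, 3, 5, 4],
    "snRNA/snoRNA biogenesis": [5, 3, 4, 2, 4, 5, 3, 1],
}

for category, scores in known_scores.items():
    strict = positive_rate(scores, threshold=5)   # only a score of 5 is positive
    relaxed = positive_rate(scores, threshold=4)  # scores of 4 are accepted too
    print(f"{category}: strict {strict:.1%}, relaxed {relaxed:.1%}")
```

Lowering the threshold raises the apparent sensitivity for known proteins, but it would also admit more false positives among the uncharacterized ORFs, which is the trade-off a stringent cutoff is meant to avoid.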

Uncovering new proteins required for RNA maturation and ribosome biogenesis

The results presented by Peng et al. [11] clearly demonstrate the usefulness of their methodology in assigning function to proteins required for ribosome biogenesis. Unexpectedly, 20 ORFs annotated in the databases as 'biological process unknown' appeared to be involved in pre-rRNA processing, but the RNA-processing defects in their mutant strains did not match any recognizable pattern on the microarray. Unfortunately, the processing defects in most of these mutants were not investigated in more detail. As the authors themselves state [11], these proteins are very attractive candidates for further study.

Notably, many proteins that were annotated with functions in unrelated cellular processes appeared to have a primary role in RNA biogenesis as well (21% of the 'unrelated' class; Table 2). One example is YOR145C, otherwise referred to as Pno1p. This protein had previously been shown to be required for biogenesis of the yeast proteasome [29], but both the microarray and the subsequent northern blot analysis of pre-rRNA intermediates [11] strongly suggest a role in 18S rRNA synthesis. A second example is Lrp1p, which was previously described as being involved in non-homologous DNA end-joining [30]. Peng et al. [11] have shown that it is required for correct processing of the 5.8S rRNA and that it is a component of the yeast exosome, a protein complex involved in the 3'-end trimming of many RNA species and in mRNA degradation ([31] and references therein).

One of the problems encountered by the authors [11] was that alterations in the processing of low-abundance non-coding RNAs (such as many snoRNAs and snRNAs) were difficult to detect with their methods. Indeed, only about 36% of the proteins already known to be involved in the biogenesis of tRNA, snoRNA or snRNA were classified as positive in their screen (Table 2). The analyses did, however, identify Bcd1p, a protein that is essential for the stable accumulation of box-C/D snoRNAs. In vivo depletion of Bcd1p resulted in a dramatic reduction in the steady-state levels of box-C/D snoRNAs, whereas box-H/ACA snoRNA levels appeared to be unaffected [11]. Bcd1p is therefore likely to be involved in the biogenesis of box-C/D snoRNAs, a function analogous to that of Naf1p, which is required for the stable accumulation of box-H/ACA snoRNAs [32]. More detailed studies of Bcd1p will probably provide significant new insights into box-C/D snoRNA maturation.
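A class-specific defect of the kind seen after Bcd1p depletion (box-C/D snoRNAs lost, box-H/ACA snoRNAs unchanged) can be expressed as a simple rule over per-snoRNA log2 ratios. The sketch below is hypothetical throughout: the snoRNA identifiers, the ratio values and the cutoff of -1 (a 2-fold loss) are invented for illustration and are not the authors' analysis.

```python
from statistics import median
from typing import Dict, List


def class_medians(ratios: Dict[str, float], classes: Dict[str, str]) -> Dict[str, float]:
    """Median log2(mutant / wild-type) ratio for each snoRNA class."""
    grouped: Dict[str, List[float]] = {}
    for rna, cls in classes.items():
        grouped.setdefault(cls, []).append(ratios[rna])
    return {cls: median(values) for cls, values in grouped.items()}


def class_specific_loss(ratios: Dict[str, float], classes: Dict[str, str],
                        target: str, cutoff: float = -1.0) -> bool:
    """True if the target class is depleted while every other class stays above the cutoff."""
    medians = class_medians(ratios, classes)
    others_unaffected = all(m > cutoff for cls, m in medians.items() if cls != target)
    return medians[target] <= cutoff and others_unaffected


# HYPOTHETICAL snoRNA identifiers and log2 ratios, for illustration only.
classes = {"cd_a": "C/D", "cd_b": "C/D", "cd_c": "C/D", "haca_a": "H/ACA", "haca_b": "H/ACA"}
depleted = {"cd_a": -2.5, "cd_b": -3.0, "cd_c": -1.8, "haca_a": 0.1, "haca_b": -0.2}

print(class_medians(depleted, classes))
print(class_specific_loss(depleted, classes, "C/D"))    # True
print(class_specific_loss(depleted, classes, "H/ACA"))  # False
```

Using the median of each class rather than individual snoRNAs makes the call less sensitive to a single poorly detected, low-abundance RNA, which the text identifies as a limitation of the screen.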

Surprisingly, the methodology [11] was sensitive enough to detect nucleotide modifications in pre-tRNAs. Deletion of the gene encoding the non-essential tRNA dihydrouridine synthase Dus1p increased the hybridization signal of oligonucleotides complementary to the 5' ends of tRNA. This increase was shown to reflect stronger hybridization to the same amount of tRNA rather than an increase in tRNA levels, and it correlated with the absence of covalent uridine modifications (dihydrouridines) in the dus1 deletion strain. This result represents the first time that covalent modifications have been detected in a microarray experiment.
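The reasoning used to interpret the dus1 result, that a stronger array signal need not mean more RNA, can be expressed as a comparison between the array signal and an independent measure of abundance. The function below is a hypothetical sketch: it assumes one mutant/wild-type fold change from the array and one from an abundance assay such as a northern blot, and an arbitrary tolerance of 0.5 log2 units; the function name and category labels are invented for illustration.

```python
import math


def interpret_signal_change(array_fold: float, abundance_fold: float,
                            tolerance: float = 0.5) -> str:
    """Classify an array signal change (hypothetical rule).

    Both arguments are mutant / wild-type fold changes; tolerance is in log2 units.
    """
    array_log = math.log2(array_fold)
    abundance_log = math.log2(abundance_fold)
    if abs(array_log - abundance_log) <= tolerance:
        return "abundance change"      # array agrees with the independent measurement
    if abs(abundance_log) <= tolerance:
        return "hybridization change"  # same amount of RNA, different probe binding
    return "mixed"


print(interpret_signal_change(3.0, 1.0))  # hybridization change
print(interpret_signal_change(3.0, 2.8))  # abundance change
print(interpret_signal_change(8.0, 2.0))  # mixed
```

Under this rule a dus1-like result, with a higher array signal but unchanged tRNA levels, falls into the "hybridization change" category, which is where an altered modification state would be expected to appear.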

The genome-wide proteomic and functional-genomic studies carried out to date have provided a large amount of information that allows researchers to envisage connections between many proteins and pathways. Peng et al. [11] have now developed innovative tools for testing predictions of protein function in non-coding-RNA biogenesis on a proteomic scale. Many new proteins now await analysis, and many functions remain to be assigned.