Background

The proteome of a eukaryotic cell is largely organized as a collection of multi-subunit protein complexes [1–4]. These complexes are defined empirically by the stable association of their subunits during biochemical purification [3, 4] and act as molecular machines [5] or information processing modules [6] in cellular networks. For example, some of the many integrated complexes required for gene expression include the RNA polymerase complexes, chromatin remodeling complexes, RNA processing complexes such as the spliceosome, exosome and decapping complex, the ribosome, and the proteasome [7].

In this paper we address the question of whether there are any general principles concerning how the activity of protein complexes responds to changes in the expression of their subunits. Available global data in yeast show that reducing the expression of any subunit of a protein complex normally produces the same change in phenotype [8]. However, we show here that this is not true for changes in phenotype resulting from increases in the expression of subunits, and this applies to both core and peripheral subunits of complexes. We propose the principle that the overall activity of a protein complex is normally robust to an increase, but not to a decrease, in the expression of its subunits. We highlight some of the implications of this principle for understanding the regulation and evolution of biological systems.

Results

Genes that reduce fitness when under- but not over-expressed are enriched amongst protein complexes

Most essential functions of the eukaryotic cell are performed by multi-subunit protein complexes. As previously shown [8], genes with essential functions are enriched amongst the subunits of multi-protein complexes (Figure 1). This is also true for haploinsufficient genes (i.e. genes that reduce fitness when their dosage is reduced by half in heterozygotes [9]) and for genes that cause slow growth when they are deleted [10] (Figure 1). Thus inhibiting the expression of a subunit of a protein complex is very likely to disrupt the function of that complex. However, genes that slow growth when they are over-expressed [11] (referred to here as genes with over-expression phenotypes) are not enriched amongst the subunits of protein complexes (Figure 1). This lack of enrichment could reflect the fact that many protein complexes are not essential for normal growth and therefore perturbing their function will not result in a visible phenotype. However, we find that genes that reduce fitness when they are over-expressed are also not enriched amongst protein complexes that perform essential functions (Table 1), nor are they enriched amongst the subunits of protein complexes that are essential when deleted (Table 1). Thus over-expressing a subunit of an essential protein complex does not normally disturb its function.

Figure 1

Genes with under- but not over-expression phenotypes are enriched amongst protein complexes. Essential genes, genes required for normal growth in rich media and haploinsufficient genes are all enriched amongst the subunits of protein complexes. In contrast genes with over-expression phenotypes are equally represented amongst protein complex subunits and other genes. The graph shows the percentage of genes found in MIPS protein complexes and the percentage of all other genes that have each phenotype. ** Chi square test p < 0.05 for difference between protein complex subunits and all genes.

Table 1 Protein complexes with essential functions are not enriched for subunits with over-expression phenotypes.

Genes with under- but not over-expression phenotypes cluster into individual protein complexes

Even if over-expressing a subunit of a protein complex does not in general disrupt the overall activity of the entire complex, it is still possible that a subset of protein complexes may be particularly sensitive to the over-expression of their subunits. To test this we investigated the distribution of genes with under- or over-expression phenotypes amongst complexes. For each phenotype we divided the protein complexes into ten evenly spaced bins according to the fraction of subunits associated with the phenotype. We then compared this distribution of phenotypes to that seen when the subunits are randomized amongst complexes.

As shown in Figure 2, genes with under-expression phenotypes (essential genes, haploinsufficient genes and genes required for normal growth) cluster into particular protein complexes. For example, 44 complexes have >90% essential subunits compared to 13 expected by chance, and for all phenotypes arising from decreased gene expression there are many more complexes with no genes having that phenotype than expected by chance. In contrast, for genes that reduce fitness when they are over-expressed, only two bins contain more complexes than expected by chance – one complex has 80–90% of tested subunits with an over-expression phenotype (compared to 0.01 expected, p = 0.006) and 5 complexes have >90% of tested subunits with an over-expression phenotype (1.54 expected, p = 0.02). Thus only a few complexes (~3/183) contain more subunits that are toxic when over-expressed than expected by chance. For the vast majority of complexes the distribution of genes with over-expression phenotypes is not different to that expected by chance.

Figure 2

Genes with under- but not over-expression phenotypes cluster into individual protein complexes. Genes with essential functions (A), genes required for normal growth in rich media (B), and haploinsufficient genes (C) are arranged amongst protein complexes very differently to the random expectation. In contrast genes with over-expression phenotypes are arranged much more randomly (D). The graphs show the observed number of complexes in each of ten bins defined by the proportion of subunits having each phenotype. These are compared to the expected values (the mean of 100,000 randomisations). ** Bins significantly different from random at a 5% FDR (Benjamini-Hochberg method).

To further confirm this conclusion we asked whether any individual protein complexes contain more subunits with over-expression phenotypes than expected by chance. To do this we randomised the assignment of subunits to protein complexes and for each complex counted the number of times it had the same or more subunits with an over-expression phenotype than seen with the real data. There are 9 complexes with more genes with over-expression phenotypes than in 5% of randomisations, but none of these are significantly enriched for over-expression phenotypes after adjusting for multiple hypothesis testing (see Supplementary table 1 in Additional file 1, Benjamini-Hochberg false discovery rate, FDR = 5%). In contrast, there are 41 complexes with more essential genes than are seen in 5% of randomisations, and 17 of these complexes are still significantly enriched after adjusting for multiple hypothesis testing (see Supplementary table 2 in Additional file 1, FDR = 5%). Indeed the complex most enriched for genes with over-expression phenotypes is the nucleosome complex, and here the over-expression phenotype may be more related to the disruption of the precise temporal regulation of histone expression during the cell cycle [12] rather than disruption of protein complex formation per se. Indeed there is an overall enrichment for genes with over-expression phenotypes amongst cell cycle regulated genes (p = 0.037, Fisher's exact test).

Thus we conclude that for protein complexes performing essential functions, inhibiting the expression of any subunit of a protein complex is likely to reduce the overall activity of that complex. In contrast, over-expressing any individual subunit of a protein complex does not normally inhibit the overall activity of that protein complex. This conclusion most likely applies to the vast majority of protein complexes in a eukaryotic cell.

Neither core nor peripheral subunits of protein complexes are enriched for genes with over-expression phenotypes

Previously it has been suggested that subunits that form the structural core of a protein complex might be particularly sensitive to alterations in expression level [13, 14]. Therefore we tested whether subunits with under- or over-expression phenotypes are enriched amongst the core or peripheral/isoform-specific subunits of protein complexes. In a genome-wide study of protein complexes identified by tandem affinity purification, Gavin et al. identified a total of 491 complexes and classified their subunits as "core" – those present in most complex isoforms, "attachment" – those present only in some isoforms, and "modules" – two or more attachment proteins that tended to occur together in different complexes [3]. As shown in Figure 3, there is no difference between the percentage of genes with over-expression phenotypes in cores, modules, or attachments when compared with yeast genes in general. In contrast, subunits with essential or haploinsufficient phenotypes are significantly enriched among all three types of subunit (p < 0.0001, Fisher's exact test). The same result is seen when only considering genes that fall exclusively within each classification, except that haploinsufficient genes are only enriched amongst attachments (Figure 3).
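The enrichment comparisons in this section rely on Fisher's exact test on a 2x2 table (subunit class versus phenotype). The following is a minimal sketch of how such a test can be computed from the hypergeometric distribution. It is not the code used for the analysis: the function name and the toy counts are hypothetical, and the one-sided (enrichment) form is an assumption, since the text does not state whether a one- or two-sided test was used.

```python
from math import comb

def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test p-value for enrichment.

    2x2 table (all counts are genes):
                        phenotype   no phenotype
        in class            a            b
        not in class        c            d

    Returns P(X >= a), where X is the number of phenotype genes that
    fall in the class when the margins of the table are held fixed.
    """
    n_class = a + b
    n_phenotype = a + c
    total = a + b + c + d
    largest = min(n_class, n_phenotype)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    tail = sum(
        comb(n_phenotype, k) * comb(total - n_phenotype, n_class - k)
        for k in range(a, largest + 1)
    )
    return tail / comb(total, n_class)

# Hypothetical toy counts, chosen only to exercise the function.
print(fisher_enrichment_p(3, 1, 1, 3))
```

Exact integer binomial coefficients are used so that the tail sum does not lose precision before the final division.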

Figure 3

Genes sensitive to a reduction in expression level, but not to over-expression are enriched amongst both the core and peripheral subunits of protein complexes. Percentages of genes with essential, overexpression or haploinsufficient phenotypes among different structural components of protein complexes as defined by Gavin et al. (2006). The percentages of all genes are shown for comparison. Inset: schematic representation of the overlap between the datasets used. Only 21 genes are found exclusively in modules, so we did not test these as a separate category. ** Fisher's exact test p < 0.0001 for difference between genes with the particular phenotype and all genes without that phenotype.

We conclude that complexes are often sensitive to reduced expression of a subunit from any part of the complex, and that they are particularly sensitive to a partial reduction in the expression of isoform-specific subunits. These isoform-specific subunits are likely to be regulatory subunits (i.e. limiting the overall activity of a complex) and so may be particularly sensitive to a reduction in expression. In contrast, there is no evidence that complexes are sensitive to the over-expression of any particular structural subclass of subunit. Our findings also do not support the previous prediction that the core subunits of protein complexes will be particularly sensitive to over-expression [13, 14].

Discussion

A simple principle for the robustness of protein complex function and its implications for systems biology

In summary we have shown that in yeast reducing the expression of any individual subunit of a protein complex that performs an essential function under laboratory conditions is likely to disrupt the function of that complex. In contrast increasing the expression of any subunit generally has no effect on the overall activity of a complex. Both of these findings apply equally to core and isoform-specific subunits of protein complexes. Although the over-expression of some complex subunits does result in reduced growth, these phenotypes do not seem related to the disruption of the complex with the possible exception of a very small number of complexes (~3).

Therefore we propose the following principle concerning the robustness of protein complex function to alterations in gene expression (Figure 4): protein complex activity in eukaryotic cells is in general robust to an increase, but not to a decrease in the expression levels of individual subunits. This may reflect either an overall insensitivity of protein complex assembly and activity to the over-expression of subunits or that the cell encodes active mechanisms for degrading subunits produced in excess.

Figure 4

A simple principle concerning the robustness of protein complex activity. The results presented here suggest that protein complex activity in eukaryotic cells is in general robust to an increase, but not to a decrease in the expression levels of individual subunits.

This principle contrasts with previous predictions [13–15] and has several important implications for understanding the design principles and evolution of eukaryotic cells. Here we briefly highlight three implications of the principle: (1) the strategies a cell can use to regulate protein complex function, (2) the trajectories by which eukaryotes can evolve new proteins, and (3) how perturbations of gene expression in human disease can be connected to disease phenotypes.

First, according to the principle, reducing the expression of most subunits of a protein complex will down-regulate the activity of that complex. Therefore there are many alternative strategies available for reducing the activity of a protein complex by altering gene expression. This provides the cell with a very flexible and evolvable framework for regulating protein complex function. In contrast, to up-regulate the activity of a protein complex the cell must coordinately increase the expression levels of all of the subunits, unless the expression of a single subunit is limiting. Thus, in the absence of a limiting subunit [16], up-regulation of complex activity can be most easily achieved by up-regulating a trans-acting factor that regulates the expression of all of the subunits.

Second, the insensitivity of protein complex activity to the over-expression of subunits may have facilitated the evolution of novel protein complexes by gene duplication. Most protein complex subunits can probably be duplicated with little phenotypic effect, a situation that would not be true if over-expressing subunits more frequently disrupted the activity of complexes. Indeed such a mechanism of protein complex subunit duplication has been very important in the evolution of new complexes and protein functions [17].

Finally, the principle also has practical implications for understanding the etiology of genetic disease in humans. The results we present here suggest that if a subunit of a protein complex is over-expressed [18] or duplicated [19] in a human disease, then any connection with the disease phenotype is unlikely to be due to an overall reduction in the activity of that complex. Moreover, the fact that genes with over-expression phenotypes do not cluster into protein complexes means that over-expression phenotypes probably cannot be predicted using a comprehensive map of human protein complexes as is possible for loss-of-function phenotypes [20–22]. More sophisticated methods therefore need to be developed to predict the consequences of increases in gene expression levels.

Methods

Datasets

769 genes that reduce fitness when they are over-expressed were identified by Sopko et al., who tested the phenotypes of 5280 strains each over-expressing a single yeast gene [11]. 1010 essential genes were downloaded from the MIPS database [23]. 184 haploinsufficient genes were identified in a genome-wide screen of heterozygous mutants grown in rich medium [9]. 614 genes required for normal growth in rich media were identified by Giaever et al. [10]. As a high quality set of protein complexes we used the manually annotated set of MIPS protein complexes (downloaded from MIPS [23] on 14 March 2007, removing one redundantly listed complex, complex 510.190.10.20.10). A second set of systematically identified protein complexes was taken from the data of Gavin et al. [3], who classified subunits into cores (1148), modules (393) and attachments (959) of complexes. We used three alternative definitions of an "essential" protein complex: a complex for which at least one, or at least 25% or 50%, of subunits have a nonviable deletion phenotype. Cell cycle regulated genes were identified by Spellman et al. [12].
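The three alternative definitions of an "essential" complex can be illustrated with a short sketch. It assumes each complex is available as a list of gene identifiers and the essential genes as a set. The function names, the definition labels ("any", "25%", "50%") and the toy gene names are hypothetical and are not taken from the original analysis.

```python
def essential_fraction(subunits, essential_genes):
    """Fraction of a complex's subunits with a nonviable deletion phenotype."""
    if not subunits:
        return 0.0
    n_essential = sum(1 for gene in subunits if gene in essential_genes)
    return n_essential / len(subunits)

def is_essential_complex(subunits, essential_genes, definition="any"):
    """Apply one of the three alternative definitions of an essential complex.

    definition: "any" (at least one essential subunit),
                "25%" (at least 25% essential subunits) or
                "50%" (at least 50% essential subunits).
    """
    fraction = essential_fraction(subunits, essential_genes)
    if definition == "any":
        return fraction > 0
    minimum = {"25%": 0.25, "50%": 0.50}[definition]
    return fraction >= minimum

# Hypothetical toy complex: one of four subunits is essential.
toy_complex = ["geneA", "geneB", "geneC", "geneD"]
toy_essential = {"geneA"}
for definition in ("any", "25%", "50%"):
    print(definition, is_essential_complex(toy_complex, toy_essential, definition))
```

In this toy case the complex counts as essential under the first two definitions but not under the 50% definition, which shows how the choice of definition changes the set of complexes tested.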

Statistical tests

To compare the distribution of phenotypes amongst protein complexes to that expected by chance we divided the set of protein complexes into ten evenly spaced bins according to the percentage of tested subunits that shared each phenotype. We then randomized the assignment of subunits to protein complexes 100,000 times (but keeping the distribution of complex sizes the same) to calculate the expected frequency of complexes in each bin. To identify bins significantly over- or under-represented for phenotypes we counted the number of times the real enrichments for each bin were seen in the randomizations.
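The binning and randomization procedure can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original code. Complexes are assumed to be non-empty lists of tested gene identifiers. Subunits are randomized by shuffling the pooled subunit slots, which preserves complex sizes but may place a multi-complex gene twice in one complex. Bin boundaries (a fraction of exactly 90% falls in the top bin) and the add-one correction in the empirical p-value are also assumptions. All function names and the toy data are hypothetical, and the default of 1,000 iterations is for speed only (the analysis above used 100,000).

```python
import random

N_BINS = 10

def bin_index(n_with_phenotype, n_tested):
    """Index (0-9) of the evenly spaced bin for a complex.

    Integer arithmetic avoids floating-point boundary errors;
    a complex with 100% of subunits affected goes in the top bin.
    """
    return min(N_BINS * n_with_phenotype // n_tested, N_BINS - 1)

def bin_counts(complexes, phenotype_genes):
    """Number of complexes falling in each of the ten bins."""
    counts = [0] * N_BINS
    for members in complexes:
        n_with = sum(1 for gene in members if gene in phenotype_genes)
        counts[bin_index(n_with, len(members))] += 1
    return counts

def randomise(complexes, rng):
    """Shuffle subunits amongst complexes, keeping every complex size fixed."""
    pool = [gene for members in complexes for gene in members]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for members in complexes:
        shuffled.append(pool[start:start + len(members)])
        start += len(members)
    return shuffled

def bin_test(complexes, phenotype_genes, n_iter=1000, seed=0):
    """Observed counts, expected counts and over-representation p-values per bin."""
    rng = random.Random(seed)
    observed = bin_counts(complexes, phenotype_genes)
    totals = [0] * N_BINS
    as_extreme = [0] * N_BINS
    for _ in range(n_iter):
        simulated = bin_counts(randomise(complexes, rng), phenotype_genes)
        for b in range(N_BINS):
            totals[b] += simulated[b]
            if simulated[b] >= observed[b]:
                as_extreme[b] += 1
    expected = [t / n_iter for t in totals]
    # Add-one correction keeps the empirical p-value above zero (an assumption).
    p_over = [(c + 1) / (n_iter + 1) for c in as_extreme]
    return observed, expected, p_over

# Hypothetical toy data: three complexes and three genes with the phenotype.
toy_complexes = [["A", "B"], ["C", "D", "E"], ["F", "G", "H", "I"]]
toy_phenotype = {"A", "B", "C"}
observed, expected, p_over = bin_test(toy_complexes, toy_phenotype)
print(observed)
print(expected)
```

Under-representation of a bin can be tested in the same way by counting randomizations in which the simulated count is less than or equal to the observed count.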

To identify individual complexes significantly enriched for each phenotype we compared the number of subunits of each complex that share a phenotype to the frequencies seen in randomised complexes. To correct for multiple hypothesis testing we used the Benjamini-Hochberg method [24] to identify those complexes enriched at a 5% false-discovery rate (FDR). When testing the association between over-expression phenotypes and protein complex subunits, we only considered complexes for which at least two subunits had been tested for over-expression phenotypes. Hence in this case the total number of complexes considered was 183 rather than 217. The percentages of genes with over-expression phenotypes represent the percentage of tested genes.
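The per-complex test and the multiple-testing correction can be sketched in the same spirit. The sketch below is a minimal illustration, assuming the same pooled-shuffle randomization of subunits and an add-one correction for the empirical p-values (both assumptions, not details given above). The Benjamini-Hochberg step-up procedure is implemented in its standard form. Function names and toy values are hypothetical.

```python
import random

def complex_pvalues(complexes, phenotype_genes, n_iter=2000, seed=0):
    """Empirical p-value per complex: how often a randomised complex has the
    same or more subunits with the phenotype than the real complex."""
    rng = random.Random(seed)
    pool = [gene for members in complexes for gene in members]
    observed = [sum(1 for g in members if g in phenotype_genes)
                for members in complexes]
    hits = [0] * len(complexes)
    for _ in range(n_iter):
        rng.shuffle(pool)
        start = 0
        for idx, members in enumerate(complexes):
            sample = pool[start:start + len(members)]
            n_with = sum(1 for g in sample if g in phenotype_genes)
            if n_with >= observed[idx]:
                hits[idx] += 1
            start += len(members)
    return [(h + 1) / (n_iter + 1) for h in hits]

def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans marking the tests rejected at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= fdr * k / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected

# Hypothetical toy data.
toy_complexes = [["A", "B"], ["C", "D", "E"], ["F", "G", "H", "I"]]
toy_phenotype = {"A", "B", "C"}
pvals = complex_pvalues(toy_complexes, toy_phenotype)
print(pvals)
print(benjamini_hochberg(pvals, fdr=0.05))
```

Because the procedure is step-up, a p-value that fails its own rank threshold is still rejected if a larger p-value passes a later threshold.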