Volume 131, Issue 2, pp 151–156

Understanding relationship between sequence and functional evolution in yeast proteins

  • Seong-Ho Kim
  • Soojin V. Yi
Original Paper


The underlying relationship between functional variables and sequence evolutionary rates is often assessed by partial correlation analysis. However, this strategy is impeded by the difficulty of conducting meaningful statistical analysis with noisy biological data. A recent study suggested that partial correlation analysis is misleading when data are noisy and that principal component regression analysis is a better tool for analyzing biological data. In this paper, we evaluate how these two statistical tools (partial correlation and principal component regression) perform when data are noisy. Contrary to the earlier conclusion, we found that the two tools perform comparably in most cases. Furthermore, when there is more than one ‘true’ independent variable, partial correlation analysis delivers a better representation of the data. Employing both tools may provide a more complete and complementary representation of the real data. In this light, and with new analyses, we suggest that protein length and gene dispensability play significant, independent roles in yeast protein evolution.
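As a rough illustration of the kind of comparison described above, the following sketch simulates one latent determinant of evolutionary rate, a noisy measurement of it, and a correlated variable with no direct effect, and then applies both tools. It is a toy setup under stated assumptions, not the analysis performed in the paper: the function names (`simulate`, `partial_corr`, `pcr_two_predictors`), the variable names, and all simulation parameters are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(y, x, control):
    """First-order partial correlation of y and x, controlling for one variable."""
    r_yx, r_yc, r_xc = pearson(y, x), pearson(y, control), pearson(x, control)
    return (r_yx - r_yc * r_xc) / math.sqrt((1 - r_yc ** 2) * (1 - r_xc ** 2))


def standardize(a):
    """Rescale a sequence to zero mean and unit variance."""
    n = len(a)
    mean = sum(a) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in a) / n)
    return [(v - mean) / sd for v in a]


def pcr_two_predictors(y, x1, x2):
    """Principal component regression for exactly two predictors.

    For two standardized predictors the principal components are
    (z1 + z2) / sqrt(2) and (z1 - z2) / sqrt(2). They are uncorrelated,
    so the squared correlation of y with each one is that component's
    share of R^2. Returns (r2_sum_component, r2_diff_component).
    """
    z1, z2 = standardize(x1), standardize(x2)
    pc_sum = [(p + q) / math.sqrt(2) for p, q in zip(z1, z2)]
    pc_diff = [(p - q) / math.sqrt(2) for p, q in zip(z1, z2)]
    return pearson(y, pc_sum) ** 2, pearson(y, pc_diff) ** 2


def simulate(n=2000, noise_sd=1.0, seed=0):
    """Assumed toy model: a single latent variable drives the rate."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    rate = [-v + rng.gauss(0, 0.5) for v in latent]
    # Noisy measurement of the true determinant.
    measured = [v + rng.gauss(0, noise_sd) for v in latent]
    # Correlated with the determinant, but with no direct effect on rate.
    bystander = [v + rng.gauss(0, 1) for v in latent]
    return rate, measured, bystander


if __name__ == "__main__":
    for noise_sd in (0.01, 1.0):
        rate, measured, bystander = simulate(noise_sd=noise_sd)
        pc = partial_corr(rate, bystander, measured)
        r2_sum, r2_diff = pcr_two_predictors(rate, measured, bystander)
        print(f"noise_sd={noise_sd}: partial r={pc:.3f}, "
              f"PCR R^2 shares=({r2_sum:.3f}, {r2_diff:.3f})")
```

In this assumed setup, the partial correlation between the rate and the bystander variable stays near zero when the determinant is measured almost exactly, but becomes clearly negative when the measurement is noisy, even though the bystander has no direct effect. With equally noisy predictors, the principal component regression assigns nearly all explained variance to the shared component. The sketch only shows why noise matters for both tools; it does not settle which tool is preferable.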


Partial correlation, Principal component regression, Functional genomic data, Yeast protein evolution



We thank D. Allan Drummond and Claus Wilke for helpful personal communications, and Charles Warden for critical reading of the manuscript. SY is supported by funds from the Georgia Institute of Technology.

Supplementary material

10709_2006_9125_MOESM1_ESM.pdf (186 kb)



Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  1. School of Biology, Georgia Institute of Technology, Atlanta, USA
  2. Division of Biostatistics, School of Medicine, Indiana University, Indianapolis, USA
