Summary
Modern experimental techniques have produced a wealth of high-throughput data that has enabled the ongoing genomic revolution. As the field continues to integrate experimental and computational analyses of these data, it is essential that performance evaluations of high-throughput results be carried out in a consistent and biologically informative manner. Here, we present an overview of evaluation techniques for high-throughput experimental data and computational methods, and we discuss a number of potential pitfalls in this process. These pitfalls primarily involve the biological diversity of genomic data, which can be masked or misrepresented in overly simplified global evaluations. We describe systems for preserving information about biological context during dataset evaluation, which can help to ensure that different evaluations are more directly comparable. This biological variety in high-throughput data can also be exploited computationally, through data integration and process specificity, to produce richer systems-level predictions of cellular function. An awareness of these considerations can greatly improve the evaluation and analysis of any high-throughput experimental dataset.
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Huttenhower, C., Myers, C.L., Hibbs, M.A., Troyanskaya, O.G. (2009). Computational Analysis of the Yeast Proteome: Understanding and Exploiting Functional Specificity in Genomic Data. In: Stagljar, I. (eds) Yeast Functional Genomics and Proteomics. Methods in Molecular Biology, vol 548. Humana Press. https://doi.org/10.1007/978-1-59745-540-4_15
DOI: https://doi.org/10.1007/978-1-59745-540-4_15
Publisher Name: Humana Press
Print ISBN: 978-1-934115-71-8
Online ISBN: 978-1-59745-540-4
eBook Packages: Springer Protocols