Abstract
The revolution in high-throughput biology experiments producing genome-scale data has heightened the challenge of integrating functional genomics data. Data integration is essential for making reliable inferences from functional genomics data, because the datasets are neither error-free nor comprehensive. However, data integration faces two major hurdles: the heterogeneity of the data to be integrated and the correlation among them. These problems can be circumvented by evaluating all data quantitatively under a single unified scoring scheme, and by using integration methods appropriate for handling correlated data. This chapter describes such a functional genomics data integration method, designed to estimate the “functional coupling” between genes and applied to the baker's yeast Saccharomyces cerevisiae. The integrated dataset outperforms individual functional genomics datasets in both accuracy and coverage, leading to more reliable and comprehensive predictions of gene function. The approach is easily applied to multicellular organisms, including human.
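The two ideas in the abstract, a unified score for heterogeneous evidence and an integration step that tolerates correlated evidence, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the chapter's stated procedure: the unified score is taken to be a log-likelihood ratio of linkage odds against a reference set of gene pairs, and correlation is handled by a rank-weighted sum with a hypothetical `decay` parameter. The function names and the example counts are invented.

```python
import math


def log_likelihood_score(linked_pos, linked_neg, prior_pos, prior_neg):
    """Unified score for one dataset (illustrative assumption).

    linked_pos / linked_neg: gene pairs supported by the dataset that do /
    do not share function according to a reference annotation set.
    prior_pos / prior_neg: the same counts over all annotated gene pairs.
    Returns ln(evidence odds / prior odds); 0 means no better than chance.
    """
    if min(linked_pos, linked_neg, prior_pos, prior_neg) <= 0:
        raise ValueError("all counts must be positive")
    evidence_odds = linked_pos / linked_neg
    prior_odds = prior_pos / prior_neg
    return math.log(evidence_odds / prior_odds)


def weighted_sum(scores, decay=2.0):
    """Combine per-dataset scores for one gene pair (illustrative assumption).

    Scores are ranked from strongest to weakest; the i-th ranked score is
    divided by decay**i. decay=1 is a plain sum, which is only appropriate
    for independent evidence; larger values discount additional, possibly
    correlated, evidence more heavily.
    """
    if decay < 1:
        raise ValueError("decay must be >= 1")
    ranked = sorted(scores, reverse=True)
    return sum(score / decay**rank for rank, score in enumerate(ranked))


# Hypothetical scores for one gene pair from three datasets.
pair_scores = [1.0, 4.0, 2.0]
combined = weighted_sum(pair_scores, decay=2.0)
```

Because every dataset is scored against the same reference set, scores from expression, interaction, and other evidence types become directly comparable. In a sketch like this, `decay` would be tuned against held-out reference pairs rather than fixed in advance.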
Acknowledgments
This work was supported by grants from the N.S.F. (IIS-0325116, EIA-0219061, 0241180), N.I.H. (GM06779-01), Welch (F1515), and a Packard Fellowship (E.M.M.).
Copyright information
© 2008 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Lee, I., Marcotte, E.M. (2008). Integrating Functional Genomics Data. In: Keith, J.M. (ed.) Bioinformatics. Methods in Molecular Biology™, vol 453. Humana Press. https://doi.org/10.1007/978-1-60327-429-6_14
DOI: https://doi.org/10.1007/978-1-60327-429-6_14
Publisher Name: Humana Press
Print ISBN: 978-1-60327-428-9
Online ISBN: 978-1-60327-429-6
eBook Packages: Springer Protocols