Bioinformatics pp 267-278

Integrating Functional Genomics Data

  • Insuk Lee
  • Edward M. Marcotte
Part of the Methods in Molecular Biology™ book series (MIMB, volume 453)


The revolution in high-throughput biology experiments producing genome-scale data has heightened the challenge of integrating functional genomics data. Data integration is essential for making reliable inferences from functional genomics data, because individual datasets are neither error-free nor comprehensive. However, data integration faces two major hurdles: the heterogeneity of the data to be integrated and the correlation among them. These problems can be circumvented by quantitatively testing all data under a single unified scoring scheme and by using integration methods suited to correlated data. This chapter describes such a functional genomics data integration method, designed to estimate the “functional coupling” between genes and applied to the baker's yeast Saccharomyces cerevisiae. The integrated dataset outperforms individual functional genomics datasets in both accuracy and coverage, leading to more reliable and comprehensive predictions of gene function. The approach is readily applicable to multicellular organisms, including humans.

Key words

Data integration, function prediction, guilt-by-association, gene association, functional coupling, data correlation, data heterogeneity



This work was supported by grants from the N.S.F. (IIS-0325116, EIA-0219061, 0241180), N.I.H. (GM06779-01), Welch (F1515), and a Packard Fellowship (E.M.M.).



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Insuk Lee (1)
  • Edward M. Marcotte (2)
  1. Center for Systems and Synthetic Biology, Institute for Molecular Biology, University of Texas at Austin, Austin
  2. Center for Systems and Synthetic Biology, and Department of Chemistry and Biochemistry, Institute for Molecular Biology, University of Texas at Austin, Austin
