Integrating Functional Genomics Data
The revolution in high-throughput biology experiments that produce genome-scale data has heightened the challenge of integrating functional genomics data. Data integration is essential for making reliable inferences from functional genomics data, because individual datasets are neither error-free nor comprehensive. However, data integration faces two major hurdles: heterogeneity and correlation among the datasets to be integrated. These problems can be circumvented by quantitatively testing all data with the same unified scoring scheme and by using integration methods appropriate for correlated data. This chapter describes such a functional genomics data integration method, designed to estimate the “functional coupling” between genes and applied to the baker's yeast Saccharomyces cerevisiae. The integrated dataset outperforms individual functional genomics datasets in both accuracy and coverage, leading to more reliable and comprehensive predictions of gene function. The approach is easily applied to multicellular organisms, including human.
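The abstract does not give the scoring or integration formulas. The following is a minimal sketch that rests on two assumptions. First, the "unified scoring scheme" is taken to be a log-likelihood score, measured against a gold-standard set of gene pairs known to be functionally linked (positive) or unlinked (negative). Second, correlated evidence is damped with a rank-weighted sum. All function names, parameters, and the decay parameter `d` are illustrative and do not come from this chapter.

```python
import math


def log_likelihood_score(pos_linked, neg_linked, pos_total, neg_total):
    """Assumed unified score for one dataset: ln(posterior odds / prior odds).

    pos_linked / neg_linked: gold-standard positive / negative gene pairs
    that the dataset links. pos_total / neg_total: all positive / negative
    pairs in the gold standard.
    """
    if neg_linked == 0 or neg_total == 0:
        raise ValueError("negative counts must be non-zero to form odds")
    posterior_odds = pos_linked / neg_linked  # odds of linkage given the data
    prior_odds = pos_total / neg_total        # odds of linkage by chance
    return math.log(posterior_odds / prior_odds)


def weighted_sum(scores, d=1.0):
    """Combine per-dataset scores for one gene pair.

    Scores are sorted from highest to lowest, and the i-th score (0-based)
    is divided by d**i, so weaker, possibly redundant evidence contributes
    less. With d = 1 this reduces to a plain sum, which treats the datasets
    as independent.
    """
    ranked = sorted(scores, reverse=True)
    return sum(s / d ** i for i, s in enumerate(ranked))
```

Here is a worked example. Suppose a dataset links 300 positive and 100 negative gold-standard pairs, and the gold standard holds 1,000 positive and 9,000 negative pairs. The posterior odds are then 3 and the prior odds are 1/9, so the score is ln 27 ≈ 3.30. Summing three dataset scores of 3, 2, and 1 with `d=2` gives 3 + 2/2 + 1/4 = 4.25, compared with 6 for a naive sum. A larger `d` therefore assumes stronger correlation between datasets.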
Key words: Data integration; function prediction; guilt-by-association; gene association; functional coupling; data correlation; data heterogeneity
This work was supported by grants from the N.S.F. (IIS-0325116, EIA-0219061, 0241180), N.I.H. (GM06779-01), Welch (F1515), and a Packard Fellowship (E.M.M.).