Integration of genomic datasets to predict protein complexes in yeast

  • Ronald Jansen
  • Ning Lan
  • Jiang Qian
  • Mark Gerstein


The ultimate goal of functional genomics is to define the function of all the genes in an organism's genome. Past decades of research have accumulated a large body of information about the biological roles of genes, both from traditional experiments detailing the role of individual genes and proteins and from newer experimental strategies that aim to characterize gene function on a genomic scale.

It is clear that the goal of functional genomics can only be achieved by integrating information from all of these different experiments. Integrating heterogeneous data sources is thus an important challenge for bioinformatics.

Integrating different data sources often helps to uncover non-obvious relationships between genes, but it has two further benefits. First, when information from multiple independent sources agrees, it is likely to be more valid and reliable. Second, by taking the union of multiple sources, one can cover larger parts of the genome. This is obvious when integrating results from many single-gene or single-protein experiments, but it also matters for many genome-wide experiments, since their results are often confined to certain (albeit sizable) subsets of the genome.
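The two benefits can be made concrete with a toy calculation. The sketch below uses hypothetical gene sets and error rates (not data from this study), and for the agreement argument it assumes that the sources err independently of one another: for a yes/no call, k sources can only agree on the wrong answer if all k are wrong, which under independence happens with probability equal to the product of their individual error rates. The helper names `union_coverage` and `prob_all_wrong` are illustrative.

```python
import math


def union_coverage(sources: dict[str, set[str]], genome_size: int) -> float:
    """Fraction of the genome annotated by at least one data source."""
    covered = set().union(*sources.values())
    return len(covered) / genome_size


def prob_all_wrong(error_rates: list[float]) -> float:
    """Chance that every source gives the wrong yes/no call for a gene.

    Assumes (hypothetically) that the sources make errors independently;
    correlated sources would make agreement less informative than this.
    """
    return math.prod(error_rates, start=1.0)


# Hypothetical toy data: three sources, each covering part of a 10-gene "genome".
sources = {
    "expression": {"g1", "g2", "g3", "g4"},
    "interaction": {"g3", "g4", "g5", "g6"},
    "localization": {"g6", "g7", "g8"},
}
for name, genes in sources.items():
    print(f"{name}: {len(genes) / 10:.0%} of genes covered")
print(f"union: {union_coverage(sources, 10):.0%} of genes covered")  # 80%

# Three independent sources, each wrong 30% of the time (assumed rates).
print(f"all three wrong: {prob_all_wrong([0.3, 0.3, 0.3]):.3f}")  # 0.027
```

Each toy source alone covers at most 40% of the genes, while their union covers 80%; and three independent sources that are each 70% accurate agree on a wrong call only about 2.7% of the time.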

In this paper, we explore an example of such a data integration procedure: predicting, for individual genes, membership in protein complexes. For this, we draw on six different data sources, including expression profiles, interaction data, essentiality and localization information. Each of these data sources individually contains only weakly predictive information about protein complexes, but we show how the prediction can be improved by combining all of them. Supplementary information is available at …

Abbreviations: TP, true positive; TN, true negative; FP, false positive; FN, false negative; Y2H, yeast two-hybrid.
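One common way to combine several weakly predictive sources is a naive Bayes classifier, which adds per-source log-likelihood ratios under the assumption that the sources are conditionally independent given whether a gene is in a complex. The sketch below uses that technique for illustration only; it is not necessarily the method used in this paper. The four binary features (co-expression, a Y2H partner, essentiality, nuclear localization), the eight labelled genes, and the helper names `fit`, `score`, `predict`, and `confusion` are all hypothetical. The evaluation reports TP, TN, FP, and FN counts.

```python
import math

Gene = dict[str, bool]  # feature name -> observed yes/no evidence


def fit(examples: list[Gene], labels: list[bool], alpha: float = 1.0):
    """Estimate log prior odds and per-feature log-likelihood ratios.

    Laplace smoothing (alpha) keeps ratios finite when a feature never
    occurs in one class. Both classes must be present in `labels`.
    """
    pos = [x for x, y in zip(examples, labels) if y]
    neg = [x for x, y in zip(examples, labels) if not y]
    log_prior = math.log(len(pos) / len(neg))
    ratios = {}
    for f in examples[0]:
        p = (sum(x[f] for x in pos) + alpha) / (len(pos) + 2 * alpha)  # P(f | in complex)
        q = (sum(x[f] for x in neg) + alpha) / (len(neg) + 2 * alpha)  # P(f | not in complex)
        ratios[f] = (math.log(p / q), math.log((1 - p) / (1 - q)))  # (present, absent)
    return log_prior, ratios


def score(model, x: Gene) -> float:
    """Log posterior odds that the gene belongs to a complex."""
    log_prior, ratios = model
    return log_prior + sum(on if x[f] else off for f, (on, off) in ratios.items())


def predict(model, x: Gene, threshold: float = 0.0) -> bool:
    return score(model, x) > threshold


def confusion(preds: list[bool], labels: list[bool]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN)."""
    tp = sum(p and y for p, y in zip(preds, labels))
    tn = sum(not p and not y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum(not p and y for p, y in zip(preds, labels))
    return tp, tn, fp, fn


# Hypothetical toy data, invented for illustration.
def gene(coexp: bool, y2h: bool, essential: bool, nuclear: bool) -> Gene:
    return {"coexpressed": coexp, "y2h_partner": y2h, "essential": essential, "nuclear": nuclear}


EXAMPLES = [
    gene(True, True, True, True), gene(True, False, True, False),
    gene(False, True, True, True), gene(True, True, False, True),  # in complexes
    gene(False, False, False, False), gene(False, False, True, False),
    gene(True, False, False, False), gene(False, False, False, True),  # not in complexes
]
LABELS = [True] * 4 + [False] * 4

model = fit(EXAMPLES, LABELS)
preds = [predict(model, x) for x in EXAMPLES]
tp, tn, fp, fn = confusion(preds, LABELS)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=4 FP=0 FN=1
print(f"sensitivity={tp / (tp + fn):.2f} specificity={tn / (tn + fp):.2f}")  # 0.75, 1.00
```

The second gene, supported by co-expression and essentiality but lacking a Y2H partner and nuclear localization, is missed (the single FN), which illustrates that no one source decides the call. Scoring the training genes, as done here for brevity, overstates accuracy; held-out genes or cross-validation would be needed for an honest estimate.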



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Ronald Jansen (1)
  • Ning Lan (1)
  • Jiang Qian (1)
  • Mark Gerstein (2)

  1. Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, USA
  2. Department of Molecular Biophysics & Biochemistry and Computer Science, Yale University, New Haven, USA
