
Combining One-Class Classification Models Based on Diverse Biological Data for Prediction of Protein-Protein Interactions

  • José A. Reyes
  • David Gilbert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5109)

Abstract

This research addresses the problem of predicting protein-protein interactions (PPI) by integrating diverse biological data. Gold-standard data sets frequently used for this task contain a high proportion of instances involving ribosomal proteins. We demonstrate that this situation biases the classification results, and that predicting non-ribosomal PPI is a much more difficult task. To improve performance on this subtask, we integrated additional biological data into the classification process, including data from mRNA expression experiments and protein secondary structure information. We also investigated several strategies for combining diverse one-class classification (OCC) models generated from different subsets of the biological data. The weighted-average combination approach gave the best results, significantly improving on the performance of every single classification model evaluated.
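The abstract reports that a weighted average of the individual OCC models' outputs worked best, but it does not state how model scores are normalised or how the weights are chosen. The Python sketch below illustrates only the generic weighted-average combination rule under stated assumptions: each model (trained on a different data subset) emits a score on a common [0, 1] scale for the same list of protein pairs, and the weights are supplied externally (hypothetically, for example, from each model's validation performance). The function names, model labels and the 0.5 threshold are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List


def weighted_average_combination(
    scores: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Combine per-model scores for the same protein pairs by weighted average.

    Assumption: scores[m][k] is model m's score for protein pair k, already
    normalised to a common scale such as [0, 1]. weights[m] is a non-negative
    weight for model m (hypothetical; e.g. taken from validation performance).
    """
    models = list(scores)
    n_pairs = len(scores[models[0]])
    # Every model must score exactly the same protein pairs, in the same order.
    if any(len(scores[m]) != n_pairs for m in models):
        raise ValueError("all models must score the same protein pairs")
    total_weight = sum(weights[m] for m in models)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean of the model scores, computed pair by pair.
    return [
        sum(weights[m] * scores[m][k] for m in models) / total_weight
        for k in range(n_pairs)
    ]


def predict_interactions(combined: List[float], threshold: float = 0.5) -> List[bool]:
    """Label a pair as interacting when its combined score reaches the (assumed) threshold."""
    return [s >= threshold for s in combined]
```

For instance, with a hypothetical "expression" model scoring two pairs as [0.9, 0.2] and a "secondary_structure" model scoring them as [0.6, 0.4], weights of 1.0 and 3.0 give combined scores of about 0.675 and 0.35, so only the first pair is predicted to interact. Weighting lets the better-performing model dominate while still letting weaker, complementary data sources shift borderline decisions.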

Keywords

Protein Pair, Protein Secondary Structure, mRNA Expression Data, Support Vector Data Description, Relative Solvent Accessibility
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • José A. Reyes (1, 2)
  • David Gilbert (1)
  1. Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, Glasgow, UK
  2. Facultad de Ingeniería, Universidad de Talca, Chile
