Comparative Analysis of Classification Methods for Protein Interaction Verification System

  • Min Su Lee
  • Seung Soo Park
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4243)


A comparative study of methods for assessing the reliability of protein-protein interactions in high-throughput datasets is presented. We apply several state-of-the-art classification algorithms to distinguish truly interacting protein pairs from noise, using empirical knowledge about interacting proteins, and then compare the classifiers on several performance criteria. Experimental results show that classification algorithms are powerful tools for separating true interacting protein pairs from noisy protein-protein interaction data. Furthermore, in settings with many missing values, as is typical of protein-protein interaction datasets, the K-Nearest Neighbor and Decision Tree algorithms perform best among the methods compared.
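As a rough illustration of why a nearest-neighbor classifier can tolerate missing values, the sketch below implements a small K-Nearest Neighbor vote in plain Python. It is not the authors' implementation: the distance rule (compare only features present in both vectors), the three feature names, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def distance(a, b):
    """Euclidean distance over the features present in both vectors.

    Missing values are encoded as None (an assumption of this sketch).
    Pairs that share no observed feature are treated as infinitely far apart.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    # Normalise by the number of shared features so sparse pairs stay comparable.
    return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_predict(train, query, k=3):
    """Majority vote among the k labelled examples nearest to the query."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Classify each example using all the others and report the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


# Made-up toy vectors, NOT data from the paper. Hypothetical features:
# (expression correlation, shared functional category, co-localisation).
# Label 1 = true interaction, 0 = noise.
TOY = [
    ((0.9, 1.0, None), 1),
    ((0.8, None, 1.0), 1),
    ((None, 1.0, 1.0), 1),
    ((0.7, 1.0, 1.0), 1),
    ((0.1, 0.0, None), 0),
    ((0.2, None, 0.0), 0),
    ((None, 0.0, 0.0), 0),
    ((0.3, 0.0, 0.0), 0),
]

if __name__ == "__main__":
    print(leave_one_out_accuracy(TOY, k=3))
```

Because the distance skips unobserved features instead of imputing them, a protein pair with only one or two known attributes can still be compared with its neighbors. That is one plausible reason instance-based methods cope well with sparse interaction data.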


Keywords: Support Vector Machine, Functional Category, Classification Algorithm, Protein Interaction Network, Protein Pair
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Min Su Lee (1)
  • Seung Soo Park (1)
  1. Department of Computer Science and Engineering, Ewha Womans University, Seoul, Korea
