Protein-Protein Interactions Classification from Text via Local Learning with Class Priors

  • Yulan He
  • Chenghua Lin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5723)

Abstract

Text classification is essential for narrowing down the number of documents relevant to a particular topic for further study, especially when searching through large biomedical databases. Protein-protein interactions are an example of such a topic, and dedicated databases exist for them. This paper proposes a semi-supervised learning algorithm via local learning with class priors (LL-CP) for biomedical text classification, in which unlabeled data points are classified in a vector space based on their proximity to labeled nodes. The algorithm has been evaluated on a corpus of biomedical documents, with promising results, on the task of identifying abstracts that contain information about protein-protein interactions. Experimental results show that LL-CP outperforms traditional semi-supervised learning algorithms such as SVM, and that it also performs better than local learning without class priors.
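
The idea of labeling unlabeled points by their proximity to labeled ones, adjusted by class priors, can be illustrated with a minimal sketch. This is not the LL-CP algorithm from the paper, whose formulation is not given on this page; it is a plain similarity-weighted nearest-neighbour vote rescaled by assumed class priors. The function names, the toy vectors, the class labels "ppi" and "other", and the prior values are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify_with_priors(labeled, unlabeled, priors, k=3):
    """Label each unlabeled vector from its k most similar labeled vectors.

    labeled:   list of (vector, class_label) pairs
    unlabeled: list of vectors
    priors:    dict mapping every class_label to an assumed prior probability
    """
    predictions = []
    for x in unlabeled:
        # Rank labeled points by similarity to x and keep the k closest.
        neighbours = sorted(labeled, key=lambda p: cosine(x, p[0]), reverse=True)[:k]
        # Accumulate similarity mass per class.
        scores = {c: 0.0 for c in priors}
        for vec, label in neighbours:
            scores[label] += cosine(x, vec)
        # Rescale each class score by its prior, then pick the best class.
        weighted = {c: scores[c] * priors[c] for c in priors}
        predictions.append(max(weighted, key=weighted.get))
    return predictions

# Toy term-frequency vectors (hypothetical data, not from the paper's corpus).
labeled = [
    ([3, 0, 1], "ppi"),
    ([2, 1, 0], "ppi"),
    ([0, 3, 2], "other"),
    ([0, 2, 3], "other"),
]
unlabeled = [[4, 0, 1], [0, 1, 4]]
priors = {"ppi": 0.5, "other": 0.5}
predictions = classify_with_priors(labeled, unlabeled, priors, k=3)
```

With equal priors each toy point takes the label of its closest labeled neighbours. Skewing the priors heavily toward one class can change a prediction, which is the kind of effect class priors are meant to have.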

Keywords

Text classification; Protein-protein interactions; Semi-supervised learning; Local learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Yulan He (1)
  • Chenghua Lin (1)
  1. School of Engineering, Computing and Mathematics, University of Exeter, Exeter
