Query-Adaptive Ranking with Support Vector Machines for Protein Homology Prediction

  • Yan Fu
  • Rong Pan
  • Qiang Yang
  • Wen Gao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6674)

Abstract

Protein homology prediction is a crucial step in template-based protein structure prediction. The functions that rank the proteins in a database according to their homology to a query protein are key to the success of protein structure prediction. In information retrieval terms, such functions are called ranking functions, and they are often constructed by machine learning. Unlike in traditional machine learning problems, the feature vectors in ranking-function learning are not independently and identically distributed (i.i.d.): they are computed with respect to individual queries, and their statistical characteristics may vary greatly from query to query. At present, few existing algorithms exploit this query dependence to improve ranking performance. This paper proposes a query-adaptive ranking-function learning algorithm for protein homology prediction. Experiments using the support vector machine (SVM) as the benchmark learner demonstrate that the proposed algorithm can significantly improve the ranking performance of SVMs on the protein homology prediction task.
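The abstract does not describe the authors' query-adaptive algorithm, so the sketch below is not that algorithm. It only illustrates, under our own assumptions, the two ideas the paragraph relies on: a ranking SVM that learns from pairs of items belonging to the same query, and per-query feature standardization as one simple, hypothetical way to remove query-to-query differences in feature scale. The names `standardize_per_query`, `within_query_pairs`, `train_pairwise_svm` and `rank`, the hyperparameters, and the toy data are all illustrative. The linear SVM trained by subgradient descent on the pairwise hinge loss is a swapped-in stand-in, not the SVM implementation used in the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (query_id, feature_vector, label); label 1 = homolog, 0 = non-homolog (illustrative encoding).
Example = Tuple[str, List[float], int]


def standardize_per_query(examples: Sequence[Example]) -> List[Example]:
    """Hypothetical query adaptation: z-score each feature with the mean/std of its own query."""
    by_query: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_query[ex[0]].append(ex)
    out: List[Example] = []
    for qid, group in by_query.items():
        n, dim = len(group), len(group[0][1])
        means = [sum(x[1][j] for x in group) / n for j in range(dim)]
        # Population std; a feature constant within the query (std 0) is divided by 1 and becomes 0.
        stds = [math.sqrt(sum((x[1][j] - means[j]) ** 2 for x in group) / n) or 1.0
                for j in range(dim)]
        for _, feats, label in group:
            out.append((qid, [(f - m) / s for f, m, s in zip(feats, means, stds)], label))
    return out


def within_query_pairs(examples: Sequence[Example]) -> List[List[float]]:
    """Difference vectors x_pos - x_neg, formed only between items of the same query."""
    by_query: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_query[ex[0]].append(ex)
    pairs = []
    for group in by_query.values():
        for _, fp, lp in group:
            for _, fn, ln in group:
                if lp > ln:
                    pairs.append([a - b for a, b in zip(fp, fn)])
    return pairs


def train_pairwise_svm(examples: Sequence[Example], c: float = 1.0,
                       lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Linear ranking SVM: minimize 0.5*||w||^2 + c * mean(max(0, 1 - w.d)) by subgradient descent."""
    pairs = within_query_pairs(examples)
    w = [0.0] * len(examples[0][1])
    if not pairs:
        return w
    for _ in range(epochs):
        grad = list(w)  # gradient of the regularizer 0.5*||w||^2
        for d in pairs:
            if sum(wi * di for wi, di in zip(w, d)) < 1.0:  # pair violates the margin
                grad = [g - c * di / len(pairs) for g, di in zip(grad, d)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def rank(w: Sequence[float], candidates: Sequence[Sequence[float]]) -> List[int]:
    """Indices of one query's candidates, sorted by descending score w.x."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in candidates]
    return sorted(range(len(candidates)), key=lambda i: -scores[i])


# Invented toy data: feature 0 separates homologs but lives on very different scales per query;
# feature 1 is constant within each query (a query-level property with no ranking signal).
train = [
    ("q1", [100.0, 5.0], 0), ("q1", [110.0, 5.0], 1), ("q1", [90.0, 5.0], 0),
    ("q2", [1.0, 50.0], 0), ("q2", [2.0, 50.0], 1), ("q2", [0.5, 50.0], 0),
]
norm = standardize_per_query(train)
w = train_pairwise_svm(norm)
print(rank(w, [f for q, f, _ in norm if q == "q1"]))  # homolog (index 1) ranked first
```

Because pairs are only formed within a query, the learner never compares a candidate from one query against a candidate from another. Per-query standardization additionally maps the query-level feature to zero, so it cannot distort the shared weight vector. To rank a new query, one would standardize that query's candidates on their own and then call `rank`.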

Keywords

Protein homology prediction, information retrieval, ranking function, machine learning, support vector machine



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yan Fu (1)
  • Rong Pan (2)
  • Qiang Yang (3)
  • Wen Gao (4)

  1. Institute of Computing Technology and Key Lab of Intelligent Information Processing, Chinese Academy of Sciences, Beijing, China
  2. School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China
  3. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
  4. Institute of Digital Media, Peking University, Beijing, China
