Skip to main content

Protein Classification with Multiple Algorithms

  • Conference paper
Advances in Informatics (PCI 2005)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 3746))

Included in the following conference series:

Abstract

Nowadays, the number of protein sequences being stored in central protein databases from labs all over the world is constantly increasing. From these proteins only a fraction has been experimentally analyzed in order to detect their structure and hence their function in the corresponding organism. The reason is that experimental determination of structure is labor-intensive and quite time-consuming. Therefore there is the need for automated tools that can classify new proteins to structural families. This paper presents a comparative evaluation of several algorithms that learn such classification models from data concerning patterns of proteins with known structure. In addition, several approaches that combine multiple learning algorithms to increase the accuracy of predictions are evaluated. The results of the experiments provide insights that can help biologists and computer scientists design high-performance protein classification systems of high quality.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Falquet, L., Pagni, M., Bucher, P., Hulo, N., Sigrist, C.J., Hofmann, K., Bairoch, A.: The PROSITE database, its status in 2002. Nucleic Acids Res. 30, 235–238 (2002)

    Article  Google Scholar 

  2. Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Howem, K.L., Sonnhammer, E.L.L.: The Pfam protein families database. Nucleic Acids Res 28, 263–266 (2000)

    Article  Google Scholar 

  3. Attwood, T.K., Croning, M.D.R., Flower, D.R., Lewis, A.P., Mabey, J.E., Scordis, P., Selley, J., Wright, W.: PRINT-S: the database formerly known as PRINTS. Nucleic Acids Res 28, 225–227 (2000)

    Article  Google Scholar 

  4. Mitchell, T.: Machine Learning. McGraw Hill, New York (1997)

    MATH  Google Scholar 

  5. Baldi, P.F., Brunak, S.: Bioinformatics: The Machine Learning Approach. The MIT Press, Cambridge (2001)

    MATH  Google Scholar 

  6. Wang, D., Wang, X., Honavar, V., Dobbs, D.: Data-driven generation of decision trees for motif-based assignment of protein sequences to functional families. In: Proceedings of the Atlantic Symposium on Computational Biology, Genome Information Systems & Technology (2001)

    Google Scholar 

  7. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufman, San Mateo (1993)

    Google Scholar 

  8. Bishop, C.: Neural Networks for Pattern Recognition. Oxford University Press, New York (1995)

    Google Scholar 

  9. Duad, R., Hart, P.: Pattern Classification and Scene Analysis. Wiley, New York (1973)

    Google Scholar 

  10. Bairoch, A., Prosite: A dictionary of protein sites and patterns – User Manual. Swiss Institute of Bioinformatics, Geneva (1999)

    Google Scholar 

  11. Dzeroski, S., Zenko, B.: Is Combining Classifiers with Stacking Better than Selecting the Best One? Machine Learning 54, 255–273 (2004)

    Article  MATH  Google Scholar 

  12. Brazdil, P.B., Soares, C., Da Costa, J.P.: Ranking Learning Algorithms: Using IBL and Meta-Learning on Accuracy and Time Results. Machine Learning 50, 251–277 (2003)

    Article  MATH  Google Scholar 

  13. Pfahringer, B., Bensusan, H., Giraud-Carrier, C.: Meta-learning by landmarking various learning algorithms. In: International Conference on Machine Learning (2000)

    Google Scholar 

  14. Kalousis, A., Theoharis, T.: Noemon: Design, implementation and performance results of an intelligent assistant for classifier selection. In: Intelligent Data Analysis (1999)

    Google Scholar 

  15. Bensusan, H., Giraud-Carrier, C., Kennedy, C.: A higher-order approach to meta-learning. In: ECML 2000 workshop on Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination (2000)

    Google Scholar 

  16. Keller, J., Paterson, I., Berrer, H.: An integrated concept for multi-criteria ranking of data mining algorithms. In: ECML 2000 Workshop on Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination (2000)

    Google Scholar 

  17. Giacinto, G., Roli, F.: Adaptive selection of image classifiers. In: Proceedings of the 9th International Conference on Image Analysis and Processing, pp. 38–45 (1997)

    Google Scholar 

  18. Woods, K., Kegelmeyer, W.P., Bowyer, K.: Combination of multiple classifiers using local accuracy estimates. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 405–410 (1997)

    Article  Google Scholar 

  19. Ho, T.K., Hull, J.J., Srihari, S.N.: Decision combination in multiple classifier systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 66–75 (1994)

    Article  Google Scholar 

  20. Merz, C.J.: Dynamical selection of learning algorithms. In: Fisher, D., Lenz, H.J. (eds.) Learning from Data: Artificial Intelligence and Statistics. Springer, Heidelberg (1995)

    Google Scholar 

  21. Wolpert, D.: Stacked generalization. Neural Networks 5, 241–259 (1992)

    Article  Google Scholar 

  22. Ting, K.M., Witten, I.H.: Issues in stacked generalization. Journal of Artificial Intelligence Research 10, 271–289 (1999)

    MATH  Google Scholar 

  23. Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective Voting of Heterogeneous Classifiers. In: Proceedings of the 15th European Conference on Machine Learning, Pisa, Italy, pp. 465–476 (2004)

    Google Scholar 

  24. Tsoumakas, G., Angelis, L., Vlahavas, I.: Selective Fusion of Heterogeneous Classifiers. Intelligent Data Analysis 9 (2005) (to appear)

    Google Scholar 

  25. Hatzidamianos, G., Diplaris, S., Athanasiadis, I., Mitkas, P.A.: GenMiner: A Data Mining Tool for Protein Analysis. In: Proceedings of the 9th Panhellenic Conference on Informatics, Thessaloniki, Greece (2003)

    Google Scholar 

  26. Witten, I., Frank, E.: Data Mining: Practical machine learning tools with Java implementations. Morgan Kaufmann, San Francisco (1999)

    Google Scholar 

  27. Kohavi, R.: The power of decision tables. In: Proceedings of the 12th European Conference on Machine Learning, pp. 174–189 (1995)

    Google Scholar 

  28. Cohen, W.: Fast effective rule induction. In: Proceedings of the 12th International Conference on Machine Learning, pp. 115–123 (1995)

    Google Scholar 

  29. Witten, I., Frank, E.: Generating accurate rule sets without global optimization. In: Proceedings of the 15th International Conference on Machine Learning, pp. 144–151 (1998)

    Google Scholar 

  30. Aha, D., Kibler, D., Albert, M.: Instance-based learning algorithms. Machine Learning 6, 37–66 (1991)

    Google Scholar 

  31. Cleary, J., Trigg, L.: K*: An instance-based learner using an entropic distance measure. In: Proceedings of the 12th International Conference on Machine Learning, pp. 108–114 (1995)

    Google Scholar 

  32. John, G., Langley, P.: Estimating continuous distributions in bayesian classifiers. In: Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, pp. 338–345 (1995)

    Google Scholar 

  33. Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Scholkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods - Support Vector Learning. MIT Press, Cambridge (1998)

    Google Scholar 

  34. Bishop, C.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Diplaris, S., Tsoumakas, G., Mitkas, P.A., Vlahavas, I. (2005). Protein Classification with Multiple Algorithms. In: Bozanis, P., Houstis, E.N. (eds) Advances in Informatics. PCI 2005. Lecture Notes in Computer Science, vol 3746. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573036_42

Download citation

  • DOI: https://doi.org/10.1007/11573036_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29673-7

  • Online ISBN: 978-3-540-32091-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics