Protein Classification with Multiple Algorithms

  • Sotiris Diplaris
  • Grigorios Tsoumakas
  • Pericles A. Mitkas
  • Ioannis Vlahavas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3746)

Abstract

The number of protein sequences deposited in central protein databases by labs all over the world is constantly increasing. Only a fraction of these proteins has been experimentally analyzed to determine their structure and hence their function in the corresponding organism, because experimental determination of structure is labor-intensive and time-consuming. There is therefore a need for automated tools that can classify new proteins into structural families. This paper presents a comparative evaluation of several algorithms that learn such classification models from data concerning patterns of proteins with known structure. In addition, several approaches that combine multiple learning algorithms to increase the accuracy of predictions are evaluated. The results of the experiments provide insights that can help biologists and computer scientists design high-performance protein classification systems.
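
To make the idea of combining multiple learning algorithms concrete, the following is a minimal sketch of two generic combination schemes: weighted voting over the labels predicted by several classifiers, and selection of the single classifier with the best validation accuracy. It is an illustration under stated assumptions, not the authors' implementation; the function names, the family labels, and the weights are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions: list of labels, one per classifier (hypothetical family names)
    weights: list of non-negative numbers, one per classifier
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # On a tie, the label that was seen first wins (dict insertion order).
    return max(totals, key=totals.get)


def select_best(accuracies):
    """Classifier selection: return the name of the classifier with the
    highest validation accuracy (accuracies maps name -> accuracy)."""
    return max(accuracies, key=accuracies.get)


if __name__ == "__main__":
    # Three hypothetical classifiers vote on the family of one protein.
    votes = ["family_A", "family_B", "family_B"]
    print(weighted_vote(votes, [0.9, 0.4, 0.4]))  # one strong vote outweighs two weak ones
    print(weighted_vote(votes, [1, 1, 1]))        # equal weights reduce to majority voting
    print(select_best({"tree": 0.80, "svm": 0.90}))
```

With equal weights the first function reduces to plain majority voting. In practice the weights might come from each classifier's accuracy on held-out data, which is an assumption of this sketch rather than something stated in the abstract.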

Keywords

Classification Algorithm; Weight Vote; Sequential Minimal Optimization; Classifier Selection; Protein Classification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Sotiris Diplaris (1)
  • Grigorios Tsoumakas (2)
  • Pericles A. Mitkas (1)
  • Ioannis Vlahavas (2)
  1. Dept. of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
  2. Dept. of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
