Protein Classification with Multiple Algorithms

Diplaris, Sotiris; Tsoumakas, Grigorios; Mitkas, Pericles A.; Vlahavas, Ioannis

doi:10.1007/11573036_42

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3746)

Included in the following conference series:

Panhellenic Conference on Informatics

Abstract

The number of protein sequences deposited in central protein databases by laboratories around the world is constantly increasing. Only a fraction of these proteins has been experimentally analyzed to determine their structure and hence their function in the corresponding organism, because experimental structure determination is labor-intensive and time-consuming. There is therefore a need for automated tools that can classify new proteins into structural families. This paper presents a comparative evaluation of several algorithms that learn such classification models from data concerning patterns of proteins with known structure. In addition, it evaluates several approaches that combine multiple learning algorithms to increase the accuracy of predictions. The experimental results provide insights that can help biologists and computer scientists design high-performance protein classification systems.

Author information

Authors and Affiliations

Dept. of Electrical and Computer Engineering, Aristotle University of Thessaloniki, 54126, Thessaloniki, Greece
Sotiris Diplaris & Pericles A. Mitkas
Dept. of Informatics, Aristotle University of Thessaloniki, 54126, Thessaloniki, Greece
Grigorios Tsoumakas & Ioannis Vlahavas

Editor information

Editors and Affiliations

Department of Computer and Communication Engineering, University of Thessaly, Glavani 37, 382 21, Volos, Greece
Panayiotis Bozanis
Department of Computer and Communication Engineering, University of Thessaly, 382 21, Volos, Greece
Elias N. Houstis

Cite this paper

Diplaris, S., Tsoumakas, G., Mitkas, P.A., Vlahavas, I. (2005). Protein Classification with Multiple Algorithms. In: Bozanis, P., Houstis, E.N. (eds) Advances in Informatics. PCI 2005. Lecture Notes in Computer Science, vol 3746. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573036_42

DOI: https://doi.org/10.1007/11573036_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29673-7
Online ISBN: 978-3-540-32091-3
eBook Packages: Computer Science, Computer Science (R0)
