Abstract
Proteins are the main building blocks of the cell and perform almost all functions related to cellular activity. Despite recent advances in molecular biology, the function of a large number of proteins is still unknown. Algorithms that induce classification models are a promising approach to predicting protein function, whose classes are usually organized hierarchically. Among the machine learning techniques that have been used in hierarchical classification problems, decision trees stand out. This paper describes the main characteristics of hierarchical classification models for bioinformatics problems and applies three hierarchical methods based on decision trees to protein functional classification datasets.
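As an illustration only, here is a minimal sketch of one common hierarchical strategy, top-down (local) classification. A separate classifier is trained at each non-leaf node of the class hierarchy, using only the examples that belong to that node, and a new protein is routed from the root down to a leaf. The abstract does not say which three methods the paper compares, so this sketch should not be read as any of them. To keep it self-contained, each node uses a hypothetical `Stump` class (a one-level decision tree whose split is chosen by information gain) in place of a full C4.5-style tree. The feature vectors, the two-level labels such as `("1", "2")`, and the names `TopDownClassifier` and `Stump` are all illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


class Stump:
    """Hypothetical stand-in for a decision tree: a single numeric split
    chosen by information gain (a one-level tree)."""

    def fit(self, X, y):
        self.default = Counter(y).most_common(1)[0][0]  # majority class
        self.split = None  # (feature, threshold, label_if_le, label_if_gt)
        best_gain, base = 0.0, entropy(y)
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [lab for row, lab in zip(X, y) if row[f] <= t]
                right = [lab for row, lab in zip(X, y) if row[f] > t]
                if not left or not right:
                    continue
                remainder = (len(left) * entropy(left)
                             + len(right) * entropy(right)) / len(y)
                if base - remainder > best_gain:
                    best_gain = base - remainder
                    self.split = (f, t,
                                  Counter(left).most_common(1)[0][0],
                                  Counter(right).most_common(1)[0][0])
        return self

    def predict(self, x):
        if self.split is None:  # no useful split: fall back to majority class
            return self.default
        f, t, le_label, gt_label = self.split
        return le_label if x[f] <= t else gt_label


class TopDownClassifier:
    """Top-down (local) hierarchical classifier: one model per non-leaf node.
    Class labels are paths of equal length, e.g. ("1", "2") for class 1.2."""

    def __init__(self, depth):
        self.depth = depth
        self.models = {}  # node (path prefix) -> classifier for its children

    def fit(self, X, paths):
        for level in range(self.depth):
            for node in {p[:level] for p in paths}:
                # Train only on the examples that belong to this node.
                idx = [i for i, p in enumerate(paths) if p[:level] == node]
                self.models[node] = Stump().fit([X[i] for i in idx],
                                                [paths[i][level] for i in idx])
        return self

    def predict(self, x):
        node = ()
        for _ in range(self.depth):  # descend from the root towards a leaf
            model = self.models.get(node)
            if model is None:  # defensive: no model trained for this node
                break
            node = node + (model.predict(x),)
        return node


# Usage with made-up two-feature data and a two-level hierarchy.
X = [[0.1, 5.0], [0.2, 6.0], [0.9, 5.0], [0.8, 6.0]]
paths = [("1", "1"), ("1", "2"), ("2", "1"), ("2", "2")]
clf = TopDownClassifier(depth=2).fit(X, paths)
print(clf.predict([0.15, 5.5]))  # ('1', '2')
```

Because each node's classifier only sees the examples under that node, an error made near the root cannot be corrected further down. This error propagation is the usual trade-off of the top-down approach compared with predicting leaf classes directly.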
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Costa, E.P., Lorena, A.C., Carvalho, A.C.P.L.F., Freitas, A.A., Holden, N. (2007). Comparing Several Approaches for Hierarchical Classification of Proteins with Decision Trees. In: Sagot, MF., Walter, M.E.M.T. (eds) Advances in Bioinformatics and Computational Biology. BSB 2007. Lecture Notes in Computer Science, vol 4643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73731-5_12
DOI: https://doi.org/10.1007/978-3-540-73731-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73730-8
Online ISBN: 978-3-540-73731-5
eBook Packages: Computer Science, Computer Science (R0)