Comparing Several Approaches for Hierarchical Classification of Proteins with Decision Trees

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4643)

Abstract

Proteins are the main building blocks of the cell and perform almost all the functions related to cell activity. Despite recent advances in Molecular Biology, the function of a large number of proteins is still unknown. Algorithms that induce classification models are a promising approach to protein function prediction, in which the classes are usually organized hierarchically. Among the machine learning techniques that have been used in hierarchical classification problems, Decision Trees stand out. This paper describes the main characteristics of hierarchical classification models for Bioinformatics problems and applies three hierarchical methods based on Decision Trees to protein functional classification datasets.
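
The abstract does not name the three hierarchical methods, so the following is only an illustrative sketch of one commonly described strategy: a top-down scheme that trains one local classifier per parent node of the class hierarchy. To stay dependency-free it uses depth-1 decision trees (stumps) rather than a full tree inducer. The toy features, the class labels (`A`, `A1`, ...) and all function names are assumptions made for illustration, not the paper's datasets or code.

```python
from collections import Counter, defaultdict


def fit_stump(X, y):
    """Fit a depth-1 decision tree: pick the (feature, threshold) split
    with the fewest training errors. Illustrative stand-in for a real inducer."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            # `left` is never empty because t is an observed feature value.
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def fit_top_down(X, paths):
    """Train one stump per parent node of the hierarchy.
    Each path is a tuple of labels from the root's child down to a leaf class."""
    groups = defaultdict(list)
    for row, path in zip(X, paths):
        for depth in range(len(path)):
            # The classifier at node path[:depth] predicts the next label.
            groups[path[:depth]].append((row, path[depth]))
    models = {}
    for prefix, items in groups.items():
        rows, labels = zip(*items)
        models[prefix] = fit_stump(rows, labels)
    return models


def predict_top_down(models, row):
    """Descend from the root, letting each local model choose the next child."""
    prefix = ()
    while prefix in models:
        prefix = prefix + (models[prefix](row),)
    return prefix


# Hypothetical toy data: two numeric features, a two-level class hierarchy.
X = [(0.1, 0.2), (0.2, 0.9), (0.9, 0.1), (0.8, 0.8)]
paths = [("A", "A1"), ("A", "A2"), ("B", "B1"), ("B", "B2")]

models = fit_top_down(X, paths)
predictions = [predict_top_down(models, row) for row in X]
```

In this scheme a mistake made near the root cannot be corrected further down, which is one reason hierarchical methods are usually compared against alternatives such as a single flat classifier over the leaf classes.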



Editor information

Marie-France Sagot, Maria Emilia M. T. Walter

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Costa, E.P., Lorena, A.C., Carvalho, A.C.P.L.F., Freitas, A.A., Holden, N. (2007). Comparing Several Approaches for Hierarchical Classification of Proteins with Decision Trees. In: Sagot, MF., Walter, M.E.M.T. (eds) Advances in Bioinformatics and Computational Biology. BSB 2007. Lecture Notes in Computer Science, vol 4643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73731-5_12

  • DOI: https://doi.org/10.1007/978-3-540-73731-5_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73730-8

  • Online ISBN: 978-3-540-73731-5

  • eBook Packages: Computer Science (R0)
