Comparing Methods for Multilabel Classification of Proteins Using Machine Learning Techniques

  • Ricardo Cerri
  • Renato R. O. da Silva
  • André C. P. L. F. de Carvalho
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5676)


Multilabel classification is an important problem in bioinformatics and Machine Learning. In a conventional classification problem, each example belongs to exactly one of several classes. When an example can belong to more than one class simultaneously, the problem is called a multilabel classification problem. Protein function classification is a typical example, since a protein may have more than one function. This paper describes the main characteristics of some multilabel classification methods and applies five of them to protein classification problems. The methods are compared experimentally using traditional machine learning techniques as base classifiers. The paper also compares different evaluation metrics used in multilabel problems.
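To make the single-label versus multilabel distinction concrete, the following minimal sketch encodes each protein's functions as a binary indicator vector and computes two evaluation metrics commonly used for multilabel problems, Hamming loss and subset accuracy. The label names and toy data are hypothetical, and the abstract does not state which metrics the paper compares, so these two are illustrative assumptions only.

```python
# Hypothetical label set and toy data, for illustration only.
LABELS = ["transport", "binding", "catalysis", "signaling"]

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / total

def subset_accuracy(y_true, y_pred):
    """Fraction of examples whose entire label set is predicted exactly."""
    exact = sum(row_t == row_p for row_t, row_p in zip(y_true, y_pred))
    return exact / len(y_true)

# Each row is one protein; a row may contain several 1s (multilabel).
y_true = [
    [1, 1, 0, 0],  # transport and binding
    [0, 0, 1, 0],  # catalysis only
    [1, 0, 0, 1],  # transport and signaling
]
y_pred = [
    [1, 0, 0, 0],  # misses "binding"
    [0, 0, 1, 0],  # exact match
    [1, 0, 1, 1],  # adds a spurious "catalysis"
]

if __name__ == "__main__":
    print("Hamming loss:", hamming_loss(y_true, y_pred))
    print("Subset accuracy:", subset_accuracy(y_true, y_pred))
```

The two metrics can disagree sharply: a prediction that gets most labels right still counts as a full miss under subset accuracy, which is one reason comparing metrics matters in multilabel evaluation.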


Machine Learning, Bioinformatics, Multilabel Classification, Proteins





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ricardo Cerri (1)
  • Renato R. O. da Silva (1)
  • André C. P. L. F. de Carvalho (1)
  1. Instituto de Ciências Matemáticas e de Computação - ICMC/USP, Avenida Trabalhador São-carlense, 400, Centro, São Carlos, SP, Brazil
