Towards an Ensemble Learning Strategy for Metagenomic Gene Prediction

  • Fabiana Goés
  • Ronnie Alves
  • Leandro Corrêa
  • Cristian Chaparro
  • Lucinéia Thom
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8826)


Metagenomics is an emerging field in which the power of genome analysis is applied to entire communities of microbes. A large variety of classifiers has been developed for gene prediction, but there is a lack of empirical evaluation of the core machine learning techniques implemented in these tools. In this work we present an empirical performance evaluation of classification strategies for metagenomic gene prediction. The comparison covers distinct supervised learning strategies: one lazy learner, two eager learners and one ensemble learner. Although all four base classifiers performed well, the ensemble-based strategy with Random Forest achieved the best overall result.
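The three learner families named in the abstract can be illustrated with a minimal, standard-library sketch. The abstract does not say which lazy and eager classifiers were evaluated, so everything here is an assumption made for illustration: k-nearest neighbours stands in for the lazy learner, a nearest-centroid model for an eager learner, and bootstrap aggregation of that model for the ensemble idea (Random Forest itself bags decision trees with random feature selection, which this sketch does not implement). The feature vectors and all function names are hypothetical toy values, not data or code from the paper.

```python
import random
from collections import Counter
from math import dist

# Hypothetical toy features (e.g. GC content, a codon-usage score);
# label 1 = coding, 0 = non-coding. Illustrative only, not from the paper.
TRAIN = [
    ((0.62, 0.80), 1), ((0.58, 0.75), 1), ((0.65, 0.90), 1),
    ((0.40, 0.20), 0), ((0.35, 0.30), 0), ((0.45, 0.25), 0),
]

def knn_predict(train, x, k=3):
    """Lazy learner: no model is built; all work happens at query time."""
    nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fit_centroids(train):
    """Eager learner: summarise the training set as one centroid per class."""
    sums = {}
    for features, label in train:
        total, count = sums.get(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], count + 1)
    return {label: tuple(t / n for t in total)
            for label, (total, n) in sums.items()}

def centroid_predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: dist(centroids[label], x))

def bagged_predict(train, x, n_models=25, seed=0):
    """Ensemble learner: fit one eager model per bootstrap resample
    of the training set and return the majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]
        votes.append(centroid_predict(fit_centroids(sample), x))
    return Counter(votes).most_common(1)[0][0]
```

Calling `knn_predict(TRAIN, x)`, `centroid_predict(fit_centroids(TRAIN), x)` and `bagged_predict(TRAIN, x)` on the same query `x` gives one prediction per strategy, which is the kind of side-by-side comparison the abstract describes, here at toy scale.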


Machine learning, classification methods, gene prediction, metagenomics





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Fabiana Goés (1)
  • Ronnie Alves (1, 2, 4)
  • Leandro Corrêa (1)
  • Cristian Chaparro (1)
  • Lucinéia Thom (3)
  1. PPGCC, Universidade Federal do Pará, Belém, Brazil
  2. Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, Université Montpellier 2, Centre National de la Recherche Scientifique, Montpellier, France
  3. PPGC, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
  4. Institut de Biologie Computationnelle, Montpellier, France
