Abstract
Metagenomics is an emerging field in which the power of genome analysis is applied to entire communities of microbes. A large variety of classifiers has been developed for gene prediction, though there is a lack of empirical evaluation of the core machine learning techniques implemented in these tools. In this work we present an empirical performance evaluation of classification strategies for metagenomic gene prediction. The comparison covers distinct supervised learning strategies: one lazy learner, two eager learners and one ensemble learner. Although all four classifiers performed well, the ensemble-based strategy with Random Forest achieved the best overall result.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Goés, F., Alves, R., Corrêa, L., Chaparro, C., Thom, L. (2014). Towards an Ensemble Learning Strategy for Metagenomic Gene Prediction. In: Campos, S. (eds) Advances in Bioinformatics and Computational Biology. BSB 2014. Lecture Notes in Computer Science, vol 8826. Springer, Cham. https://doi.org/10.1007/978-3-319-12418-6_3
DOI: https://doi.org/10.1007/978-3-319-12418-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12417-9
Online ISBN: 978-3-319-12418-6
eBook Packages: Computer Science (R0)