Towards an Ensemble Learning Strategy for Metagenomic Gene Prediction
Metagenomics is an emerging field in which the power of genome analysis is applied to entire communities of microbes. A large variety of classifiers has been developed for gene prediction, though there is a lack of empirical evaluation of the core machine learning techniques implemented in these tools. In this work we present an empirical performance evaluation of classification strategies for metagenomic gene prediction. The comparison covers distinct supervised learning strategies: one lazy learner, two eager learners, and one ensemble learner. Although all four classifiers performed well, the ensemble-based strategy with Random Forest achieved the best overall result.
Keywords: machine learning, classification methods, gene prediction, metagenomics