Abstract
We present an ensemble of gene predictors that aims to improve on the performance of traditional predictors when applied to metagenomes obtained by 454 sequencing, which are characterized by very short reads. The proposed ensemble is based on data mining techniques, such as decision trees and k-means, complemented by structural information about the sequence provided by its fractal dimension. The resulting ensemble outperforms the best ab initio predictor by 15 to 20%.
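As an illustration of how a fractal dimension can be obtained from a short read, the sketch below estimates it with Higuchi's method on a numeric encoding of the sequence. It is only a sketch under stated assumptions: the abstract does not say which estimator or which sequence encoding is used, so the purine/pyrimidine walk in `dna_walk`, the function names, and the default `k_max` are all hypothetical choices made for this example.

```python
import math


def dna_walk(sequence):
    """Encode a DNA string as a cumulative purine/pyrimidine walk.

    Assumption for this sketch: purines (A, G) step +1, pyrimidines (C, T)
    step -1, and any other symbol (e.g. N) steps 0.
    """
    steps = {"A": 1, "G": 1, "C": -1, "T": -1}
    walk, position = [], 0
    for base in sequence.upper():
        position += steps.get(base, 0)
        walk.append(position)
    return walk


def higuchi_fd(series, k_max=8):
    """Estimate the fractal dimension of a numeric series (Higuchi's method).

    For each delay k, the mean normalized curve length L(k) is computed over
    the k possible starting offsets; the estimate is the least-squares slope
    of log L(k) against log(1/k).
    """
    n = len(series)
    if k_max < 2 or n < 2 * k_max + 1:
        raise ValueError("series too short for the requested k_max")

    xs, ys = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            dist = sum(
                abs(series[m + i * k] - series[m + (i - 1) * k])
                for i in range(1, n_steps + 1)
            )
            # Normalization factor from Higuchi's definition of L_m(k).
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        mean_length = sum(lengths) / len(lengths)
        if mean_length <= 0:
            raise ValueError("constant series: curve length is zero")
        xs.append(math.log(1.0 / k))
        ys.append(math.log(mean_length))

    # Ordinary least-squares slope of ys against xs.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    read = "ATGGCGTACGTTAGCCGATAGGCTAACGTTAGCATCGGATCCGATTACGGCTA"
    print(higuchi_fd(dna_walk(read), k_max=6))
```

A value computed this way could be appended to the per-read outputs of the individual gene predictors as one more input feature for a decision tree or a k-means clustering step; whether the ensemble described here combines them in that way is not stated in the abstract.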
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Díaz, N., Velazco, A.F.R., Márquez, C.A.O. (2014). Gene Predictors Ensemble for Complex Metagenomes. In: Castillo, L., Cristancho, M., Isaza, G., Pinzón, A., Rodríguez, J. (eds) Advances in Computational Biology. Advances in Intelligent Systems and Computing, vol 232. Springer, Cham. https://doi.org/10.1007/978-3-319-01568-2_22
DOI: https://doi.org/10.1007/978-3-319-01568-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01567-5
Online ISBN: 978-3-319-01568-2
eBook Packages: Engineering, Engineering (R0)