Gene Predictors Ensemble for Complex Metagenomes

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 232)

Abstract

We present an ensemble of gene predictors that aims to improve on the performance of traditional predictors when they are applied to metagenomes obtained by 454 sequencing, which are characterized by very short reads. The proposed ensemble is based on data mining techniques, such as decision trees and k-means, complemented by structural information about the sequence provided by its fractal dimension. The resulting ensemble can outperform the best ab initio predictor by 15 to 20%.
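To make the fractal-dimension feature more concrete, the following is a minimal sketch of how such a value could be computed for a single read. It assumes a Higuchi-style estimator and a purine/pyrimidine walk as the numeric encoding of the sequence; the abstract does not specify either choice, and the names `dna_walk`, `higuchi_fd`, and the parameter `kmax` are hypothetical.

```python
import math


def dna_walk(seq):
    """Map a DNA string to a cumulative walk (assumed encoding: A/G = +1, C/T = -1)."""
    steps = {"A": 1, "G": 1, "C": -1, "T": -1}
    walk, total = [], 0
    for base in seq.upper():
        total += steps.get(base, 0)  # ambiguous bases such as N add no step
        walk.append(total)
    return walk


def higuchi_fd(x, kmax=8):
    """Estimate the fractal dimension of a numeric series with a Higuchi-style method."""
    n = len(x)
    if n < 2 * kmax:
        raise ValueError("series too short for the chosen kmax")
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):  # start offsets 0 .. k-1
            n_steps = (n - 1 - m) // k
            dist = sum(
                abs(x[m + i * k] - x[m + (i - 1) * k])
                for i in range(1, n_steps + 1)
            )
            norm = (n - 1) / (n_steps * k)  # rescale to the full series length
            lengths.append(dist * norm / k)
        mean_len = sum(lengths) / len(lengths)
        if mean_len > 0:  # a zero curve length has no logarithm; skip that scale
            log_inv_k.append(math.log(1.0 / k))
            log_len.append(math.log(mean_len))
    if len(log_len) < 2:
        raise ValueError("not enough non-zero curve lengths to fit a slope")
    # Least-squares slope of log L(k) against log(1/k) is the estimate.
    mx = sum(log_inv_k) / len(log_inv_k)
    my = sum(log_len) / len(log_len)
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den


if __name__ == "__main__":
    read = "ATGGCGTACGTTAGCCGATAGCTAGGCTTACGATCGATCGGCTAAGCTTAGC"
    print(round(higuchi_fd(dna_walk(read), kmax=6), 3))
```

In a setup like the one the abstract describes, a value of this kind could be supplied as one extra feature next to the individual predictors' outputs; how the authors actually combine them is not stated here.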

Author information

Corresponding author

Correspondence to Nestor Díaz.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Díaz, N., Velazco, A.F.R., Márquez, C.A.O. (2014). Gene Predictors Ensemble for Complex Metagenomes. In: Castillo, L., Cristancho, M., Isaza, G., Pinzón, A., Rodríguez, J. (eds) Advances in Computational Biology. Advances in Intelligent Systems and Computing, vol 232. Springer, Cham. https://doi.org/10.1007/978-3-319-01568-2_22

  • DOI: https://doi.org/10.1007/978-3-319-01568-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01567-5

  • Online ISBN: 978-3-319-01568-2

  • eBook Packages: Engineering, Engineering (R0)
