Iterative Clustering Method for Metagenomic Sequences

  • Isis Bonet
  • Widerman Montoya
  • Andrea Mesa-Múnera
  • Juan Fernando Alzate
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8891)

Abstract

Metagenomics studies the microbial DNA of environmental samples. Sequencing tools produce a set of genome fragments, and associating each fragment with its corresponding phylogenetic group is a challenge. Binning methods address this problem and fall into two categories: similarity-based and composition-based. This paper proposes an iterative clustering method that aims at achieving a low sensitivity of clusters. The approach runs k-means iteratively, reducing the training data at each step. The data selected for the next iteration depend on the result of the previous one, as judged by a compactness measure. The final clustering performance is evaluated according to the sensitivity of the clusters. The results show that the proposed model performs better than simple k-means on metagenome databases.
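The abstract does not spell out the selection rule, so the following Python sketch is only one plausible reading of "iteratively run k-means, reducing the training data in each step". It rests on several assumptions that are not taken from the paper: compactness is the mean squared distance of a cluster's members to its centroid, clusters at or below a hypothetical `threshold` are accepted as final, and only the members of the looser clusters are passed to the next round. The function names (`kmeans`, `iterative_kmeans`) and the deterministic farthest-first seeding are likewise illustrative choices.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; requires len(points) >= k. Returns (centers, labels)."""
    # Assumption: deterministic farthest-first seeding, chosen for reproducibility.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return centers, labels


def iterative_kmeans(points, k, threshold, max_rounds=10):
    """Hypothetical iterative scheme: accept compact clusters, re-cluster the rest.

    Returns (assigned, remaining): a dict mapping point index to final
    cluster id, and the sorted indices that were never accepted.
    """
    remaining = list(range(len(points)))  # indices still to be clustered
    assigned = {}
    next_id = 0
    for _ in range(max_rounds):
        if len(remaining) < k:
            break
        subset = [points[i] for i in remaining]
        centers, labels = kmeans(subset, k)
        kept = []
        for c in range(k):
            members = [remaining[j] for j, l in enumerate(labels) if l == c]
            if not members:
                continue
            # Assumed compactness: mean squared distance to the cluster center.
            compactness = sum(sq_dist(points[i], centers[c])
                              for i in members) / len(members)
            if compactness <= threshold:
                for i in members:
                    assigned[i] = next_id
                next_id += 1
            else:
                kept.extend(members)
        if len(kept) == len(remaining):
            break  # nothing was accepted, so another round would repeat itself
        remaining = kept
    return assigned, sorted(remaining)
```

In this sketch each round works on a strictly smaller data set, or the loop stops. A single-member cluster has compactness zero and is therefore always accepted, so a practical variant would probably also require a minimum cluster size. For metagenomic reads, the input vectors would typically be composition features of each fragment; that choice is left open here.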

Keywords

Metagenomics clustering sequences binning k-means 

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Isis Bonet (1)
  • Widerman Montoya (1)
  • Andrea Mesa-Múnera (1)
  • Juan Fernando Alzate (2)

  1. Escuela de Ingeniería de Antioquia, Envigado, Colombia
  2. Centro Nacional de Secuenciación Genómica-CNSG, Facultad de Medicina, Universidad de Antioquia, Colombia
