Iterative Clustering Method for Metagenomic Sequences

Bonet, Isis; Montoya, Widerman; Mesa-Múnera, Andrea; Alzate, Juan Fernando

doi:10.1007/978-3-319-13817-6_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8891)

Abstract

Metagenomics studies the microbial DNA of environmental samples. Sequencing tools produce a set of genome fragments, and associating each fragment with its corresponding phylogenetic group is a challenge for metagenomics. Binning methods address this problem and fall into two categories: similarity-based and composition-based. This paper proposes an iterative clustering method that aims at achieving a low sensitivity of clusters. The approach runs k-means iteratively, reducing the training data at each step. The selection of data for the next iteration depends on the result of the previous one and is based on a compactness measure. The final clustering performance is evaluated according to the sensitivity of the clusters. The results demonstrate that the proposed model performs better than simple k-means on metagenome databases.
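The iterative scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the feature representation, the exact compactness measure, or the rule for choosing which data move to the next iteration. The sketch assumes that each fragment is already a numeric feature vector, that compactness is the mean squared distance of a cluster's members to their centroid, and that clusters at or below a hypothetical `threshold` are accepted while the rest are re-clustered in the next round. All function and parameter names are illustrative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns non-empty clusters as lists of indices."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(i)
        # Keep the old centroid if a cluster ended up empty.
        new_centroids = [
            mean([points[i] for i in c]) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return [c for c in clusters if c]


def compactness(points, members):
    """Assumed measure: mean squared distance of members to their centroid."""
    vecs = [points[i] for i in members]
    centre = mean(vecs)
    return sum(sq_dist(v, centre) for v in vecs) / len(vecs)


def iterative_kmeans(points, k, threshold, max_rounds=10, seed=0):
    """Run k-means repeatedly, shrinking the training data each round.

    Assumed selection rule: clusters with compactness <= threshold are
    accepted; members of the other clusters are carried to the next round.
    """
    rng = random.Random(seed)
    remaining = list(range(len(points)))  # indices still in the training data
    accepted = []                         # final clusters (original indices)
    for _ in range(max_rounds):
        if len(remaining) <= k:
            break
        subset = [points[i] for i in remaining]
        carry = []
        for members in kmeans(subset, k, rng):
            original = [remaining[i] for i in members]
            if compactness(subset, members) <= threshold:
                accepted.append(original)
            else:
                carry.extend(original)
        if len(carry) == len(remaining):  # no cluster was accepted this round
            break
        remaining = carry
    if remaining:
        accepted.append(remaining)        # leftovers form one catch-all cluster
    return accepted


if __name__ == "__main__":
    # Toy 2-D data standing in for composition feature vectors.
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
            [2.0, 9.0], [9.0, 2.0]]
    for cluster in iterative_kmeans(data, k=2, threshold=0.05):
        print(cluster)
```

In this sketch every input index ends up in exactly one output cluster, and the threshold controls how aggressively the training data shrink from one round to the next.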

Author information

Authors and Affiliations

Escuela de Ingeniería de Antioquia, Envigado, Antioquia, Colombia
Isis Bonet, Widerman Montoya & Andrea Mesa-Múnera
Centro Nacional de Secuenciación Genómica-CNSG, Facultad de Medicina, Universidad de Antioquia, Colombia
Juan Fernando Alzate

Editor information

Editors and Affiliations

University College Cork, 011927, Cork, Ireland
Rajendra Prasath & Philip O’Reilly
V.H.N. Senthikumara Nadar College, 626 001, Tamil Nadu, India
T. Kathirvalavakumar

About this paper

Cite this paper

Bonet, I., Montoya, W., Mesa-Múnera, A., Alzate, J.F. (2014). Iterative Clustering Method for Metagenomic Sequences. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8891. Springer, Cham. https://doi.org/10.1007/978-3-319-13817-6_15

DOI: https://doi.org/10.1007/978-3-319-13817-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13816-9
Online ISBN: 978-3-319-13817-6
eBook Packages: Computer Science, Computer Science (R0)
