Identification of the clustering structure in microbiome data by density clustering on the Manhattan distance

Abstract

Clustering groups data points so that similar points fall into the same cluster. In real datasets such as microbiome data, the data points are presented as profiles or probability distributions. Such data points tend to lie on the periphery of a cluster, which makes the true clustering structure difficult to identify. In this study, we applied density clustering with several distance measures to overcome this difficulty. Experiments on a real dataset indicated that the Manhattan distance is an appropriate distance measure for clustering analysis of microbiome data.
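
To make the idea concrete, the following is a minimal sketch of density-based clustering on a Manhattan distance matrix. It is not the authors' implementation: the abstract does not specify the algorithm, so the sketch assumes a density-peaks style procedure (in the spirit of Rodriguez and Laio, 2014), and the function names, the `cutoff` parameter, the requested number of clusters, and the toy profiles are all hypothetical choices made for illustration.

```python
import numpy as np


def manhattan_distances(profiles):
    """Pairwise Manhattan (L1) distances between the rows of `profiles`."""
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.abs(diff).sum(axis=-1)


def density_peak_labels(dist, cutoff, n_clusters):
    """Assign cluster labels from a distance matrix (illustrative sketch).

    rho   : number of other points closer than `cutoff` (local density)
    delta : distance to the nearest point that ranks higher in density
    Centres are the points with the largest rho * delta; every other point
    inherits the label of its nearest higher-ranked neighbour.
    """
    n = dist.shape[0]
    rho = (dist < cutoff).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    order = np.argsort(-rho, kind="stable")  # decreasing density, ties by index

    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    top = order[0]
    delta[top] = dist[top].max()  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    gamma = rho * delta
    gamma[top] = np.inf  # assumption: the densest point is always a centre
    centres = np.argsort(-gamma, kind="stable")[:n_clusters]

    labels = np.full(n, -1)
    labels[centres] = np.arange(len(centres))
    for i in order:  # higher-ranked points are labelled before lower-ranked ones
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


# Toy relative-abundance profiles (each row sums to 1); values are made up.
profiles = np.array([
    [0.70, 0.20, 0.10],
    [0.72, 0.18, 0.10],
    [0.68, 0.22, 0.10],
    [0.10, 0.20, 0.70],
    [0.12, 0.18, 0.70],
    [0.08, 0.22, 0.70],
])
dist = manhattan_distances(profiles)
labels = density_peak_labels(dist, cutoff=0.1, n_clusters=2)
print(labels)
```

Because the procedure takes only a distance matrix as input, another distance measure can be compared by replacing `manhattan_distances` and leaving the rest unchanged.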

Author information

Corresponding author

Correspondence to Tingting He.

About this article

Cite this article

Jiang, X., Hu, X. & He, T. Identification of the clustering structure in microbiome data by density clustering on the Manhattan distance. Sci. China Inf. Sci. 59, 070104 (2016). https://doi.org/10.1007/s11432-016-5587-8

Keywords

  • microbiome
  • information distance
  • data visualization
  • density clustering
  • microbial community