Clustering is a method for grouping data points so that each cluster contains similar data points. In a real dataset such as microbiome data, the data points are presented as profiles or probability distributions. These data points form the periphery of a cluster, which makes it difficult to identify the real clustering structure. In this study, we applied density clustering with several distance measures to overcome this difficulty. Experiments on a real dataset indicated that the Manhattan distance is an appropriate distance measure for clustering analysis of microbiome data.
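To make the idea concrete, the following is a minimal sketch of density-based clustering on Manhattan distances between abundance profiles. It is an illustration under stated assumptions, not the procedure used in the study: it assumes a density-peak style method, and the function names `manhattan_matrix` and `density_peaks`, the `cutoff` value, the number of clusters, and the toy profiles are all hypothetical choices made for the example.

```python
import numpy as np

def manhattan_matrix(profiles):
    """Pairwise Manhattan (L1) distances between the rows of `profiles`."""
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.abs(diff).sum(axis=2)

def density_peaks(dist, cutoff, n_clusters):
    """Density-peak style clustering on a precomputed distance matrix.

    Assumption: `cutoff` and `n_clusters` are chosen by the user; the
    study's own parameter choices are not given in the abstract.
    """
    n = dist.shape[0]
    # Local density: number of other points closer than the cutoff.
    rho = (dist < cutoff).sum(axis=1) - 1
    # Rank points from densest to sparsest (ties broken by index).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)                # distance to nearest denser point
    parent = np.full(n, -1)            # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Centers combine high density with a large distance to denser points.
    gamma = rho * delta
    gamma[order[0]] = np.inf           # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Every other point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Toy relative-abundance profiles (each row sums to 1); illustrative only.
profiles = np.array([
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.72, 0.18, 0.10],
    [0.10, 0.20, 0.70],
    [0.12, 0.18, 0.70],
    [0.10, 0.22, 0.68],
])
dist = manhattan_matrix(profiles)
labels = density_peaks(dist, cutoff=0.05, n_clusters=2)
print(labels)
```

Because each profile sums to 1, the Manhattan distance between two profiles is bounded by 2, which makes a single cutoff comparable across samples. Swapping in another distance measure only requires replacing `manhattan_matrix`.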
Cite this article
Jiang, X., Hu, X. & He, T. Identification of the clustering structure in microbiome data by density clustering on the Manhattan distance. Sci. China Inf. Sci. 59, 070104 (2016). https://doi.org/10.1007/s11432-016-5587-8
Keywords:
- information distance
- data visualization
- density clustering
- microbial community