Noise-free latent block model for high dimensional data

Laclau, Charlotte; Brault, Vincent

doi:10.1007/s10618-018-0597-3


Published: 15 November 2018

Volume 33, pages 446–473 (2019)

Data Mining and Knowledge Discovery


Abstract

Co-clustering is known to be a powerful and efficient unsupervised learning approach because it partitions a dataset along both its observations and its variables. However, in a high-dimensional context, co-clustering methods may fail to provide meaningful results due to the presence of noisy and/or irrelevant features. In this paper, we tackle this issue by proposing a novel co-clustering model that assumes the existence of a noise cluster containing all irrelevant features. A variational expectation-maximization-based algorithm is derived for this task, in which automatic variable selection and the joint clustering of objects and variables are both achieved within a Bayesian framework. Experimental results on synthetic datasets show the efficiency of our model on high-dimensional noisy data. Finally, we highlight the interest of the approach on two real datasets used to study genetic diversity across the world.


Notes

https://archive.ics.uci.edu/ml/datasets.html.
The datasets can be found here: https://github.com/laclauc/NFLB and the code will be available upon publication.
https://rosenberglab.stanford.edu/data/rosenbergEtAl2002/diversitydata.stru.
https://rosenberglab.stanford.edu/nativedata.html.


Author information

Authors and Affiliations

Laboratoire Hubert Curien UMR 5516, CNRS, Institut d'Optique Graduate School, University of Lyon, UJM-Saint-Etienne, 42023, Saint-Etienne, France
Charlotte Laclau
University of Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000, Grenoble, France
Vincent Brault


Corresponding author

Correspondence to Charlotte Laclau.

Additional information

Responsible editor: Jesse Davis, Elisa Fromont, Derek Greene, Björn Bringmann.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Laclau, C., Brault, V. Noise-free latent block model for high dimensional data. Data Min Knowl Disc 33, 446–473 (2019). https://doi.org/10.1007/s10618-018-0597-3


Received: 10 December 2017
Accepted: 24 October 2018
Published: 15 November 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10618-018-0597-3
