Block clustering with collapsed latent block models

Wyse, Jason; Friel, Nial

doi:10.1007/s11222-011-9233-4

Published: 05 May 2011

Volume 22, pages 415–428, (2012)
Statistics and Computing

Jason Wyse¹ &
Nial Friel¹

Abstract

We introduce a Bayesian extension of the latent block model for model-based block clustering of data matrices. Our approach considers a block model in which the block parameters may be integrated out. The result is a posterior distribution defined over the numbers of row and column clusters and the cluster memberships. The numbers of row and column clusters need not be known in advance, as they are sampled along with the cluster memberships using Markov chain Monte Carlo. This differs from existing work on latent block models, where the number of clusters is assumed known or is chosen using an information criterion. We analyze both simulated and real data to validate the technique.
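To make the idea of integrating out block parameters concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary data, one Bernoulli parameter per block, and an independent Beta(a, b) prior on each; none of these choices is stated in the abstract. Under those assumptions each block contributes a ratio of Beta functions, so the marginal likelihood depends only on the row and column labels. The function names `log_beta` and `collapsed_log_likelihood` are hypothetical.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def collapsed_log_likelihood(X, row_labels, col_labels, a=1.0, b=1.0):
    """Log marginal likelihood of a binary matrix given block labels.

    Assumption (illustrative only): entries in block (k, g) are
    Bernoulli with a block-specific probability that has a Beta(a, b)
    prior. Integrating that probability out gives, per block,
    B(a + ones, b + zeros) / B(a, b).
    """
    counts = {}  # (row cluster, column cluster) -> (ones, zeros)
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            key = (row_labels[i], col_labels[j])
            ones, zeros = counts.get(key, (0, 0))
            counts[key] = (ones + x, zeros + 1 - x)
    return sum(
        log_beta(a + ones, b + zeros) - log_beta(a, b)
        for ones, zeros in counts.values()
    )
```

Because the result is a function of the labels alone, a sampler built on such a quantity can compare allocations with different numbers of row and column clusters directly, which is the property the abstract relies on.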

Author information

Authors and Affiliations

University College Dublin, Dublin, Ireland
Jason Wyse & Nial Friel

Corresponding author

Correspondence to Jason Wyse.

About this article

Cite this article

Wyse, J., Friel, N. Block clustering with collapsed latent block models. Stat Comput 22, 415–428 (2012). https://doi.org/10.1007/s11222-011-9233-4

Received: 13 April 2010
Accepted: 17 January 2011
Published: 05 May 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s11222-011-9233-4
