Block clustering with collapsed latent block models

Statistics and Computing

Abstract

We introduce a Bayesian extension of the latent block model for model-based block clustering of data matrices. Our approach considers a block model in which the block parameters may be integrated out. The result is a posterior distribution defined over the number of row and column clusters and the cluster memberships. The numbers of row and column clusters need not be known in advance, as they are sampled along with the cluster memberships using Markov chain Monte Carlo. This differs from existing work on latent block models, where the number of clusters is assumed known or is chosen using an information criterion. We analyze both simulated and real data to validate the technique.
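To illustrate what integrating out block parameters can look like, here is a minimal sketch. The abstract does not specify the data model or the priors, so the binary data, the independent Beta(a, b) prior on each block's success probability, and the function name `collapsed_log_likelihood` are all assumptions made for illustration. The sketch is not the authors' implementation, and it covers only the collapsed likelihood term, not the priors on memberships or on the number of clusters.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def collapsed_log_likelihood(X, rows, cols, K, G, a=1.0, b=1.0):
    """log p(X | rows, cols) for binary X, with each block's Bernoulli
    probability integrated out under an assumed Beta(a, b) prior.

    rows[i] in {0..K-1} is the cluster of row i;
    cols[j] in {0..G-1} is the cluster of column j.
    """
    ones = [[0] * G for _ in range(K)]
    total = [[0] * G for _ in range(K)]
    for i, x_row in enumerate(X):
        for j, x in enumerate(x_row):
            ones[rows[i]][cols[j]] += x
            total[rows[i]][cols[j]] += 1
    ll = 0.0
    for k in range(K):
        for g in range(G):
            n1 = ones[k][g]
            n0 = total[k][g] - n1
            # Beta-Bernoulli marginal for one block; empty blocks contribute 0.
            ll += log_beta(a + n1, b + n0) - log_beta(a, b)
    return ll


# Hypothetical 4x4 matrix with an obvious 2x2 block structure.
X = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
good = collapsed_log_likelihood(X, [0, 0, 1, 1], [0, 0, 1, 1], K=2, G=2)
bad = collapsed_log_likelihood(X, [0, 1, 0, 1], [0, 1, 0, 1], K=2, G=2)
print(good, bad)
```

Because this quantity depends only on the memberships, a sampler in this style can compare allocations, including ones with different numbers of clusters, without ever drawing the block parameters. Here the allocation that matches the block structure scores higher than the scrambled one.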

Author information

Correspondence to Jason Wyse.

About this article

Cite this article

Wyse, J., Friel, N. Block clustering with collapsed latent block models. Stat Comput 22, 415–428 (2012). https://doi.org/10.1007/s11222-011-9233-4

  • DOI: https://doi.org/10.1007/s11222-011-9233-4
