A Survey on Model-Based Co-Clustering: High Dimension and Estimation Challenges

  • Invited Article
Journal of Classification

Abstract

Model-based co-clustering can be seen as a particularly important extension of model-based clustering. It allows for a significant reduction of both the number of rows (individuals) and columns (variables) of a data set in a parsimonious manner, and also allows interpretability of the resulting reduced data set since the meaning of the initial individuals and features is preserved. Moreover, it benefits from the rich statistical theory for both estimation and model selection. Many works have produced new advances on this topic in recent years, and this paper offers a general update of the related literature. In addition, we advocate two main messages, supported by specific research material: (1) co-clustering requires further research to fix some well-identified estimation issues, and (2) co-clustering is one of the most promising approaches for clustering in the (very) high-dimensional setting, which corresponds to the global trend in modern data sets.

Notes

  1. Some more parsimonious versions have also been defined (see Govaert and Nadif, 2008).

References

  • Abbe, E. (2017). Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1), 6446–6531.

  • Ailem, M., Role, F., & Nadif, M. (2017). Sparse Poisson latent block model for document clustering. IEEE Transactions on Knowledge and Data Engineering, 29(7), 1563–1576.

  • Ambroise, C., & Matias, C. (2012). New consistent and asymptotically normal parameter estimates for random-graph mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1), 3–35.

  • Banfield, J. D., & Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49, 803–821.

  • Baudry, J.-P. (2015). Estimation and model selection for model-based clustering with the conditional classification likelihood. Electronic Journal of Statistics, 9(1), 1041–1077.

  • Bellman, R. (1957). Dynamic Programming (1st ed.). Princeton, NJ, USA: Princeton University Press.

  • Bergé, L. R., Bouveyron, C., Corneli, M., & Latouche, P. (2019). The latent topic block model for the co-clustering of textual interaction data. Computational Statistics & Data Analysis, 137, 247–270.

  • Bickel, P., Choi, D., Chang, X., Zhang, H., et al. (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. The Annals of Statistics, 41(4), 1922–1943.

  • Biernacki, C. (2007). Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures for grouped data and behaviour of the EM algorithm. Scandinavian Journal of Statistics, 34(3), 569–586.

  • Biernacki, C. (2017). Mixture models. In J.-J. Droesbeke, G. Saporta, & C. Thomas-Agnan (Eds.), Choix de modèles et agrégation. Technip.

  • Biernacki, C., Celeux, G., & Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719–725.

  • Biernacki, C., Celeux, G., & Govaert, G. (2003). Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis, 41, 561–575.

  • Biernacki, C., Celeux, G., & Govaert, G. (2011). Exact and Monte Carlo calculations of integrated likelihoods for the latent class model. Journal of Statistical Planning and Inference, 140(11), 2991–3002.

  • Biernacki, C., & Chrétien, S. (2003). Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with EM. Statistics & Probability Letters, 61, 373–382.

  • Biernacki, C., & Jacques, J. (2015). Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm. Statistics and Computing, 26(5), 929–943.

  • Biernacki, C., & Maugis, C. (2017). High-dimensional clustering. In J.-J. Droesbeke, G. Saporta, & C. Thomas-Agnan (Eds.), Choix de modèles et agrégation. Technip.

  • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

  • Bock, H. (1979). Simultaneous clustering of objects and variables. Analyse des données et Informatique, 187–203.

  • Boutalbi, R., Labiod, L., & Nadif, M. (2020). Tensor latent block model for co-clustering. International Journal of Data Science and Analytics, 10, 161–175.

  • Boutalbi, R., Labiod, L., & Nadif, M. (2022). TensorClus: A Python library for tensor (co)-clustering. Neurocomputing, 468(C), 464–468.

  • Bouveyron, C., Bozzi, L., Jacques, J., & Jollois, F.-X. (2018). The functional latent block model for the co-clustering of electricity consumption curves. Journal of the Royal Statistical Society: Series C Applied Statistics, 67(4), 897–915.

  • Bouveyron, C., & Brunet, C. (2014). Model-based clustering of high-dimensional data: A review. Computational Statistics & Data Analysis, 71, 52–78.

  • Bouveyron, C., Celeux, G., Murphy, T. B., & Raftery, A. (2019). Model-based clustering and classification for data science. Cambridge University Press.

  • Bouveyron, C., Côme, E., & Jacques, J. (2015). The discriminative functional mixture model for a comparative analysis of bike sharing systems. The Annals of Applied Statistics, 9(4), 1726–1760.

  • Bouveyron, C., & Jacques, J. (2011). Model-based clustering of time series in group-specific functional subspaces. Advances in Data Analysis and Classification, 5(4), 281–300.

  • Bouveyron, C., Jacques, J., & Schmutz, A. (2021). funLBM: Model-based co-clustering of functional data. R package version 2.2.

  • Bouveyron, C., Jacques, J., Schmutz, A., Simoes, F., & Bottini, S. (2021). Co-clustering of multivariate functional data for the analysis of air pollution in the south of France. Annals of Applied Statistics, 16.

  • Brault, V. (2014). Estimation et sélection de modèle pour le modèle des blocs latents. PhD thesis, Université Paris Sud.

  • Brault, V., Celeux, G., & Keribin, C. (2014). Mise en œuvre de l’échantillonneur de Gibbs pour le modèle des blocs latents. In: 46èmes Journées de Statistique de la SFdS.

  • Brault, V., Keribin, C., & Mariadassou, M. (2020). Consistency and asymptotic normality of latent block model estimators. Electronic Journal of Statistics, 14(1), 1234–1268.

  • Brault, V., & Lomet, A. (2015). Revue des méthodes pour la classification jointe des lignes et des colonnes d’un tableau. Journal de la Société Française de Statistique, 156(3), 27–51.

  • Brault, V., & Mariadassou, M. (2015). Co-clustering through latent block model: A review. Journal de la Société Française de Statistique, 156(3), 120–139.

  • Carreira-Perpiñán, M. A., & Renals, S. (2000). Practical identifiability of finite mixtures of multivariate Bernoulli distributions. Neural Computation, 12(1), 141–152.

  • Celeux, G., Chauveau, D., & Diebolt, J. (1996). Stochastic versions of the EM algorithm: An experimental study in the mixture case. Journal of Statistical Computation and Simulation, 55(4), 287–314.

  • Celeux, G., & Diebolt, J. (1986). L’algorithme SEM: un algorithme d’apprentissage probabiliste pour la reconnaissance de mélange de densités. Revue de statistique appliquée, 34(2), 35–52.

  • Celeux, G., & Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition, 28(5), 781–793.

  • Celisse, A., Daudin, J.-J., & Pierre, L. (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electronic Journal of Statistics, 6, 1847–1899.

  • Chao, G., Sun, S., & Bi, J. (2021). A survey on multiview clustering. IEEE Transactions on Artificial Intelligence, 2, 146–168.

  • Charrad, M., Lechevallier, Y., Ahmed, M., & Saporta, G. (2009). Block clustering for web pages categorization. Intelligent Data Engineering and Automated Learning (pp. 260–267). Burgos: Springer.

  • Cheam, A. S. M., Marbac, M. and McNicholas, P. D. (2017). Model-based clustering for spatiotemporal data on air quality monitoring. Environmetrics 28(3)

  • Chen, X., Huang, J. Z., Wu, Q., & Yang, M. (2019). Subspace weighting co-clustering of gene expression data. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(2), 352–364.

  • Cheng, H., & Liu, J. (2021). Concurrent brain parcellation and connectivity estimation via co-clustering of resting state fMRI data: A novel approach. Human Brain Mapping, 42(8), 2477–2489.

  • Chi, E. C., Gaines, B. R., Sun, W. W., Zhou, H., & Yang, J. (2020). Provable convex co-clustering of tensors. The Journal of Machine Learning Research, 21(1), 1–58.

  • Cho, H., & Dhillon, I. S. (2008). Coclustering of human cancer microarrays using minimum sum-squared residue coclustering. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5(3), 385–400.

  • Côme, E. and Jouvin, N. (2021). Greed: Clustering and model selection with the integrated classification likelihood. R package version 0.5.1

  • Corneli, M., Bouveyron, C., & Latouche, P. (2020). Co-clustering of ordinal data via latent continuous random variables and not missing at random entries. Journal of Computational and Graphical Statistics, 29(4), 771–785.

  • Darikwa, T. B., Manda, S. and Lesaoana, M. (2019). Assessing joint spatial autocorrelations between mortality rates due to cardiovascular conditions in South Africa. Geospatial Health 14(2)

  • Day, N. E. (1969). Estimating the components of a mixture of normal distributions. Biometrika, 56, 463–474.

  • De Leeuw, J. and Michailidis, G. (1999). Block relaxation algorithms in statistics. Information Systems and Data Analysis, 308–325

  • Delaigle, A., & Hall, P. (2010). Defining probability density for a distribution of random functions. The Annals of Statistics, 38, 1171–1193.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.

  • Dhillon, I. S. (2001). Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’01, Association for Computing Machinery, New York, NY, USA, 269–274

  • Dhillon, I. S., Mallela, S., & Modha, D. S. (2003). Information-theoretic co-clustering. In: The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, pp. 89–98.

  • Etienne, C., & Latifa, O. (2014). Model-based count series clustering for bike sharing system usage mining: A case study with the Vélib’ system of Paris. ACM Transactions on Intelligent Systems and Technology (TIST), 5(3), 1–21.

  • Flake, G. W., Lawrence, S., Giles, C. L., & Coetzee, F. M. (2002). Self-organization and identification of web communities. Computer, 35(3), 66–70.

  • Fop, M., & Murphy, T. B. (2018). Variable selection methods for model-based clustering. Statistics Surveys, 12, 18–65.

  • Fop, M., Smart, K. M. and Murphy, T. B. (2017). Variable selection for latent class analysis with application to low back pain diagnosis. The Annals of Applied Statistics, 2080–2110

  • Forbes, F., Arnaud, A., Lemasson, B., & Barbier, E. (2019). Component elimination strategies to fit mixtures of multiple scale distributions. ‘RSSDS 2019 - Research School on Statistics and Data Science’, 1150 of Communications in Computer and Information Science (pp. 81–95). Melbourne, Australia: Springer.

  • Frisch, G., Leger, J.-B., & Grandvalet, Y. (2021a). Co-clustering for fair recommendation. In: Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, 1524. Springer, Cham.

  • Frisch, G., Leger, J.-B. and Grandvalet, Y. (2021b) SparseBM: A Python module for handling sparse graphs with block models. working paper or preprint

  • Frisch, G., Léger, J.-B. and Grandvalet, Y. (2022) Learning from missing data with the latent block model, Statistics and Computing 32(9)

  • Gallaugher, M., Biernacki, C. and McNicholas, P. (2022). Parameter-wise co-clustering for high-dimensional data, Computational Statistics, 1–23

  • George, T. B., Strawn, N. K., & Leviyang, S. (2021). Tree-based co-clustering identifies chromatin accessibility patterns associated with hematopoietic lineage structure. Frontiers in Genetics, 12.

  • George, T., & Merugu, S. (2005). A scalable collaborative filtering framework based on co-clustering. In: Proceedings of the Fifth IEEE International Conference on Data Mining, ICDM ’05, IEEE Computer Society, USA, 625–628.

  • Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826.

  • Goffinet, E., Lebbah, M., Azzag, H., Loïc, G., & Coutant, A. (2021). Non-parametric multivariate time series co-clustering model applied to driving-assistance systems validation. In: V. Lemaire, S. Malinowski, A. Bagnall, T. Guyet, R. Tavenard, & G. Ifrim (Eds.), Advanced Analytics and Learning on Temporal Data (pp. 71–87). Cham: Springer International Publishing.

  • Good, I. J. (1965). Categorization of classification. Mathematics and Computer Science in Biology and Medicine, pp. 115–125. London: Her Majesty’s Stationery Office.

  • Goodman, L. A. (1974). Exploratory latent structure models using both identifiable and unidentifiable models. Biometrika, 61, 215–231.

  • Govaert, G. (1983) Classification croisée, PhD thesis, Thèse d’état, Université Paris 6.

  • Govaert, G., & Nadif, M. (2008). Block clustering with Bernoulli mixture models: Comparison of different approaches. Computational Statistics & Data Analysis, 52(6), 3233–3245.

  • Govaert, G. and Nadif, M. (2013). Co-clustering, Wiley

  • Hasan, M. N., Rana, M. M., Begum, A. A., Rahman, M., & Mollah, M. N. H. (2018). Robust co-clustering to discover toxicogenomic biomarkers and their regulatory doses of chemical compounds using logistic probabilistic hidden variable model. Frontiers in Genetics, 9.

  • Huang, S., Xu, Z., Tsang, I. W., & Kang, Z. (2020). Auto-weighted multi-view co-clustering with bipartite graphs. Information Sciences, 512, 18–30.

  • Ingrassia, S., & Rocci, R. (2007). Constrained monotone EM algorithms for finite mixture of multivariate Gaussians. Computational Statistics & Data Analysis, 51(11), 5339–5351.

  • Jacques, J., & Biernacki, C. (2018). Model-based co-clustering for ordinal data. Computational Statistics & Data Analysis, 123, 101–115.

  • Jacques, J., & Preda, C. (2013). Funclust: A curves clustering method using functional random variable density approximation. Neurocomputing, 112, 164–171.

  • Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264–323.

  • Jin, C., Zhang, Y., Balakrishnan, S., Wainwright, M., & Jordan, M. (2016). Local maxima in the likelihood of Gaussian mixture models: Structural results and algorithmic consequences. In: Thirtieth Conference on Neural Information Processing Systems, NeurIPS 2016.

  • Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. Wiley

  • Keribin, C. (2021). Cluster or co-cluster the nodes of oriented graphs? Journal de la Société Française de Statistique, 162(1), 46–69.

  • Keribin, C., Brault, V., Celeux, G., & Govaert, G. (2015). Estimation and selection for the latent block model on categorical data. Statistics and Computing, 25(6), 1201–1216.

  • Keribin, C., Brault, V., Celeux, G., Govaert, G., et al. (2012). Model selection for the binary latent block model. In: Proceedings of COMPSTAT 2012.

  • Keuper, M., Tang, S., Andres, B., Brox, T., & Schiele, B. (2020). Motion segmentation & multiple object tracking by correlation co-clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(1), 140–153.

  • Laclau, C., & Nadif, M. (2016). Hard and fuzzy diagonal co-clustering for document-term partitioning. Neurocomputing, 193(C), 133–147.

  • Leger, J.-B., Barbillon, P., & Chiquet, J. (2020). blockmodels: Latent and stochastic block model estimation by a ‘V-EM’ algorithm. R package version 1.1.4.

  • Li, G. (2020). Generalized co-clustering analysis via regularized alternating least squares. Computational Statistics & Data Analysis, 150, 106989.

  • Lian, C., Ruan, S., Denoeux, T., Li, H., & Vera, P. (2019). Joint tumor segmentation in PET-CT images using co-clustering and fusion based on belief functions. IEEE Transactions on Image Processing, 28(2), 755–766.

  • Lomet, A., Govaert, G. and Grandvalet, Y. (2012a). Design of artificial data tables for co-clustering analysis, Technical report, Université de Technologie de Compiègne, France

  • Lomet, A., Govaert, G., & Grandvalet, Y. (2012b). Model selection in block clustering by the integrated classification likelihood. In: 20th International Conference on Computational Statistics (COMPSTAT 2012), Limassol, Cyprus, pp. 519–530.

  • MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In: L. M. LeCam & J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, pp. 281–297.

  • Madeira, S. C., & Oliveira, A. L. (2004). Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 24–45.

  • Malsiner-Walli, G., Frühwirth-Schnatter, S., & Grün, B. (2016). Model-based clustering based on sparse finite Gaussian mixtures. Statistics and Computing, 26, 303–324.

  • Marbac, M., & Sedki, M. (2017). Variable selection for model-based clustering using the integrated complete-data likelihood. Statistics and Computing, 27, 1049–1063.

  • Marchello, G., Fresse, A., Corneli, M., & Bouveyron, C. (2022). Co-clustering of evolving count matrices with the dynamic latent block model: Application to pharmacovigilance. Statistics and Computing, 32(3), 1–22.

  • Mariadassou, M., & Matias, C. (2015). Convergence of the groups posterior distribution in latent or stochastic block models. Bernoulli, 21(1), 537–573.

  • Matias, C., & Robin, S. (2014). Modeling heterogeneity in random graphs through latent space models: A selective review. ESAIM: Proceedings and Surveys, 47, 55–74.

  • Maugis, C., Celeux, G., & Martin-Magniette, M.-L. (2009). Variable selection in model-based clustering: A general variable role modeling. Computational Statistics & Data Analysis, 53(11), 3872–3882.

  • McLachlan, G. J., & Krishnan, T. (1997). The EM algorithm and extensions. New York: Wiley.

  • McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.

  • McNicholas, P. (2016). Model-based clustering. Journal of Classification, 33.

  • McParland, D. and Gormley, C. (2013). Algorithms from and for nature and life: Studies in classification, data analysis, and knowledge organization, Springer, Switzerland, chapter Clustering Ordinal Data via Latent Variable Models, pp. 127–135

  • Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846–850.

  • Redner, R., & Walker, H. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2), 195–239.

  • Robert, V. (2017). Classification croisée pour l’analyse de bases de données de grandes dimensions de pharmacovigilance. PhD thesis, Université Paris-Sud.

  • Robert, V. (2021). bikm1: Co-clustering adjusted Rand index and bikm1 procedure for contingency and binary data-sets. R package version 1.1.0

  • Robert, V., Celeux, G., & Keribin, C. (2015). Un modèle statistique pour la pharmacovigilance. In: 47èmes Journées de Statistique de la SFdS.

  • Robert, V., Vasseur, Y., & Brault, V. (2021). Comparing high-dimensional partitions with the co-clustering adjusted Rand index. Journal of Classification, 38(1), 158–186.

  • Rohe, K., Chatterjee, S., & Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4), 1878–1915.

  • Sedki, M., Celeux, G., & Maugis-Rabusseau, C. (2014). SelvarMix: A R package for variable selection in model-based clustering and discriminant analysis with a regularization approach. Inria: Research report.

  • Selosse, M., Gourru, A., Jacques, J. and Velcin, J. (2019). Tri-clustering pour données de comptage. In: 51èmes Journées de Statistique de la SFdS

  • Selosse, M., Jacques, J., & Biernacki, C. (2020). Model-based co-clustering for mixed type data. Computational Statistics & Data Analysis, 144, 106866.

  • Selosse, M., Jacques, J., & Biernacki, C. (2020). ordinalClust: Ordinal data clustering, co-clustering and classification. R package version 1.3.5.

  • Selosse, M., Jacques, J., & Biernacki, C. (2020). Textual data summarization using the self-organized co-clustering model. Pattern Recognition, 103, 107315.

  • Selosse, M., Jacques, J., & Biernacki, C. (2021). mixedClust: Co-clustering of mixed type data. R package version, 1, 2.

  • Selosse, M., Jacques, J., Biernacki, C., & Cousson-Gélie, F. (2019). Analyzing health quality survey using constrained co-clustering model for ordinal data and some dynamic implication. Journal of the Royal Statistical Society: Series C Applied Statistics, 68(5), 1327–1349.

  • Singh Bhatia, P., Iovleff, S., & Govaert, G. (2017). blockcluster: An R package for model-based co-clustering. Journal of Statistical Software, 76(9), 1–24.

  • Sportisse, A., Marbac, M., Biernacki, C., Boyer, C., Celeux, G., Laporte, F., & Josse, J. (2021). Model-based clustering with missing not at random data.

  • Stephens, M. (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society Series B (Statistical Methodology), 62(4), 795–809.

  • Tokuda, T., Yoshimoto, J., Shimizu, Y., Okada, G., Takamura, M., Okamoto, Y., Yamawaki, S., & Doya, K. (2017). Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions. PLoS ONE, 12.

  • Ullah, S., Daud, H., Dass, S. C., Khan, H. N. and Khalil, A. (2017). Detecting space-time disease clusters with arbitrary shapes and sizes using a co-clustering approach. Geospatial Health 12(2)

  • Vandewalle, V., Preda, C., & Dabo-Niang, S. (2020). Clustering spatial functional data. In: J. Mateu & R. Giraldo (Eds.), Geostatistical Functional Data Analysis: Theory and Methods. John Wiley and Sons, Chichester, UK.

  • Vermunt, J. and Magidson, J. (2005). Technical guide for latent GOLD 4.0: Basic and advanced, Statistical Innovations Inc., Belmont, Massachusetts

  • Wang, X., Yu, G., Domeniconi, C., Wang, J., Yu, Z. and Zhang, Z. (2018). Multiple co-clusterings. In: 2018 IEEE International Conference on Data Mining (ICDM), pp. 1308–1313

  • Wang, Y. R., & Bickel, P. J. (2017). Likelihood-based model selection for stochastic block models. The Annals of Statistics, 45(2), 500–528.

  • Wyse, J., & Friel, N. (2012). Block clustering with collapsed latent block models. Statistics and Computing, 22, 415–428.

  • Wyse, J., Friel, N., & Latouche, P. (2017). Inferring structure in bipartite networks using the latent blockmodel and exact ICL. Network Science, 5(1), 45–69.

  • Xu, D., & Jie Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of Data Science, 2, 165–193.

  • Xu, G., Zong, Y., Dolog, P., & Zhang, Y. (2010). Co-clustering analysis of weblogs using bipartite spectral projection approach. Knowledge-Based and Intelligent Information and Engineering Systems (pp. 398–407). Cardiff: Springer.

  • Zeng, P., Wangwu, J. and Lin, Z. (2020). Coupled co-clustering-based unsupervised transfer learning for the integrative analysis of single-cell genomic data. Briefings in Bioinformatics 22(4)

Author information

Corresponding author

Correspondence to C. Biernacki.

Ethics declarations

The authors declare that this manuscript is original, has not been published before, and is not currently being considered for publication elsewhere. The authors declare no competing interests. The authors confirm that the manuscript has been read and approved by all named authors and that there are no other persons meeting the criteria for authorship but who are not listed. The authors further confirm that the order of authors listed in the manuscript has been approved by all of them. The authors confirm that they have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing, the authors confirm that they have followed the regulations of their institutions concerning intellectual property. The authors understand that the corresponding author is the sole contact for the editorial process (including editorial manager and direct communications with the office). He/she is responsible for communicating with the other authors about progress, submissions of revisions and final approval of proofs. The authors confirm that they have provided a current, correct email address which is accessible by the corresponding author. Finally, the authors declare that no data sets were generated or analyzed during the current study, since it is a review article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Biernacki, C., Jacques, J. & Keribin, C. A Survey on Model-Based Co-Clustering: High Dimension and Estimation Challenges. J Classif 40, 332–381 (2023). https://doi.org/10.1007/s00357-023-09441-3

  • DOI: https://doi.org/10.1007/s00357-023-09441-3
