Abstract
Co-clustering means the simultaneous clustering of the rows and columns of a two-dimensional data table (also called biclustering or two-way clustering), in contrast to clustering the rows and the columns separately. Practical applications arise, e.g., in economics, the social sciences, and bioinformatics. Various co-clustering models, criteria, and algorithms have been proposed that differ with respect to the data types considered (real-valued, integer, binary data, contingency tables) and the meaning of rows and columns (samples, variables, factors, time, ...). This paper concentrates on the case where rows correspond to (independent) samples or objects, and columns to (typically dependent) variables. We emphasize that here, in general, different similarity or homogeneity concepts must be used for rows and columns. We propose two probabilistic co-clustering approaches: one in which clusters of objects and clusters of variables refer to two different distribution parameters, and one in which clusters of ‘highly correlated’ variables (by regression on a latent class-specific factor) are crossed with object clusters that are distinguished by additive effects only. We emphasize here the classical ‘classification approach’, where maximum likelihood criteria are optimized by generalized alternating k-means type algorithms.
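To give a concrete feel for what an alternating k-means type algorithm looks like in the co-clustering setting, the following is a minimal sketch of a generic least-squares "double k-means" scheme. It is not one of the two models proposed in the chapter: it assumes a plain block-mean model (every block formed by a row cluster and a column cluster is summarised by one mean), for which least squares coincides with the maximum likelihood classification criterion under Gaussian errors with a common variance. The names `double_kmeans`, `block_means`, `k`, and `l` are hypothetical and chosen for illustration only.

```python
import numpy as np


def double_kmeans(X, k, l, n_iter=50, seed=0):
    """Alternating least-squares co-clustering of an (n x p) matrix X.

    Illustrative assumption: rows get labels in 0..k-1, columns get labels
    in 0..l-1, and each (row cluster, column cluster) block has one mean.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rows = rng.integers(k, size=n)  # random initial row partition
    cols = rng.integers(l, size=p)  # random initial column partition

    def block_means(rows, cols):
        # Mean of every block; empty blocks fall back to the grand mean.
        R = np.eye(k)[rows]  # n x k indicator matrix of row clusters
        C = np.eye(l)[cols]  # p x l indicator matrix of column clusters
        sums = R.T @ X @ C
        counts = np.outer(R.sum(axis=0), C.sum(axis=0))
        return np.where(counts > 0, sums / np.maximum(counts, 1), X.mean())

    for _ in range(n_iter):
        # Row step: move each row to the row cluster whose block means,
        # expanded along the current column partition, fit it best.
        mu = block_means(rows, cols)
        d_rows = ((X[:, None, :] - mu[:, cols][None, :, :]) ** 2).sum(axis=2)
        new_rows = d_rows.argmin(axis=1)

        # Column step: same idea with the roles of rows and columns swapped.
        mu = block_means(new_rows, cols)
        d_cols = ((X.T[:, None, :] - mu[new_rows, :].T[None, :, :]) ** 2).sum(axis=2)
        new_cols = d_cols.argmin(axis=1)

        # Stop once neither partition changes any more.
        if np.array_equal(new_rows, rows) and np.array_equal(new_cols, cols):
            break
        rows, cols = new_rows, new_cols

    mu = block_means(rows, cols)
    sse = ((X - mu[np.ix_(rows, cols)]) ** 2).sum()  # within-block sum of squares
    return rows, cols, mu, sse
```

Calling `double_kmeans(X, 2, 2)` on a numeric array `X` returns the row labels, the column labels, the 2 x 2 matrix of block means, and the final within-block sum of squares. Like ordinary k-means, a scheme of this kind only finds a local optimum, so in practice one would typically try several values of `seed` and keep the solution with the smallest criterion value.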
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Bock, HH. (2020). Co-Clustering for Object by Variable Data Matrices. In: Imaizumi, T., Nakayama, A., Yokoyama, S. (eds) Advanced Studies in Behaviormetrics and Data Science. Behaviormetrics: Quantitative Approaches to Human Behavior, vol 5. Springer, Singapore. https://doi.org/10.1007/978-981-15-2700-5_1
DOI: https://doi.org/10.1007/978-981-15-2700-5_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2699-2
Online ISBN: 978-981-15-2700-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)