Co-Clustering for Object by Variable Data Matrices

Bock, Hans-Hermann

doi:10.1007/978-981-15-2700-5_1


Part of the book series: Behaviormetrics: Quantitative Approaches to Human Behavior (BQAHB, volume 5)


Abstract

Co-clustering means the simultaneous clustering of the rows and columns of a two-dimensional data table (biclustering, two-way clustering), in contrast to clustering the rows and the columns separately. Practical applications arise, e.g., in economics, the social sciences, and bioinformatics. Various co-clustering models, criteria, and algorithms have been proposed that differ with respect to the data types considered (real-valued, integer, binary, contingency tables) and the meaning of rows and columns (samples, variables, factors, time, ...). This paper concentrates on the case where rows correspond to (independent) samples or objects, and columns to (typically dependent) variables. We emphasize that here, in general, different similarity or homogeneity concepts must be used for rows and columns. We propose two probabilistic co-clustering approaches: one for a situation where clusters of objects and of variables refer to two different distribution parameters, and one for a situation where clusters of ‘highly correlated’ variables (by regression to a latent class-specific factor) are crossed with object clusters that are distinguished by additive effects only. We emphasize here the classical ‘classification approach’, where maximum likelihood criteria are optimized by generalized alternating k-means type algorithms.
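To make the idea of an alternating k-means type algorithm concrete, the following is a minimal sketch of a generic "double k-means" iteration with a least-squares block criterion. It is an illustration only and not one of the two models proposed in the chapter: it assumes a single mean parameter per (row cluster, column cluster) block and treats rows and columns symmetrically, whereas the chapter argues that different homogeneity concepts are generally needed for rows and columns. The function names (`block_means`, `sse`, `double_kmeans`), the random initialization, and the global-mean fallback for empty blocks are assumptions made for this example.

```python
import numpy as np

def block_means(X, rows, cols, k_rows, k_cols):
    """Mean of X within each (row cluster, column cluster) block."""
    M = np.full((k_rows, k_cols), X.mean())  # assumed fallback for empty blocks
    for a in range(k_rows):
        for b in range(k_cols):
            block = X[np.ix_(rows == a, cols == b)]
            if block.size > 0:
                M[a, b] = block.mean()
    return M

def sse(X, rows, cols, M):
    """Least-squares criterion: squared deviations from the block means."""
    return float(((X - M[np.ix_(rows, cols)]) ** 2).sum())

def double_kmeans(X, k_rows, k_cols, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Random start; the modulo pattern makes every cluster non-empty
    # whenever n >= k_rows and p >= k_cols.
    rows = rng.permutation(np.arange(n) % k_rows)
    cols = rng.permutation(np.arange(p) % k_cols)
    history = []
    for _ in range(n_iter):
        M = block_means(X, rows, cols, k_rows, k_cols)
        history.append(sse(X, rows, cols, M))
        # Row step: assign each row to the row cluster with the smallest cost,
        # keeping the column partition and block means fixed.
        row_cost = ((X[:, None, :] - M[:, cols][None, :, :]) ** 2).sum(axis=2)
        new_rows = row_cost.argmin(axis=1)
        # Column step: same idea with the updated row partition.
        M = block_means(X, new_rows, cols, k_rows, k_cols)
        col_cost = ((X.T[:, None, :] - M[new_rows, :].T[None, :, :]) ** 2).sum(axis=2)
        new_cols = col_cost.argmin(axis=1)
        converged = np.array_equal(new_rows, rows) and np.array_equal(new_cols, cols)
        rows, cols = new_rows, new_cols
        if converged:
            break
    M = block_means(X, rows, cols, k_rows, k_cols)
    history.append(sse(X, rows, cols, M))
    return rows, cols, M, history

# Usage on synthetic data (30 objects, 8 variables).
X_demo = np.random.default_rng(1).normal(size=(30, 8))
row_labels, col_labels, centers, crit = double_kmeans(X_demo, k_rows=3, k_cols=2)
```

Each partial step minimizes the criterion with the other quantities held fixed, so the recorded criterion values in `crit` do not increase from one iteration to the next. As with ordinary k-means, the result is only a local optimum and depends on the starting partition.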



Author information

Authors and Affiliations

Institute of Statistics, RWTH Aachen University, Aachen, Germany
Hans-Hermann Bock

Corresponding author

Correspondence to Hans-Hermann Bock.

Editor information

Editors and Affiliations

School of Management and Information Sciences, Tama University, Tokyo, Japan
Tadashi Imaizumi
Graduate School of Management, Tokyo Metropolitan University, Tokyo, Japan
Atsuho Nakayama
School of Business, Aoyama Gakuin University, Tokyo, Japan
Satoru Yokoyama

About this chapter

Cite this chapter

Bock, HH. (2020). Co-Clustering for Object by Variable Data Matrices. In: Imaizumi, T., Nakayama, A., Yokoyama, S. (eds) Advanced Studies in Behaviormetrics and Data Science. Behaviormetrics: Quantitative Approaches to Human Behavior, vol 5. Springer, Singapore. https://doi.org/10.1007/978-981-15-2700-5_1

DOI: https://doi.org/10.1007/978-981-15-2700-5_1
Published: 18 April 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2699-2
Online ISBN: 978-981-15-2700-5
eBook Packages: Mathematics and Statistics (R0)
