Co-Clustering for Object by Variable Data Matrices

Advanced Studies in Behaviormetrics and Data Science

Part of the book series: Behaviormetrics: Quantitative Approaches to Human Behavior (BQAHB, volume 5)

Abstract

Co-clustering (also called biclustering or two-way clustering) means the simultaneous clustering of the rows and columns of a two-dimensional data table, in contrast to clustering the rows and the columns separately. Practical applications arise, for example, in economics, the social sciences, and bioinformatics. Various co-clustering models, criteria, and algorithms have been proposed; they differ with respect to the data types considered (real-valued, integer, binary data, contingency tables) and the meaning of rows and columns (samples, variables, factors, time, ...). This paper concentrates on the case where rows correspond to (independent) samples or objects, and columns to (typically dependent) variables. We emphasize that, in this case, different similarity or homogeneity concepts must in general be used for rows and columns. We propose two probabilistic co-clustering approaches: one in which clusters of objects and clusters of variables refer to two different distribution parameters, and one in which clusters of ‘highly correlated’ variables (by regression to a latent class-specific factor) are crossed with object clusters that are distinguished by additive effects only. We focus on the classical ‘classification approach’, where maximum likelihood criteria are optimized by generalized alternating k-means type algorithms.
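
As a rough illustration of what an alternating k-means type co-clustering algorithm can look like, the following Python sketch implements the simplest textbook variant (often called double k-means): each block formed by a row class and a column class is summarized by its mean, and row labels, column labels, and block means are updated in turn. Under the assumption of independent Gaussian entries with a common variance, minimizing this squared-error criterion corresponds to maximizing a classification likelihood. This is not one of the two models proposed in the chapter; the function names `block_means` and `double_kmeans`, the random initialization, and the stopping rule are all assumptions made for illustration.

```python
import numpy as np


def block_means(X, rows, cols, k, l):
    """Mean of X within each (row class, column class) block.

    Empty blocks fall back to the grand mean (an illustrative choice).
    """
    mu = np.full((k, l), X.mean())
    for a in range(k):
        for b in range(l):
            block = X[np.ix_(rows == a, cols == b)]
            if block.size > 0:
                mu[a, b] = block.mean()
    return mu


def double_kmeans(X, k, l, n_iter=50, tol=1e-12, seed=0):
    """Alternating least-squares co-clustering of an n x p matrix X.

    Returns row labels, column labels, the k x l block means, and the
    final sum of squared errors (SSE).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rows = rng.integers(k, size=n)   # random initial row partition
    cols = rng.integers(l, size=p)   # random initial column partition
    prev = np.inf
    for _ in range(n_iter):
        # Step 1: reassign each row to the row class with the smallest cost,
        # keeping the column partition and block means fixed.
        mu = block_means(X, rows, cols, k, l)
        row_cost = ((X[:, None, :] - mu[:, cols][None, :, :]) ** 2).sum(axis=2)
        rows = row_cost.argmin(axis=1)

        # Step 2: reassign each column in the same way, rows fixed.
        mu = block_means(X, rows, cols, k, l)
        col_cost = ((X.T[:, None, :] - mu[rows, :].T[None, :, :]) ** 2).sum(axis=2)
        cols = col_cost.argmin(axis=1)

        # Step 3: update the block means and evaluate the criterion.
        mu = block_means(X, rows, cols, k, l)
        sse = float(((X - mu[np.ix_(rows, cols)]) ** 2).sum())
        if prev - sse <= tol:        # no further improvement
            break
        prev = sse
    return rows, cols, mu, sse


# Usage on synthetic data (purely illustrative).
demo_rng = np.random.default_rng(42)
demo_X = demo_rng.normal(size=(20, 8))
demo_rows, demo_cols, demo_mu, demo_sse = double_kmeans(demo_X, k=2, l=2)
```

Each step can only lower (or leave unchanged) the squared-error criterion, so the procedure converges, but only to a local optimum that depends on the initial partitions; in practice several random starts are usually compared. Note that this sketch treats rows and columns symmetrically, whereas the abstract stresses that object-by-variable data generally call for different homogeneity concepts for the two modes.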



Author information

Corresponding author

Correspondence to Hans-Hermann Bock.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Bock, HH. (2020). Co-Clustering for Object by Variable Data Matrices. In: Imaizumi, T., Nakayama, A., Yokoyama, S. (eds) Advanced Studies in Behaviormetrics and Data Science. Behaviormetrics: Quantitative Approaches to Human Behavior, vol 5. Springer, Singapore. https://doi.org/10.1007/978-981-15-2700-5_1
