Abstract
Most classical approaches for two-mode clustering of a data matrix are designed to attain homogeneous row by column clusters (blocks, biclusters), that is, biclusters with a small variation of data values within the blocks. In contrast, this article deals with methods that look for a biclustering with a large interaction between row and column clusters. Thereby an aggregated, condensed representation of the existing interaction structure is obtained, together with corresponding row and column clusters, which both allow a parsimonious visualization and interpretation. In this paper we provide a statistical justification, in terms of a probabilistic model, for a two-mode interaction clustering criterion that has been proposed by Bock (1980). Furthermore, we show that maximization of this criterion is equivalent to minimizing the classical least-squares two-mode partitioning criterion for the double-centered version of the data matrix. The latter implies that the interaction clustering criterion can be optimized by applying classical two-mode partitioning algorithms. We illustrate the usefulness of our approach for the case of an empirical data set from personality psychology and we compare this method with other biclustering approaches where interactions play a role.
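The equivalence stated above can be checked numerically on a small example. The following is a minimal sketch, not the authors' implementation: it assumes the interaction criterion can be written as the sum, over all blocks, of block size times the squared block mean of the double-centered data, a form the abstract does not spell out. The function names, the random data, and the fixed partitions are all hypothetical. Under that assumption, the total sum of squares of the double-centered matrix splits into the criterion plus the least-squares loss, so for a given data matrix a partition that increases the one must decrease the other.

```python
import random

def double_center(x):
    """Subtract row and column means and add back the grand mean."""
    n, m = len(x), len(x[0])
    row_means = [sum(row) / m for row in x]
    col_means = [sum(x[i][j] for i in range(n)) / n for j in range(m)]
    grand = sum(row_means) / n
    return [[x[i][j] - row_means[i] - col_means[j] + grand for j in range(m)]
            for i in range(n)]

def block_stats(x, row_labels, col_labels):
    """Return {(p, q): (size, mean)} for every row-by-column block."""
    sums, counts = {}, {}
    for i, p in enumerate(row_labels):
        for j, q in enumerate(col_labels):
            sums[p, q] = sums.get((p, q), 0.0) + x[i][j]
            counts[p, q] = counts.get((p, q), 0) + 1
    return {k: (counts[k], sums[k] / counts[k]) for k in sums}

def interaction_criterion(x, row_labels, col_labels):
    """Assumed form: sum of size * (block mean of double-centered data)^2."""
    stats = block_stats(double_center(x), row_labels, col_labels)
    return sum(size * mean ** 2 for size, mean in stats.values())

def least_squares_loss(x, row_labels, col_labels):
    """Classical two-mode partitioning loss: squared deviations from block means."""
    stats = block_stats(x, row_labels, col_labels)
    return sum((x[i][j] - stats[p, q][1]) ** 2
               for i, p in enumerate(row_labels)
               for j, q in enumerate(col_labels))

# Hypothetical 6 x 5 data matrix with an arbitrary 3 x 2 biclustering.
random.seed(0)
x = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(6)]
row_labels = [0, 0, 1, 1, 2, 2]
col_labels = [0, 0, 1, 1, 1]

xc = double_center(x)
total_ss = sum(v * v for row in xc for v in row)
crit = interaction_criterion(x, row_labels, col_labels)
loss = least_squares_loss(xc, row_labels, col_labels)

# The two quantities add up to a constant that does not depend on the partition.
print(abs(total_ss - (crit + loss)) < 1e-9)
```

Because `total_ss` depends only on the data, any two-mode partitioning routine that lowers `loss` on the double-centered matrix raises `crit` by the same amount in this sketch.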
References
BAIER, D., GAUL, W., and SCHADER, M. (1997), “Two-Mode Overlapping Clustering with Applications to Simultaneous Benefit Segmentation and Market Structuring”, in Classification and Knowledge Organization, eds. R. Klar and O. Opitz, Berlin: Springer, pp. 557–566.
BANFIELD, J., and RAFTERY, A. (1993), “Model-Based Gaussian and Non-Gaussian Clustering”, Biometrics, 49, 803–821.
BOCK, H-H. (1968), “Statistische Modelle für die Einfache und Doppelte Klassifikation von Normalverteilten Beobachtungen [Statistical Models for the One-Way and Two-Way Classification of Normally Distributed Observations]”, Ph.D. thesis, Albert-Ludwigs-Universität zu Freiburg, Germany.
BOCK, H-H. (1980), “Simultaneous Clustering of Objects and Variables”, in Analyse de Données et Informatique. Cours de la Commission des Communautés Européennes à Fontainebleau, 19-30 Mars 1979, eds. R. Tomassone, M. Amirchhay, and D. Néel, Le Chesnay, France: Institut National de Recherche en Informatique et en Automatique (INRIA), pp. 187–203.
BOCK, H-H. (1996), “Probabilistic Models in Cluster Analysis”, Computational Statistics and Data Analysis, 23, 5–28.
CARROLL, J., and ARABIE, P. (1980), “Multidimensional Scaling”, Annual Review of Psychology, 31, 607–649.
CASPI, A., and MOFFITT, T. (2006), “Gene-Environment Interactions in Psychiatry: Joining Forces with Neuroscience”, Nature Reviews Neuroscience, 7, 583–590.
CASTILLO, W., and TREJOS, J. (2002), “Two-Mode Partitioning: Review of Methods and Application of Tabu Search”, in Classification, Clustering, and Related Topics. Recent Advances and Applications. Studies in Classification, Data Analysis, and Knowledge Organization, eds. K. Jajuga, A. Sokolowski, and H-H. Bock, Heidelberg, Germany: Springer-Verlag, pp. 43–51.
CEULEMANS, E., and KIERS, H. (2006), “Selecting Among Three-Mode Principal Component Models of Different Types and Complexities: A Numerical Convex Hull Based Method”, British Journal of Mathematical and Statistical Psychology, 59, 133–150.
CHENG, Y., and CHURCH, G. (2000), “Biclustering of Expression Data”, in Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, pp. 93–103.
CHO, H., DHILLON, I., GUAN, A., and SRA, S. (2004), “Minimum Sum-Squared Residue Co-Clustering of Gene Expression Data”, in Proceedings of the 4th SIAM International Conference on Knowledge Discovery and Data Mining, pp. 124–125.
CORSTEN, L., and DENIS, J. (1990), “Structuring Interaction in Two-Way Tables by Clustering”, Biometrics, 46, 207–215.
FORKMAN, J., and PIEPHO, H.-P. (2014), “Parametric Bootstrap Methods for Testing Multiplicative Terms in GGE and AMMI Models”, Biometrics, 70, 639–647.
GABRIEL, K. (1971), “The Biplot Graphic Display of Matrices with Application to Principal Component Analysis”, Biometrika, 58, 453–467.
GAUCH, H. (2006), “Statistical Analysis of Yield Trials by AMMI and GGE”, Crop Science, 46, 1488–1500.
GAUCH, H., PIEPHO, H.-P., and ANNICCHIARICO, P. (2008), “Statistical Analysis of Yield Trials by AMMI and GGE: Further Considerations”, Crop Science, 48, 866–889.
GAUL, W., and SCHADER, M. (1996), “A New Algorithm for Two-Mode Clustering”, in Data Analysis and Information Systems. Studies in Classification, Data Analysis, and Knowledge Organization, eds. H-H. Bock and W. Polasek, Berlin, Germany: Springer, pp. 15–23.
GEISER, C., LITSON, K., BISHOP, J., KELLER, B., BURNS, G., SERVERA, M., and SHIFFMAN, S. (2015), “Analyzing Person, Situation and Person X Situation Interaction Effects: Latent State-Trait Models for the Combination of Random and Fixed Situations”, Psychological Methods, 20, 165–192.
GOLLOB, H. (1968), “A Statistical Model Which Combines Features of Factor Analytic and Analysis of Variance Techniques”, Psychometrika, 33, 73–115.
GOVAERT, G., and NADIF, M. (2013), Co-Clustering, Chichester, UK: Wiley.
GOWER, J., and HAND, D. (1996), Biplots, London, UK: Chapman & Hall.
HANSOHM, J. (2001), “Two-Mode Clustering with Genetic Algorithms”, in Classification, Automation, and New Media. Studies in Classification, Data Analysis, and Knowledge Organization, eds. W. Gaul and G. Ritter, Berlin, Germany: Springer, pp. 87–93.
HUBERT, L., and ARABIE, P. (1985), “Comparing Partitions”, Journal of Classification, 2, 193–218.
HUNTER, D. (2005), “Gene-Environment Interactions in Human Diseases”, Nature Reviews Genetics, 6, 287–298.
IOVLEFF, S., and SINGH BHATIA, P. (2015), “blockcluster: Coclustering Package for Binary, Categorical, Contingency and Continuous Data-Sets”, R package version 4.0.2, https://CRAN.R-project.org/package=blockcluster.
KIERS, H. (2004), “Clustering All Three Modes of Three-Mode Data: Computational Possibilities and Problems”, in Proceedings in Computational Statistics, ed. J. Antoch, Heidelberg, Germany: Springer, pp. 303–313.
MADEIRA, S., and OLIVEIRA, A. (2004), “Biclustering Algorithms for Biological Data Analysis: A Survey”, IEEE Transactions on Computational Biology and Bioinformatics, 1, 24–45.
MCLACHLAN, G. (1982), “The Classification and Mixture Maximum Likelihood Approaches to Cluster Analysis”, in Handbook of Statistics (Vol. 2), eds. P.R. Krishnaiah and L.N. Kanal, Amsterdam: North-Holland, pp. 199–208.
MISCHEL, W., and SHODA, Y. (1995), “A Cognitive-Affective System Theory of Personality: Reconceptualizing Situations, Dispositions, Dynamics, and Invariance in Personality Structure”, Psychological Review, 102, 246–268.
MISCHEL, W., and SHODA, Y. (1998), “Reconciling Processing Dynamics and Personality Dispositions”, Annual Review of Psychology, 49, 229–258.
MOFFITT, T., CASPI, A., and RUTTER, M. (2006), “Measured Gene-Environment Interactions in Psychopathology: Concepts, Research Strategies, and Implications for Research, Intervention, and Public Understanding of Genetics”, Perspectives on Psychological Science, 1, 5–27.
NATIONAL INSTITUTE OF ENVIRONMENTAL HEALTH SCIENCES (2016), “Gene-Environment Interaction”, retrieved November 1, 2016 from http://www.niehs.nih.gov/health/topics/science/gene-env/.
PIEPHO, H.-P. (1997), “Analyzing Genotype-Environment Data by Mixed Models with Multiplicative Terms”, Biometrics, 53, 761–766.
PIEPHO, H.-P. (1999), “Fitting a Regression Model for Genotype by Environment Data on Heading Dates in Grasses by Methods for Nonlinear Mixed Models”, Biometrics, 55, 1120–1128.
QUINTIENS, G. (1999), “Een Interactionistische Benadering van Individuele Verschillen in Helpen en Laten Helpen [An Interactionist Approach to Individual Differences in Helping and Allowing to Help]”, unpublished master’s thesis, KULeuven, Belgium.
ROCCI, R., and VICHI, M. (2008), “Two-Mode Multi-Partitioning”, Computational Statistics and Data Analysis, 52, 1984–2003.
SCHEPERS, J., CEULEMANS, E., and VAN MECHELEN, I. (2008), “Selecting Among Multi-Mode Partitioning Models of Different Complexities: A Comparison of Four Model Selection Criteria”, Journal of Classification, 25, 67–85.
SCHEPERS, J., and HOFMANS, J. (2009), “TwoMP: A MATLAB Graphical User Interface for Two-Mode Partitioning”, Behavior Research Methods, 41, 507–514.
SCHEPERS, J., VAN MECHELEN, I., and CEULEMANS, E. (2006), “Three-Mode Partitioning”, Computational Statistics and Data Analysis, 51, 1623–1642.
SHAFII, B., and PRICE, W. (1998), “Analysis of Genotype-by-Environment Interaction Using the Additive Main Effects and Multiplicative Interaction Model and Stability Estimates”, Journal of Agricultural, Biological, and Environmental Statistics, 3, 335–345.
SHODA, Y., WILSON, N., CHEN, J., GILMORE, A., and SMITH, R. (2013), “Cognitive-Affective Processing System Analysis of Intra-Individual Dynamics in Collaborative Therapeutic Assessment: Translating Basic Theory and Research into Clinical Applications”, Journal of Personality, 81, 554–568.
SHODA, Y., WILSON, N., WHITSETT, D., LEE-DUSSUD, J., and ZAYAS, V. (2015), “The Person as a Cognitive Affective Processing System: Quantitative Idiography as an Integral Component of Cumulative Science”, in APA Handbook of Personality and Social Psychology: Vol. 4. Personality Processes and Individual Differences, eds. M. Mikulincer and P. Shaver, Washington: American Psychological Association, pp. 491–513.
STEINLEY, D. (2004), “Properties of the Hubert-Arabie Adjusted Rand Index”, Psychological Methods, 9, 386–396.
TANAY, A., SHARAN, R., and SHAMIR, R. (2005), “Biclustering Algorithms: A Survey”, in Handbook of Computational Molecular Biology, ed. S. Aluru, Boca Raton: Chapman and Hall/CRC.
VAN MECHELEN, I., BOCK, H-H., and DE BOECK, P. (2004), “Two-Mode Clustering Methods: A Structured Overview”, Statistical Methods in Medical Research, 13, 363–394.
VAN ROSMALEN, J., GROENEN, P., TREJOS, J., and CASTILLO, W. (2009), “Optimization Strategies for Two-Mode Partitioning”, Journal of Classification, 26, 155–181.
VICHI, M. (2001), “Double K-Means Clustering for Simultaneous Classification of Objects and Variables”, in Advances in Classification and Data Analysis, eds. S. Borra, R. Rocci, M. Vichi, and M. Schader, Berlin Heidelberg: Springer, pp. 43–52.
WILDERJANS, T., CEULEMANS, E., and MEERS, K. (2013), “CHull: A Generic Convex Hull Based Model Selection Method”, Behavior Research Methods, 45, 1–15.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Schepers, J., Bock, HH. & Van Mechelen, I. Maximal Interaction Two-Mode Clustering. J Classif 34, 49–75 (2017). https://doi.org/10.1007/s00357-017-9226-x
DOI: https://doi.org/10.1007/s00357-017-9226-x