Abstract
In the context of three-way proximity data, an INDCLUS-type model is presented to address subject heterogeneity in the perception of pairwise object similarity. The model, termed ROOTCLUS, allows for the detection of a subset of objects whose similarities are described in terms of non-overlapping clusters (ROOT CLUSters) common across all subjects. For the remaining objects, subject-specific individual partitions are allowed, whose clusters are linked one-to-one to the Root clusters. A sound ALS-type algorithm to fit the model to the data is presented. The novel method is evaluated in an extensive simulation study and illustrated with empirical data sets.
Notes
The relative loss is defined here as the ratio of the raw loss to the total sum of squares of the data.
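In symbols, denoting by \(F\) the raw value of the loss function for the fitted model and by \(\mathbf {S}_{h}\) (\(h=1,\ldots ,H\)) the observed similarity matrices (the symbol \(F\) is used here only for illustration),
$$\begin{aligned} \text {relative loss} = \frac{F}{\sum _{h=1}^{H} \left\| \mathbf {S}_{h} \right\| ^2}. \end{aligned}$$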
The MATLAB code is available online on SpringerLink with the article.
Given two matrices \(\mathbf {A}\) and \(\mathbf {B}\) with the same number J of columns, the Khatri–Rao product of \(\mathbf {A}\) and \(\mathbf {B}\) is the column-wise Kronecker product, i.e., \(\mathbf {A} |\otimes | \mathbf {B} = (\mathbf {a}_1 \otimes \mathbf {b}_1, \ldots , \mathbf {a}_j \otimes \mathbf {b}_j, \ldots , \mathbf {a}_J \otimes \mathbf {b}_J )\) where \(\mathbf {a}_j\) and \(\mathbf {b}_j\) are the j-th (\(j=1,\ldots ,J\)) column of \(\mathbf {A}\) and \(\mathbf {B}\), respectively, and \(\otimes \) denotes the Kronecker product.
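For concreteness, here is a minimal NumPy sketch of this column-wise Kronecker product (the function name and shapes are ours; recent SciPy releases also provide scipy.linalg.khatri_rao):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I x J) and (K x J) -> (I*K) x J."""
    I, J = A.shape
    K, J2 = B.shape
    assert J == J2, "A and B must have the same number of columns"
    # Column j of the result equals np.kron(A[:, j], B[:, j]).
    return np.einsum('ij,kj->ikj', A, B).reshape(I * K, J)
```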
The Kappa coefficient (KC) between two binary matrices is equal to the proportion of agreement between the two matrices (i.e., the proportion of the corresponding cells having the same values), corrected for chance (Wilderjans et al. 2012):
$$\begin{aligned} KC=\frac{(p_{00} + p_{11}) - (p_{0.}p_{.0} + p_{1.}p_{.1})}{1 - (p_{0.}p_{.0} + p_{1.}p_{.1})}, \end{aligned}$$ with \(p_{00}\) (\(p_{11}\)) being the proportion of corresponding cells that are both zero (one), and \(p_{0.}\) and \(p_{1.}\) (\(p_{.0}\) and \(p_{.1}\)) the marginal proportions of zero- and one-cells for the first (second) matrix. Note that \(p_{00}+p_{11}\) equals the (uncorrected) proportion of corresponding cells that have the same value.
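To make the computation explicit, a minimal sketch (assuming two 0/1 arrays of identical shape; all names are ours):

```python
import numpy as np

def kappa_coefficient(X, Y):
    """Proportion of agreement between two binary matrices, corrected for chance."""
    x, y = np.asarray(X).ravel(), np.asarray(Y).ravel()
    p00 = np.mean((x == 0) & (y == 0))                 # both cells zero
    p11 = np.mean((x == 1) & (y == 1))                 # both cells one
    p1_dot, p_dot1 = np.mean(x == 1), np.mean(y == 1)  # marginal one-proportions
    p0_dot, p_dot0 = 1.0 - p1_dot, 1.0 - p_dot1        # marginal zero-proportions
    chance = p0_dot * p_dot0 + p1_dot * p_dot1         # agreement expected by chance
    return ((p00 + p11) - chance) / (1.0 - chance)
```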
Note that \(w_2\) is missing here because the Root cluster \(R_2\) is a singleton and the diagonal entries of the similarity matrices are not fitted in this application.
References
Bocci, L., & Vicari, D. (2017). GINDCLUS: Generalized INDCLUS with external information. Psychometrika, 82, 355–381.
Bocci, L., Vicari, D., & Vichi, M. (2006). A mixture model for the classification of three-way proximity data. Computational Statistics & Data Analysis, 50, 1625–1654.
Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, 3, 1–27.
Carroll, J. D., & Arabie, P. (1983). INDCLUS: An individual differences generalization of ADCLUS model and the MAPCLUS algorithm. Psychometrika, 48, 157–169.
Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an N-generalization of the Eckart–Young decomposition. Psychometrika, 35, 283–319.
Chaturvedi, A., & Carroll, J. D. (2006). CLUSCALE (CLUstering and multidimensional SCAL[E]ing): A three-way hybrid model incorporating clustering and multidimensional scaling structure. Journal of Classification, 23, 269–299.
Chaturvedi, A. J., & Carroll, J. D. (1994). An alternating combinatorial optimization approach to fitting the INDCLUS and generalized INDCLUS models. Journal of Classification, 11, 155–170.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
De Leeuw, J. (1994). Block-relaxation algorithms in statistics. In H. H. Bock, W. Lenski, & M. M. Richter (Eds.), Information systems and data analysis (pp. 308–325). Berlin: Springer.
Giordani, P., & Kiers, H. A. L. (2012). FINDCLUS: Fuzzy INdividual Differences CLUStering. Journal of Classification, 29, 170–198.
Gordon, A. D., & Vichi, M. (1998). Partitions of partitions. Journal of Classification, 15, 265–285.
Hubert, L. J., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
Hubert, L. J., Arabie, P., & Meulman, J. (2006). The structural representation of proximity matrices with MATLAB. Philadelphia: SIAM.
Kiers, H. A. L. (1997). A modification of the SINDCLUS algorithm for fitting the ADCLUS and INDCLUS models. Journal of Classification, 14, 297–310.
Lawson, C. L., & Hanson, R. J. (1974). Solving least squares problems. Englewood Cliffs: Prentice Hall.
McDonald, R. P. (1980). A simple comprehensive model for the analysis of covariance structures: Some remarks on applications. British Journal of Mathematical and Statistical Psychology, 33, 161–183.
Mirkin, B. G. (1987). Additive clustering and qualitative factor analysis methods for similarity matrices. Journal of Classification, 4, 7–31.
Rao, C. R., & Mitra, S. (1971). Generalized inverse of matrices and its applications. New York: Wiley.
Rocci, R., & Vichi, M. (2008). Two-mode multi-partitioning. Computational Statistics & Data Analysis, 52, 1984–2003.
Schepers, J., Ceulemans, E., & Van Mechelen, I. (2008). Selecting among multi-mode partitioning models of different complexities: A comparison of four model selection criteria. Journal of Classification, 25, 67–85.
Schiffman, S. S., Reynolds, M. L., & Young, F. W. (1981). Introduction to multidimensional scaling. London: Academic Press.
Shepard, R. N., & Arabie, P. (1979). Additive clustering: Representation of similarities as combinations of discrete overlapping properties. Psychological Review, 86, 87–123.
Vicari, D., & Vichi, M. (2009). Structural classification analysis of three-way dissimilarity data. Journal of Classification, 26, 121–154.
Vichi, M. (1999). One mode classification of a three-way data set. Journal of Classification, 16, 27–44.
Wedel, M., & DeSarbo, W. S. (1998). Mixtures of (constrained) ultrametric trees. Psychometrika, 63, 419–443.
Wilderjans, T. F., Depril, D., & Van Mechelen, I. (2012). Block-relaxation approaches for fitting the INDCLUS model. Journal of Classification, 29, 277–296.
Acknowledgements
The authors are grateful to the Associate Editor and referees for their valuable comments and suggestions which greatly improved the presentation and content of the first version.
Appendix
In order to solve the constrained problem (14) when the diagonal entries of the H similarity matrices \(\mathbf {S}_{h}\) are not of interest, Steps 1 to 4 of the ALS-type algorithm presented in Sect. 4 can be modified straightforwardly as follows.
Since only the off-diagonal elements of matrices \(\mathbf {S}_{h}\) (\(h=1,\ldots ,H\)) need to be considered, the loss function (14) becomes
where
and \(\left\| \mathbf {Z} \right\| _{\text {off}}^2 = \sum _{x=1}^{X} \sum _{y=1,\, y\ne x}^{Y} z_{xy}^2\).
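In code, this off-diagonal sum of squares for a square matrix (the case of interest here, since the \(\mathbf {S}_{h}\) are \(N \times N\)) can be computed as in this minimal sketch:

```python
import numpy as np

def off_norm_sq(Z):
    """Sum of squared off-diagonal entries of a square matrix Z."""
    Z = np.asarray(Z, dtype=float)
    return np.sum(Z ** 2) - np.sum(np.diag(Z) ** 2)
```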
In Step 1, the loss function (23), instead of (14), is minimized over \(\mathbf {P}\) and \(\mathbf {M}_h\) (\(h=1,\ldots ,H\)).
In Steps 2 to 4, all the rows of \(\mathbf {s}_{h}\), \(\mathbf {T}\), \(\mathbf {Q}_{h}\), and \(\mathbf {1}_{N^2}\) in model (15) corresponding to the diagonal entries of the matrices in (14) need to be left out. Such reduced structures are obtained as follows:
where \(\odot \) denotes the Hadamard (elementwise) product, \(\mathbf {d}\) is the \(N^2 \times 1\) vector obtained by vectorizing the matrix \(\big (\mathbf {1}_N \mathbf {1}_N^\prime - \mathbf {I}_N\big )\), with \(\mathbf {I}_N\) the identity matrix of size N, and \(\mathbf {D}\) is the \(N^2 \times J\) matrix having all its columns equal to \(\mathbf {d}\).
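A small NumPy sketch of how these masking structures can be built (the sizes N and J are hypothetical, and a row-major vectorization is assumed; since \(\mathbf {1}_N \mathbf {1}_N^\prime - \mathbf {I}_N\) is symmetric, \(\mathbf {d}\) is the same under either vectorization convention):

```python
import numpy as np

N, J = 4, 2                                     # hypothetical sizes
d = (np.ones((N, N)) - np.eye(N)).reshape(-1)   # vec(1 1' - I): zeros flag diagonal cells
D = np.tile(d[:, None], (1, J))                 # N^2 x J matrix, every column equal to d

S_h = np.random.rand(N, N)                      # a toy similarity matrix
s_h = S_h.reshape(-1)                           # vectorized similarities (length N^2)
s_h_off = d * s_h                               # Hadamard product zeroes the diagonal entries
```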
Therefore, model (15) is rewritten in terms of (25)–(28), and Steps 2, 3, and 4 are modified accordingly.
Cite this article
Bocci, L., Vicari, D. ROOTCLUS: Searching for “ROOT CLUSters” in Three-Way Proximity Data. Psychometrika 84, 941–985 (2019). https://doi.org/10.1007/s11336-019-09686-1