ROOTCLUS: Searching for “ROOT CLUSters” in Three-Way Proximity Data

Abstract

In the context of three-way proximity data, an INDCLUS-type model is presented to address the issue of subject heterogeneity in the perception of pairwise object similarity. The model, termed ROOTCLUS, allows for the detection of a subset of objects whose similarities are described in terms of non-overlapping clusters (ROOT CLUSters) common to all subjects. For the remaining objects, subject-specific Individual partitions are allowed, whose clusters are linked one-to-one to the Root clusters. A sound ALS-type algorithm to fit the model to data is presented. The novel method is evaluated in an extensive simulation study and illustrated with empirical data sets.

Notes

  1. The relative loss is defined here as the ratio of the raw loss to the total sum of squares of the data.

  2. The MATLAB code is available online on SpringerLink with the article.

  3. Given two matrices \(\mathbf {A}\) and \(\mathbf {B}\) with the same number J of columns, the Khatri–Rao product of \(\mathbf {A}\) and \(\mathbf {B}\) is the column-wise Kronecker product, i.e., \(\mathbf {A} |\otimes | \mathbf {B} = (\mathbf {a}_1 \otimes \mathbf {b}_1, \ldots , \mathbf {a}_j \otimes \mathbf {b}_j, \ldots , \mathbf {a}_J \otimes \mathbf {b}_J )\), where \(\mathbf {a}_j\) and \(\mathbf {b}_j\) are the j-th (\(j=1,\ldots ,J\)) columns of \(\mathbf {A}\) and \(\mathbf {B}\), respectively, and \(\otimes \) denotes the Kronecker product. A small MATLAB sketch of this product is given after these notes.

  4. The Kappa coefficient (KC) between two binary matrices is equal to the proportion of agreement between the two matrices (i.e., the proportion of the corresponding cells having the same values), corrected for chance (Wilderjans et al. 2012):

    $$\begin{aligned} KC=\frac{(p_{00} + p_{11}) - (p_{0.}p_{.0} + p_{1.}p_{.1})}{1 - (p_{0.}p_{.0} + p_{1.}p_{.1})}, \end{aligned}$$

    with \(p_{00}\) (\(p_{11}\)) the proportion of corresponding cells that are both zero (one), and \(p_{0.}\) and \(p_{1.}\) (\(p_{.0}\) and \(p_{.1}\)) the marginal proportions of zero- and one-cells in the first (second) matrix. Note that \(p_{00}+p_{11}\) equals the (uncorrected) proportion of corresponding cells that have the same value. A MATLAB sketch of this coefficient is also given after these notes.

  5. Note that \(w_2\) is missing here because the Root cluster \(R_2\) is a singleton and the diagonal entries of the similarity matrices are not fitted in this application.
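
As an illustration, a minimal MATLAB sketch of the Khatri–Rao product defined in Note 3 is given below; the function name khatri_rao is hypothetical and is not part of the code supplied with the article.

function C = khatri_rao(A, B)
% Column-wise Kronecker (Khatri-Rao) product of A (I x J) and B (K x J).
[I, J]  = size(A);
[K, J2] = size(B);
assert(J == J2, 'A and B must have the same number of columns');
C = zeros(I*K, J);
for j = 1:J
    C(:, j) = kron(A(:, j), B(:, j));   % j-th column is the Kronecker product of a_j and b_j
end
end

Likewise, the Kappa coefficient of Note 4 can be computed as in the following sketch; kappa_coef is again a hypothetical helper, shown only to make the formula concrete.

function kc = kappa_coef(A, B)
% Chance-corrected agreement (KC) between two binary matrices of equal size.
a = A(:); b = B(:);
p11 = mean(a == 1 & b == 1);          % proportion of cells equal to one in both matrices
p00 = mean(a == 0 & b == 0);          % proportion of cells equal to zero in both matrices
p1a = mean(a == 1); p0a = 1 - p1a;    % marginal proportions for the first matrix
p1b = mean(b == 1); p0b = 1 - p1b;    % marginal proportions for the second matrix
pe  = p0a*p0b + p1a*p1b;              % agreement expected by chance
kc  = ((p00 + p11) - pe) / (1 - pe);
end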

References

  • Bocci, L., & Vicari, D. (2017). GINDCLUS: Generalized INDCLUS with external information. Psychometrika, 82, 355–381.

  • Bocci, L., Vicari, D., & Vichi, M. (2006). A mixture model for the classification of three-way proximity data. Computational Statistics & Data Analysis, 50, 1625–1654.

  • Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, 3, 1–27.

  • Carroll, J. D., & Arabie, P. (1983). INDCLUS: An individual differences generalization of ADCLUS model and the MAPCLUS algorithm. Psychometrika, 48, 157–169.

  • Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an N-generalization of the Eckart–Young decomposition. Psychometrika, 35, 283–319.

  • Chaturvedi, A., & Carroll, J. D. (2006). CLUSCALE (CLUstering and multidimensional SCAL[E]ing): A three-way hybrid model incorporating clustering and multidimensional scaling structure. Journal of Classification, 23, 269–299.

  • Chaturvedi, A. J., & Carroll, J. D. (1994). An alternating combinatorial optimization approach to fitting the INDCLUS and generalized INDCLUS models. Journal of Classification, 11, 155–170.

  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.

  • De Leeuw, J. (1994). Block-relaxation algorithms in statistics. In H. H. Bock, W. Lenski, & M. M. Richter (Eds.), Information systems and data analysis (pp. 308–325). Berlin: Springer.

  • Giordani, P., & Kiers, H. A. L. (2012). FINDCLUS: Fuzzy INdividual Differences CLUStering. Journal of Classification, 29, 170–198.

  • Gordon, A. D., & Vichi, M. (1998). Partitions of Partitions. Journal of Classification, 15, 265–285.

  • Hubert, L. J., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.

  • Hubert, L. J., Arabie, P., & Meulman, J. (2006). The structural representation of proximity matrices with MATLAB. Philadelphia: SIAM.

  • Kiers, H. A. L. (1997). A modification of the SINDCLUS algorithm for fitting the ADCLUS and INDCLUS models. Journal of Classification, 14, 297–310.

  • Lawson, C. L., & Hanson, R. J. (1974). Solving least squares problems. Englewood Cliffs: Prentice Hall.

  • McDonald, R. P. (1980). A simple comprehensive model for the analysis of covariance structures: Some remarks on applications. British Journal of Mathematical and Statistical Psychology, 33, 161–183.

  • Mirkin, B. G. (1987). Additive clustering and qualitative factor analysis methods for similarity matrices. Journal of Classification, 4, 7–31.

  • Rao, C. R., & Mitra, S. (1971). Generalized inverse of matrices and its applications. New York: Wiley.

  • Rocci, R., & Vichi, M. (2008). Two-mode multi-partitioning. Computational Statistics & Data Analysis, 52, 1984–2003.

  • Shepard, R. N., & Arabie, P. (1979). Additive clustering: Representation of similarities as combinations of discrete overlapping properties. Psychological Review, 86, 87–123.

  • Schepers, J., Ceulemans, E., & Van Mechelen, I. (2008). Selecting among multi-mode partitioning models of different complexities: A comparison of four model selection criteria. Journal of Classification, 25, 67–85.

  • Schiffman, S. S., Reynolds, M. L., & Young, F. W. (1981). Introduction to multidimensional scaling. London: Academic Press.

  • Vicari, D., & Vichi, M. (2009). Structural classification analysis of three-way dissimilarity data. Journal of Classification, 26, 121–154.

  • Vichi, M. (1999). One mode classification of a three-way data set. Journal of Classification, 16, 27–44.

  • Wedel, M., & DeSarbo, W. S. (1998). Mixtures of (constrained) ultrametric trees. Psychometrika, 63, 419–443.

  • Wilderjans, T. F., Depril, D., & Van Mechelen, I. (2012). Block-relaxation approaches for fitting the INDCLUS model. Journal of Classification, 29, 277–296.

Acknowledgements

The authors are grateful to the Associate Editor and referees for their valuable comments and suggestions which greatly improved the presentation and content of the first version.

Author information

Correspondence to Donatella Vicari.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 279 KB)

Appendix

To solve the constrained problem (14) when the diagonal entries of the H similarity matrices \(\mathbf {S}_{h}\) are not of interest, steps 1 to 4 of the ALS-type algorithm presented in Sect. 4 can be modified straightforwardly as follows.

Since only the off-diagonal elements of matrices \(\mathbf {S}_{h}\) (\(h=1,\ldots ,H\)) need to be considered, the loss function (14) becomes

$$\begin{aligned} f_{\text {off}}(\mathbf {P}, \mathbf {W}, \mathbf {M}_h, \mathbf {V}_h, c_h)=\; F_{\text {off}}(\mathbf {P}, \mathbf {W}, \mathbf {M}_h, \mathbf {V}_h, c_h) +\lambda G, \end{aligned}$$
(23)

where

$$\begin{aligned} \begin{aligned}&F_{\text {off}}(\mathbf {P}, \mathbf {W}, \mathbf {M}_h, \mathbf {V}_h, c_h) = \\&\quad \frac{\sum _{h=1}^{H} \left\| \mathbf {S}_h - \mathbf {P} \mathbf {W} \mathbf {P}^\prime - \bigl (\mathbf {P}+\mathbf {M}_{h} \bigr ) \mathbf {V}_h \bigl (\mathbf {P}+\mathbf {M}_{h} \bigr )^\prime - c_{h} \mathbf {1}_N \mathbf {1}_{N}^{\prime } \right\| _{\text {off}}^2}{\sum _{h=1}^{H} \left\| \mathbf {S}_h \right\| _{\text {off}}^2} \end{aligned} \end{aligned}$$
(24)

and \(\left\| \mathbf {Z} \right\| _{\text {off}}^2 = \sum _{x=1}^{X}\sum _{y=1,\, y\ne x}^{Y} z_{xy}^2\).
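
For concreteness, a minimal MATLAB sketch of the fit measure (24) for given parameter values might read as follows; the function name f_off_num and the input layout (cell arrays over subjects) are hypothetical placeholders, and the penalty term \(\lambda G\) of (23) is omitted.

function F = f_off_num(S, P, W, M, V, c)
% S, M, V: 1 x H cell arrays holding S_h (N x N), M_h (N x J), V_h (J x J);
% P: N x J membership matrix; W: J x J weight matrix; c: 1 x H additive constants.
H = numel(S);
N = size(P, 1);
off = ones(N) - eye(N);                     % mask selecting the off-diagonal cells
num = 0; den = 0;
for h = 1:H
    R = S{h} - P*W*P' - (P + M{h})*V{h}*(P + M{h})' - c(h)*ones(N);
    num = num + sum(sum((R.^2)    .* off)); % off-diagonal residual sum of squares
    den = den + sum(sum((S{h}.^2) .* off)); % off-diagonal total sum of squares
end
F = num / den;
end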

In step 1, the loss function (23), instead of (14), is minimized over \(\mathbf {P}\) and \(\mathbf {M}_h\) (\(h=1,\ldots ,H\)).

In steps 2 to 4, all the rows of \(\mathbf {s}_{h}\), \(\mathbf {T}\), \(\mathbf {Q}_{h}\) and \(\mathbf {1}_{N^2}\) in model (15), corresponding to the diagonal entries of the matrices in (14), need to be left out. Such reduced structures are obtained as follows:

$$\begin{aligned} {\tilde{\mathbf {s}}}_h&= \mathbf {s}_h \odot \mathbf {d} \qquad (h=1,\ldots ,H) \,, \end{aligned}$$
(25)
$$\begin{aligned} {\tilde{\mathbf {T}}}&= \mathbf {T} \odot \mathbf {D}\,, \end{aligned}$$
(26)
$$\begin{aligned} {\tilde{\mathbf {Q}}}_h&= \mathbf {Q}_h \odot \mathbf {D} \qquad (h=1,\ldots ,H)\,, \end{aligned}$$
(27)
$$\begin{aligned} {\tilde{\mathbf {1}}}_{N^2}&= \mathbf {1}_{N^2} \odot \mathbf {d} \,, \end{aligned}$$
(28)

where \(\odot \) denotes the Hadamard product, \(\mathbf {d}\) is the column vector of size \(N^2\) obtained by vectorizing the matrix \(\big (\mathbf {1}_N \mathbf {1}_N^\prime - \mathbf {I}_N\big )\), with \(\mathbf {I}_N\) the identity matrix of size N, and \(\mathbf {D}\) is the \(N^2 \times J\) matrix having all its columns equal to \(\mathbf {d}\).

Therefore, model (15) is rewritten in terms of (25)–(28), and Steps 2, 3, and 4 are modified accordingly.
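
As an illustration, a minimal MATLAB sketch of the masking in (25)–(28) is given below; the sizes and the matrices are toy placeholders standing in for the quantities of model (15), not the code supplied with the article.

N = 4; J = 2;                               % toy sizes, assumed for illustration only
S_h = rand(N); S_h = (S_h + S_h')/2;        % a toy N x N similarity matrix S_h
s_h = S_h(:);                               % vectorized S_h
T   = rand(N^2, J);                         % placeholder for the matrix T of model (15)
Q_h = rand(N^2, J);                         % placeholder for the matrix Q_h of model (15)
d = reshape(ones(N) - eye(N), N^2, 1);      % vectorized (1_N 1_N' - I_N): zero in diagonal cells
D = repmat(d, 1, J);                        % N^2 x J matrix with all columns equal to d
s_h_tilde = s_h .* d;                       % Eq. (25): diagonal entries of vec(S_h) zeroed out
T_tilde   = T   .* D;                       % Eq. (26)
Q_h_tilde = Q_h .* D;                       % Eq. (27)
one_tilde = ones(N^2, 1) .* d;              % Eq. (28)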

Cite this article

Bocci, L., Vicari, D. ROOTCLUS: Searching for “ROOT CLUSters” in Three-Way Proximity Data. Psychometrika 84, 941–985 (2019). https://doi.org/10.1007/s11336-019-09686-1
