Abstract
In many research domains, different pieces of information are collected regarding the same set of objects. Each piece of information constitutes a data block, and all these (coupled) blocks have the object mode in common. When analyzing such data, an important aim is to obtain an overall picture of the structure underlying the whole set of coupled data blocks. A further challenge consists of accounting for the differences in information value that exist between and within (i.e., between the objects of a single block) data blocks. To tackle these issues, analysis techniques may be useful in which all available pieces of information are integrated and in which, at the same time, noise heterogeneity is taken into account. For the case of binary coupled data, however, the only existing methods perform a simultaneous analysis of all data blocks but do not account for noise heterogeneity. Therefore, in this paper, the SIMCLAS model, a Hierarchical Classes model for the simultaneous analysis of coupled binary two-way matrices, is presented. In this model, noise heterogeneity between and within the data blocks is accounted for by downweighting entries from noisy blocks/objects within a block. A simulation study shows that (1) the SIMCLAS technique recovers the underlying structure of coupled data to a very large extent, and (2) the SIMCLAS technique outperforms a Hierarchical Classes technique in which all entries contribute equally to the analysis (i.e., noise homogeneity within and between blocks). The latter is also demonstrated in an application of both techniques to empirical data on the categorization of semantic concepts.
Notes
In the remainder of this paper, the terms ‘matrix’ and ‘block’ are used interchangeably.
A value of \(\pi_n\) larger than 0.50 is not realistic, because this implies that the model entries may differ from their corresponding data entries with a probability that is above chance.
Note that allowing π to differ among the columns of each data block would yield a special case of block-homogeneous SIMCLAS (i.e., the case in which each data block consists of one variable only). Note further that it may not be a good idea to allow π to be different for each data element, because it would be impossible to estimate such a model (i.e., there are more parameters to fit than there are data points).
When \(\pi_1 > \pi_2\), then \(c_{1} = \log (\frac{\pi_{1}}{1-\pi_{1}}) > c_{2} = \log (\frac{\pi_{2}}{1-\pi_{2}})\), with \(c_1\) and \(c_2\) representing, respectively, the contribution of entries from \(\mathbf{D}_1\) and \(\mathbf{D}_2\) to the likelihood, which has to be maximized. Note that \(c_1\) and \(c_2\) are negative when \(\pi_n \leq 0.50\), which implies that \(|c_1| < |c_2|\). As such, entries from more noisy blocks (i.e., larger \(\pi_n\) and smaller \(|c_n|\)) imply a smaller decrease in the likelihood than entries from less noisy blocks (i.e., smaller \(\pi_n\) and larger \(|c_n|\)), resulting in entries from more noisy blocks being downweighted.
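To see this downweighting numerically, one can evaluate these contributions for two illustrative noise levels (the π values below are ours, chosen for illustration; they are not from the paper):

```python
import math

def contribution(pi_n):
    """Per-entry log-odds contribution c_n = log(pi_n / (1 - pi_n))."""
    return math.log(pi_n / (1 - pi_n))

c1 = contribution(0.40)  # noisier block (pi_1 = 0.40)
c2 = contribution(0.20)  # less noisy block (pi_2 = 0.20)
# Both contributions are negative, c1 > c2, and |c1| < |c2|: a mismatch
# in the noisier block lowers the likelihood less, i.e., its entries
# are downweighted relative to those of the less noisy block.
```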
The kappa coefficient κ between two dichotomous variables can be computed as follows:
$$ \kappa = \frac{(p_{00} + p_{11}) - (p_{0\cdot}p_{\cdot 0} + p_{1\cdot}p_{\cdot 1})}{1 - (p_{0\cdot}p_{\cdot 0} + p_{1\cdot}p_{\cdot 1})}, \tag{7} $$
with \(p_{00}\) (\(p_{11}\)) the proportion of zero-agreements (one-agreements), and \(p_{0\cdot}\) and \(p_{1\cdot}\) (\(p_{\cdot 0}\) and \(p_{\cdot 1}\)) the marginal proportions of zeros and ones for the first (second) variable.
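As a quick illustration, Eq. (7) can be computed directly from two binary vectors. The following Python sketch (function name and interface are ours, not from the paper) implements the formula term by term:

```python
def kappa(x, y):
    """Cohen's kappa for two equal-length binary (0/1) lists, per Eq. (7)."""
    n = len(x)
    # joint proportions of zero-agreements and one-agreements
    p00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0) / n
    p11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    # marginal proportions of zeros and ones for each variable
    p0_, p1_ = x.count(0) / n, x.count(1) / n
    p_0, p_1 = y.count(0) / n, y.count(1) / n
    po = p00 + p11                # observed agreement
    pe = p0_ * p_0 + p1_ * p_1    # agreement expected by chance
    return (po - pe) / (1 - pe)
```

For two identical nontrivial vectors this yields 1 (perfect agreement), and for perfectly inverted vectors it yields −1.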
References
Aarts, E.H.L., Korst, J.H.M., & van Laarhoven, P.J.M. (1997). Simulated annealing. In E.H.L. Aarts & J.K. Lenstra (Eds.), Local search in combinatorial optimization (pp. 91–120). Chichester: Wiley.
Barbut, M., & Monjardet, B. (1970). Ordre et classification: Algèbre et combinatoire. Paris: Hachette.
Birkhoff, G. (1940). Lattice theory. Providence: American Mathematical Society.
Ceulemans, E., & Storms, G. (2010). Detecting intra- and inter-categorical structure in semantic concepts using HICLAS. Acta Psychologica, 133, 296–304.
Ceulemans, E., & Van Mechelen, I. (2003). Uniqueness of n-way n-mode hierarchical classes models. Journal of Mathematical Psychology, 47, 259–264.
Ceulemans, E., & Van Mechelen, I. (2004). Tucker2 hierarchical classes analysis. Psychometrika, 69, 375–399.
Ceulemans, E., & Van Mechelen, I. (2005). Hierarchical classes models for three-way three-mode binary data: Interrelations and model selection. Psychometrika, 70, 461–480.
Ceulemans, E., Van Mechelen, I., & Leenen, I. (2003). Tucker3 hierarchical classes analysis. Psychometrika, 68, 413–433.
Ceulemans, E., Van Mechelen, I., & Leenen, I. (2007). The local minima problem in hierarchical classes analysis: An evaluation of a simulated annealing algorithm and various multistart procedures. Psychometrika, 72, 377–391.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
De Boeck, P., & Rosenberg, S. (1988). Hierarchical classes: Model and data analysis. Psychometrika, 53, 361–381.
De Deyne, S., Verheyen, S., Ameel, E., Vanpaemel, W., Dry, M., Voorspoels, W., & Storms, G. (2008). Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts. Behavior Research Methods, 40, 1030–1048.
Haggard, E.A. (1958). Intraclass correlation and the analysis of variance. New York: Dryden.
Kiers, H.A.L. (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics, 14, 105–122.
Kiers, H.A.L., & ten Berge, J.M.F. (1989). Alternating least squares algorithms for simultaneous components analysis with equal component weight matrices for all populations. Psychometrika, 54, 467–473.
Kiers, H.A.L., & ten Berge, J.M.F. (1994). Hierarchical relations between methods for simultaneous component analysis and a technique for rotation to a simple simultaneous structure. British Journal of Mathematical & Statistical Psychology, 47, 109–126.
Kirk, R.E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Belmont: Brooks/Cole.
Kirkpatrick, S., Gelatt, C.D. Jr., & Vecchi, M.P. (1983). Optimization by simulated annealing. Science, 220, 671–680.
Leenen, I., & Van Mechelen, I. (2001). An evaluation of two algorithms for hierarchical classes analysis. Journal of Classification, 18, 57–80.
Leenen, I., Van Mechelen, I., De Boeck, P., & Rosenberg, S. (1999). INDCLAS: A three-way hierarchical classes model. Psychometrika, 64, 9–24.
Leenen, I., Van Mechelen, I., Gelman, A., & De Knop, S. (2008). Bayesian hierarchical classes analysis. Psychometrika, 73, 39–64.
Millsap, R.E., & Meredith, W. (1988). Component analysis in cross-sectional and longitudinal data. Psychometrika, 53, 123–134.
ten Berge, J.M.F., Kiers, H.A.L., & van der Stel, V. (1992). Simultaneous components analysis. Statistica Applicata, 4, 377–392.
Timmerman, M.E., & Kiers, H.A.L. (2003). Four simultaneous component models for the analysis of multivariate time series from more than one subject to model intraindividual and interindividual differences. Psychometrika, 68, 105–121.
Van Deun, K., Smilde, A.K., van der Werf, M.J., Kiers, H.A.L., & Van Mechelen, I. (2009). A structured overview of simultaneous component based data integration. BMC Bioinformatics, 10, 246.
Van Mechelen, I., De Boeck, P., & Rosenberg, S. (1995). The conjunctive model of hierarchical classes. Psychometrika, 60, 505–521.
Van Mechelen, I., & Smilde, A.K. (2009). A generic model for data fusion. Paper presented at the 6th meeting of TRICAP (Three-way methods in chemistry and psychology), June 14–19, Vall de Núria, Spain.
Van Mechelen, I., & Smilde, A.K. (2010). A generic linked-mode decomposition model for data fusion. Chemometrics and Intelligent Laboratory Systems, 104, 83–94.
Wilderjans, T.F., Ceulemans, E., & Van Mechelen, I. (2008). The CHIC model: A global model for coupled binary data. Psychometrika, 73, 729–751.
Wilderjans, T.F., Ceulemans, E., Van Mechelen, I., & van den Berg, R.A. (2011). Simultaneous analysis of coupled data matrices subject to different amounts of noise. British Journal of Mathematical & Statistical Psychology, 64, 277–290.
Additional information
The first author is a Research Assistant of the Fund for Scientific Research (FWO)—Flanders (Belgium). The research reported in this paper was partially supported by the Research Council of K.U. Leuven (GOA/2005/04 and EF/2005/07, ‘SymBioSys’) and by IWT-Flanders (SBO 60045, ‘Bioframe’). We would like to thank Gert Storms and his collaborators for providing us with an interesting data set.
Appendix: Simulated Annealing to Estimate the Bundle Matrices, Conditional on the Noise Parameters
To estimate, in Step 2 of the SIMCLAS algorithm (see the section The SIMCLAS Algorithm), the binary bundle matrices A and B n that maximize the loss function, conditionally upon the noise parameters, a simulated annealing (SA) procedure is adopted. Simulated annealing is a local search technique that performs a walk through the solution space. In particular, a chain of solutions, consisting of several subchains, is generated by repeatedly creating a candidate solution from the current solution. Next, the loss function values of the current and candidate solutions are compared. If the candidate solution has a better loss function value f, it is accepted, meaning that the current solution is replaced by the candidate solution. If the candidate solution has a worse loss function value, however, it is accepted with a probability that depends on its relative quality (i.e., the difference in loss function value f between the current and candidate solutions) and on the current temperature, a quantity that controls the acceptance probability. At the end of each subchain the temperature is decreased. Subchains are generated until a prespecified stop criterion is met. Finally, the best solution encountered in the chain is retained.
Based on the results of a pilot study and on the SA implementations that have been used for other Hierarchical Classes models (see Ceulemans et al. 2007), we implemented the procedure for generating a single SA chain (see Algorithm A1 for pseudo-code) in the SIMCLAS algorithm as follows:
1. An initial solution S current, with associated initial loss value L current, is obtained by replacing the P columns of each bundle matrix by P data vectors sampled at random (i.e., for A, column vectors are drawn from the different D n, whereas for each B n, row vectors are chosen from the corresponding D n).
2. The initial temperature T initial is obtained by running a subchain of solutions and accepting all solutions; subsequently, the average increase in the likelihood function across those links in which worse solutions are accepted is divided by ln(0.8). As such, during the first subchains, in which the algorithm is still far from the optimal solution, worse solutions are accepted with a high probability (see Kirkpatrick et al. 1983; Aarts et al. 1997; Ceulemans et al. 2007).
3. A candidate solution S trial, with associated loss value L trial, is obtained from the current solution S current by altering the value of a randomly chosen cell of a randomly chosen bundle matrix, with each cell of each bundle matrix having the same probability of being changed.
4. A worse candidate solution is accepted if p<exp((L trial−L current)/T current), with p a number drawn from a uniform (0, 1) distribution.
5. A subchain stops (1) if the number of generated solutions i gen equals the maximum number of solutions \(\mathit{CL} = ((I+\sum_{n=1}^{N} J_{n})\times2^{P})\times5\), or (2) if the number of accepted solutions i acc equals CL×0.10.
6. At the end of each subchain, the temperature is decreased by a factor α=0.90, implying a smaller acceptance probability for worse solutions: T current=0.90×T current.
7. An SA chain stops when (1) the current temperature becomes smaller than T stop=0.000001, or (2) the number of subsequent subchains i id with an identical loss value L current for the last accepted solution in each subchain (i.e., L current=L previous) equals \(\mathit {max}_{i_{\mathrm{id}}}\), which is set to five.
8. The retained solution is the best solution S best encountered across all subchains.
To lower the risk of ending in a suboptimal solution (i.e., local optimum), a multi-start procedure may be advised, which consists of running 100 SA chains, each time with a different initial solution and initial temperature (see Steps 1 and 2), and retaining the best encountered solution across all chains (see Ceulemans et al. 2007).
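The chain logic of the steps above can be sketched in Python as follows. This is a minimal illustration on a single binary matrix with an arbitrary objective: the actual SIMCLAS likelihood and bundle-matrix structure are not reproduced, and the function names, the toy subchain length (an analogue of CL with P=1), and the example objective are ours:

```python
import math
import random

def sa_chain(matrix, loss, t_initial, alpha=0.90, t_stop=1e-6, max_id=5, rng=None):
    """One simulated-annealing chain over a binary matrix: single-cell flips,
    geometric cooling, and the two stop rules from steps 5 and 7 above.
    `loss` is any function to be maximized (a stand-in for the likelihood)."""
    if rng is None:
        rng = random.Random(0)
    rows, cols = len(matrix), len(matrix[0])
    cl = (rows + cols) * 2 * 5                    # toy analogue of CL with P = 1
    current = [row[:] for row in matrix]
    l_current = loss(current)
    best, l_best = [row[:] for row in current], l_current
    t, l_previous, i_id = t_initial, None, 0
    while t > t_stop and i_id < max_id:           # step 7: chain stop rules
        i_gen = i_acc = 0
        while i_gen < cl and i_acc < cl * 0.10:   # step 5: subchain stop rules
            i_gen += 1
            r, c = rng.randrange(rows), rng.randrange(cols)
            trial = [row[:] for row in current]
            trial[r][c] = 1 - trial[r][c]         # step 3: flip one random cell
            l_trial = loss(trial)
            # step 4: always accept improvements; accept worse solutions
            # with probability exp((l_trial - l_current) / t)
            if l_trial > l_current or rng.random() < math.exp((l_trial - l_current) / t):
                current, l_current = trial, l_trial
                i_acc += 1
                if l_current > l_best:
                    best, l_best = [row[:] for row in current], l_current
        i_id = i_id + 1 if l_current == l_previous else 0
        l_previous = l_current
        t *= alpha                                # step 6: cooling
    return best, l_best
```

A multi-start procedure then simply calls `sa_chain` repeatedly with different initial solutions and temperatures and keeps the best result across chains.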
Cite this article
Wilderjans, T.F., Ceulemans, E. & Van Mechelen, I. The SIMCLAS Model: Simultaneous Analysis of Coupled Binary Data Matrices with Noise Heterogeneity Between and Within Data Blocks. Psychometrika 77, 724–740 (2012). https://doi.org/10.1007/s11336-012-9275-3