Journal of Classification, Volume 29, Issue 3, pp 297–320

Lowdimensional Additive Overlapping Clustering

  • Dirk Depril
  • Iven Van Mechelen
  • Tom F. Wilderjans

Abstract

To reveal the structure underlying two-way two-mode object by variable data, Mirkin (1987) has proposed an additive overlapping clustering model. This model implies an overlapping clustering of the objects and a reconstruction of the data, with the reconstructed variable profile of an object being a summation of the variable profiles of the clusters it belongs to. Grasping the additive (overlapping) clustering structure of object by variable data may, however, be seriously hampered in case the data include a very large number of variables. To deal with this problem, we propose a new model that simultaneously clusters the objects in overlapping clusters and reduces the variable space; as such, the model implies that the cluster profiles and, hence, the reconstructed data profiles are constrained to lie in a lowdimensional space. An alternating least squares (ALS) algorithm to fit the new model to a given data set will be presented, along with a simulation study and an illustrative example that makes use of empirical data.
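The reconstruction rule described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract states only that cluster profiles are constrained to a lowdimensional space, so writing the profiles as P = C B' (with hypothetical cluster coordinates C and variable loadings B), the toy values, and all function names are assumptions made for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


def reconstruct(A, C, B):
    """Return the model matrix M = A (C B').

    A: binary object-by-cluster membership matrix (overlap allowed).
    C: cluster-by-dimension coordinates (assumed parametrization).
    B: variable-by-dimension loadings (assumed parametrization).
    """
    P = matmul(C, transpose(B))  # cluster profiles, rank at most R
    return matmul(A, P)          # each object profile sums its clusters' profiles


def sse(X, M):
    """Least squares loss between data X and reconstruction M."""
    return sum((x - m) ** 2 for rx, rm in zip(X, M) for x, m in zip(rx, rm))


# Hypothetical toy example: 3 objects, 2 clusters, 4 variables, R = 1.
A = [[1, 0],
     [0, 1],
     [1, 1]]                      # object 3 belongs to both clusters
C = [[1.0], [2.0]]
B = [[1.0], [0.0], [-1.0], [2.0]]

M = reconstruct(A, C, B)
for row in M:
    print(row)
```

In this toy case the third object belongs to both clusters, so its reconstructed profile is the sum of the first two rows. An ALS procedure would alternate between updating the memberships and the lowdimensional profile parameters to reduce `sse`; that part is not sketched here because the abstract does not give the update steps.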

Keywords

Additive overlapping clustering; Dimensional reduction; Alternating least squares algorithm; Two-way two-mode data; Object by variable data

References

  1. ANDERSON, T. (1951), “Estimating Linear Restrictions on Regression Coefficients for Multivariate Normal Distributions,” The Annals of Mathematical Statistics 22, 327–351.
  2. ARABIE, P., and HUBERT, L. (1994), “Cluster Analysis in Marketing Research,” in Handbook of Marketing Research, ed. R. Bagozzi, Oxford: Blackwell, pp. 160–189.
  3. BERKOWITZ, L. (1989), “Frustration-aggression Hypothesis: Examination and Reformulation,” Psychological Bulletin 106, 59–73.
  4. BOCK, H.-H. (1987), “On the Interface Between Cluster Analysis, Principal Component Analysis and Multidimensional Scaling,” in Multivariate Statistical Modeling and Data Analysis: Proceedings of the Advanced Symposium on Multivariate Modeling and Data Analysis May 15–16, 1986, eds. H. Bozdogan and A. Gupta, Dordrecht, The Netherlands: Reidel Publishing Company, pp. 17–34.
  5. CARROLL, J. D., and CHATURVEDI, A. (1995), “A General Approach to Clustering and Multidimensional Scaling of Two-way, Three-way or Higher-way Data,” in Geometric Representations of Perceptual Phenomena: Papers in honor of Tarow Indow on his 70th birthday, eds. D. R. Luce, M. D’Zmura, D. Hoffman, G. J. Iverson, and K. A. Romney, Mahwah, New Jersey: Lawrence Erlbaum Associates, pp. 295–318.
  6. CATTELL, R. B. (1966), “The Meaning and Strategic Use of Factor Analysis,” in Handbook of Multivariate Experimental Psychology, ed. R. B. Cattell, Chicago: Rand McNally, pp. 174–243.
  7. CEULEMANS, E., and KIERS, H. A. L. (2006), “Selecting Among Three-mode Principal Component Models of Different Types and Complexities: A Numerical Convex Hull Based Method,” British Journal of Mathematical and Statistical Psychology 59, 133–150.
  8. CEULEMANS, E., TIMMERMAN, M. E., and KIERS, H. A. L. (2011), “The CHull Procedure for Selecting Among Multilevel Component Solutions,” Chemometrics and Intelligent Laboratory Systems 106, 12–20.
  9. CEULEMANS, E., and VAN MECHELEN, I. (2005), “Hierarchical Classes Models for Three-way Three-mode Binary Data: Interrelations and Model Selection,” Psychometrika 70, 461–480.
  10. CEULEMANS, E., and VAN MECHELEN, I. (2004), “Tucker2 Hierarchical Classes Analysis,” Psychometrika 69, 375–399.
  11. CEULEMANS, E., VAN MECHELEN, I., and LEENEN, I. (2007), “The Local Minima Problem in Hierarchical Classes Analysis: An Evaluation of a Simulated Annealing Algorithm and Various Multistart Procedures,” Psychometrika 72, 377–391.
  12. CEULEMANS, E., VAN MECHELEN, I., and LEENEN, I. (2003), “Tucker3 Hierarchical Classes Analysis,” Psychometrika 68, 413–433.
  13. CHANG, W.-C. (1983), “On Using Principal Components Before Separating a Mixture of Two Multivariate Normal Distributions,” Applied Statistics 32, 267–275.
  14. CHATURVEDI, A., and CARROLL, J. D. (1994), “An Alternating Combinatorial Optimization Approach to Fitting the INDCLUS and Generalized INDCLUS Models,” Journal of Classification 11, 155–170.
  15. COHEN, J., and COHEN, P. (1983), Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (2nd ed.), Hillsdale, NJ: Erlbaum.
  16. DEPRIL, D., VAN MECHELEN, I., and MIRKIN, B. G. (2008), “Algorithms for Additive Clustering of Rectangular Data Tables,” Computational Statistics and Data Analysis 52, 4923–4938.
  17. DE SOETE, G., and CARROLL, J. D. (1994), “K-means Clustering in a Low-dimensional Euclidean Space,” in New Approaches in Classification and Data Analysis, eds. E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, and B. Burtschy, Berlin, Germany: Springer-Verlag, pp. 212–219.
  18. EVERITT, B. (1977), “Cluster Analysis,” in The Analysis of Survey Data, Vol. 1: Exploring Data Structures, eds. C. A. O’Muircheartaigh and C. Payne, London: Wiley, pp. 63–88.
  19. HUBERT, L. J., ARABIE, P., and HESSON-MCINNES, M. (1992), “Multidimensional Scaling in the City-block Metric - A Combinatorial Approach,” Journal of Classification 9, 211–236.
  20. KRZANOWSKI, W. (1979), “Between-groups Comparison of Principal Components,” Journal of the American Statistical Association 74, 703–707.
  21. KUPPENS, P., and VAN MECHELEN, I. (2007), “Determinants of the Anger Appraisals of Threatened Self-esteem, Other-blame, and Frustration,” Cognition and Emotion 21, 56–77.
  22. KUPPENS, P., VAN MECHELEN, I., and SMITS, D. J. M. (2003), “The Appraisal Basis of Anger: Specificity, Necessity and Sufficiency of Components,” Emotion 3, 254–269.
  23. LEE, D. D., and SEUNG, S. H. (2001), “Algorithms for Non-negative Matrix Factorization,” Advances in Neural Information Processing Systems 13, 556–562.
  24. LEE, D. D., and SEUNG, S. H. (1999), “Learning the Parts of Objects by Non-negative Matrix Factorization,” Nature 401, 788–791.
  25. MIRKIN, B. G. (1987), “Method of Principal Cluster Analysis,” Automation and Remote Control 48, 1379–1386.
  26. ROCCI, R., and VICHI, M. (2005), “Three-mode Component Analysis with Crisp or Fuzzy Partition of Units,” Psychometrika 70, 715–736.
  27. SCHEPERS, J., CEULEMANS, E., and VAN MECHELEN, I. (2008), “Selecting Among Multi-mode Partitioning Models of Different Complexities: A Comparison of Four Model Selection Criteria,” Journal of Classification 25, 67–85.
  28. SHEPARD, R. N., and ARABIE, P. (1979), “Additive Clustering Representations of Similarities as Combinations of Discrete Overlapping Properties,” Psychological Review 86, 87–123.
  29. SPIELBERGER, C. D., JOHNSON, E. H., RUSSELL, S. F., CRANE, J. C., JACOBS, G. A., and WORDEN, T. J. (1985), “The Experience and Expression of Anger: Construction and Validation of an Anger Expression Scale,” in Anger and Hostility in Cardiovascular and Behavioral Disorders, eds. M. A. Chesney and R. H. Rosenman, New York: Hemisphere, pp. 5–30.
  30. STEINLEY, D. (2003), “Local Optima in K-means Clustering: What You Don’t Know May Hurt You,” Psychological Methods 8, 294–304.
  31. STEINLEY, D., and BRUSCO, M. J. (2007), “Initializing K-means Batch Clustering: A Critical Evaluation of Several Techniques,” Journal of Classification 24, 99–121.
  32. STOICA, P., and VIBERG, M. (1996), “Maximum Likelihood Parameter and Rank Estimation in Reduced-Rank Multivariate Linear Regressions,” IEEE Transactions on Signal Processing 44, 3069–3078.
  33. TRYON, R. C., and BAILEY, D. E. (1970), Cluster Analysis, New York: McGraw-Hill.
  34. VICHI, M., and KIERS, H. A. L. (2001), “Factorial K-means Analysis for Two-Way Data,” Computational Statistics and Data Analysis 37, 49–64.
  35. VICHI, M., ROCCI, R., and KIERS, H. A. L. (2007), “Simultaneous Component and Clustering Models for Three-Way Data: Within and Between Approaches,” Journal of Classification 24, 71–98.
  36. WILDERJANS, T. F., CEULEMANS, E., and KUPPENS, P. (2012), “Clusterwise HICLAS: A Generic Modeling Strategy to Trace Similarities and Differences in Multi-Block Binary Data,” Behavior Research Methods 44, 532–545.
  37. WILDERJANS, T. F., CEULEMANS, E., and MEERS, K. (in press), “CHull: A Generic Convex Hull Based Model Selection Method,” Behavior Research Methods.
  38. WILDERJANS, T. F., CEULEMANS, E., and VAN MECHELEN, I. (in press), “The SIMCLAS Model: Simultaneous Analysis of Coupled Binary Data Matrices with Noise Heterogeneity Between and Within Data Blocks,” Psychometrika.
  39. WILDERJANS, T. F., CEULEMANS, E., VAN MECHELEN, I., and DEPRIL, D. (2011), “ADPROCLUS: A Graphical User Interface for Fitting Additive Profile Clustering Models to Object by Variable Data Matrices,” Behavior Research Methods 43, 56–65.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Dirk Depril (1)
  • Iven Van Mechelen (2)
  • Tom F. Wilderjans (2)
  1. suAzio Consulting, Antwerp, Belgium
  2. Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
