Abstract
For multivariate categorical data, it is important to detect both the clustering structure and a low-dimensional space in which the clusters are well discriminated, because the features of the clusters can then be interpreted easily through the estimated low dimensions. Existing methods for dimension reduction clustering are useful for this purpose; however, interpretation sometimes becomes complicated because of the signs of the estimated parameters. We therefore propose a new dimension reduction clustering method with non-negativity constraints on all parameters. The proposed method has several advantages. First, the features of the clusters are easier to interpret because the effects of sign need not be considered. In addition, under the non-negativity and orthogonality constraints, the estimated components have a perfect simple structure, which yields interpretable descriptions. Second, simulations show that the clustering results are not inferior to those of existing methods, even though the constraints of the proposed method are strong.
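The link between the two constraints and perfect simple structure can be checked on a toy example. The sketch below is only an illustration, not the estimation algorithm of the paper: the matrices A and B and all function names are hypothetical. It relies on one fact: if every entry of a loading matrix is non-negative and its columns are orthogonal, each cross-product between two columns is a sum of non-negative terms that must equal zero, so every row can contain at most one non-zero entry.

```python
import math

def gram(A):
    """Return A^T A for a matrix stored as a list of rows."""
    p = len(A[0])
    return [[sum(row[j] * row[k] for row in A) for k in range(p)]
            for j in range(p)]

def is_nonnegative(A):
    """True if every entry of A is >= 0."""
    return all(x >= 0 for row in A for x in row)

def has_orthonormal_columns(A, tol=1e-9):
    """True if A^T A equals the identity matrix up to tol."""
    G = gram(A)
    p = len(G)
    return all(abs(G[j][k] - (1.0 if j == k else 0.0)) <= tol
               for j in range(p) for k in range(p))

def is_perfect_simple_structure(A, tol=1e-9):
    """True if each row has at most one non-zero entry."""
    return all(sum(1 for x in row if abs(x) > tol) <= 1 for row in A)

s = 1 / math.sqrt(2)

# Hypothetical loading matrix: 4 variables, 2 components.
# Non-negative, orthonormal columns, one non-zero loading per row.
A = [[0.6, 0.0],
     [0.8, 0.0],
     [0.0, s],
     [0.0, s]]

# Same matrix with a second non-zero loading in the first row.
# It stays non-negative, but the columns can no longer be orthogonal.
B = [[0.6, 0.3],
     [0.8, 0.0],
     [0.0, s],
     [0.0, s]]

print(has_orthonormal_columns(A), is_perfect_simple_structure(A))
print(has_orthonormal_columns(B), is_perfect_simple_structure(B))
```

In B, the cross-product of the two columns is strictly positive because no negative entry is available to cancel the shared loading, which is why non-negativity plus orthogonality forces each variable to load on a single component.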
Acknowledgments
We thank the editor and the reviewers for their useful comments. This work was supported by JSPS KAKENHI Grant Number JP40782818.
About this article
Cite this article
Tanioka, K., Yadohisa, H. Simultaneous Method of Orthogonal Non-metric Non-negative Matrix Factorization and Constrained Non-hierarchical Clustering. J Classif 36, 73–93 (2019). https://doi.org/10.1007/s00357-018-9284-8
DOI: https://doi.org/10.1007/s00357-018-9284-8