rCOSA: A Software Package for Clustering Objects on Subsets of Attributes

Kampert, Maarten M.; Meulman, Jacqueline J.; Friedman, Jerome H.

doi:10.1007/s00357-017-9240-z

Published: 03 November 2017

Journal of Classification, Volume 34, pages 514–547 (2017)

Maarten M. Kampert¹,
Jacqueline J. Meulman¹,² &
Jerome H. Friedman²


Abstract

rCOSA is a software package interfaced to the R language. It implements statistical techniques for clustering objects on subsets of attributes in multivariate data. The main output of COSA is a dissimilarity matrix that one can subsequently analyze with a variety of proximity analysis methods. Our package extends the original COSA software (Friedman and Meulman, 2004) by adding functions for hierarchical clustering methods, least squares multidimensional scaling, partitional clustering, and data visualization. Although many publications cite the COSA paper by Friedman and Meulman (2004), only a small number of them actually use the COSA program. This can be attributed to the fact that the original implementation is not easy to install and use. Moreover, the available software is out of date. Here, we introduce an up-to-date software package and clear guidance for this advanced technique. The software package and related links are available for free at: https://github.com/mkampert/rCOSA.
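
As an illustration of the workflow the abstract describes, in which a dissimilarity matrix is handed to a proximity analysis method, the following minimal sketch runs average-linkage hierarchical clustering on a precomputed dissimilarity matrix. It is written in plain Python and does not use or reproduce the rCOSA interface. The function name `average_linkage` and the toy matrix `D` are hypothetical, chosen only for illustration, and the values in `D` are made up rather than produced by COSA.

```python
from itertools import combinations


def average_linkage(dissim, n_clusters):
    """Agglomerative clustering (average linkage) on a precomputed,
    symmetric dissimilarity matrix given as a list of lists.
    Returns a list of clusters, each a sorted list of object indices."""
    clusters = [[i] for i in range(len(dissim))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean dissimilarity over all cross-cluster object pairs.
            d = sum(dissim[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # Merge the closest pair; b > a, so deleting b leaves a in place.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy dissimilarities for five objects (NOT COSA output).
D = [
    [0.00, 0.10, 0.20, 0.90, 0.80],
    [0.10, 0.00, 0.15, 0.85, 0.90],
    [0.20, 0.15, 0.00, 0.95, 0.90],
    [0.90, 0.85, 0.95, 0.00, 0.10],
    [0.80, 0.90, 0.90, 0.10, 0.00],
]

groups = average_linkage(D, n_clusters=2)
print(groups)
```

Any method that accepts a precomputed dissimilarity matrix could take the place of the clustering step here, for example multidimensional scaling or a partitional method, since only the pairwise dissimilarities are used.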


References

AITCHISON, J. (1986), The Statistical Analysis of Compositional Data, London: Chapman and Hall.
AMORIM, R.C. (2015), “Feature Relevance in Ward’s Hierarchical Clustering Using the L_p Norm”, Journal of Classification, 32, 46–62.
ANDREWS, J.L., and MCNICHOLAS, P.D. (2014), “Variable Selection for Clustering and Classification”, Journal of Classification, 31(2), 136–153.
BOUVEYRON, C., and BRUNET, C. (2012), “Simultaneous Model-Based Clustering and Visualization in the Fisher Discriminative Subspace”, Statistics and Computing, 22(1), 301–324.
DAMIAN, D., ORESIC, M., VERHEIJ, E., MEULMAN, J. J., FRIEDMAN, J., ADOURIAN, A., MOREL, N., SMILDE, A., and VAN DER GREEF, J. (2007), “Applications of a New Subspace Clustering Algorithm (COSA) in Medical Systems Biology”, Metabolomics, 3(1), 69–77.
DE LEEUW, J., and HEISER, W.J. (1982), “Theory of Multidimensional Scaling”, in Handbook of Statistics (Vol. 2), eds. P. Krishnaiah and L. Kanal, Amsterdam, The Netherlands: North-Holland, pp. 285–316.
DE SARBO, W., CARROLL, J., CLARK, L., and GREEN, P. (1984), “Synthesized Clustering: A Method for Amalgamating Clustering Bases with Differential Weighting of Variables”, Psychometrika, 49, 57–78.
DE SOETE, G. (1985), “OVWTRE: A Program for Optimal Variable Weighting for Ultrametric and Additive Tree Fitting”, Journal of Classification, 5, 101–104.
DE SOETE, G., DE SARBO, W., and CARROLL, J. (1985), “Optimal Variable Weighting for Hierarchical Clustering: An Alternating Least-Squares Algorithm”, Journal of Classification, 2, 173–192.
FRIEDMAN, J.H., and MEULMAN, J.J. (2004), “Clustering Objects on Subsets of Attributes”, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(part 4), 815–849.
GOWER, J.C. (1966), “Some Distance Properties of Latent Roots and Vector Methods Used in Multivariate Analysis”, Biometrika, 53, 325–338.
HEISER, W.J. (1995), “Convergent Computation by Iterative Majorization: Theory and Applications in Multidimensional Data Analysis”, in Recent Advances in Descriptive Multivariate Analysis, ed. W. Krzanowski, Oxford: Oxford University Press, pp. 157–189.
JAIN, A. (2010), “Data Clustering: 50 Years Beyond K-Means”, Pattern Recognition Letters, 31(8), 651–666.
KOHONEN, T. (2001), Self Organizing Maps, Berlin, Heidelberg: Springer Verlag.
LEE, J., LENDASSE, A., and VERLEYSEN, M. (2004), “Nonlinear Projection with Curvilinear Distances: Isomap Versus Curvilinear Distance Analysis”, Neurocomputing, 57, 49–76.
MEULMAN, J.J. (1986), A Distance Approach to Nonlinear Multivariate Analysis, Leiden: DSWO Press.
MEULMAN, J. (1992), “The Integration of Multidimensional Scaling and Multivariate Analysis with Optimal Transformations”, Psychometrika, 57, 539–565.
R CORE TEAM (2014), “R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing”, Vienna, Austria, www.R-project.org/.
SAMMON, J.J. (1969), “A Nonlinear Mapping for Data Structure Analysis”, IEEE Transactions on Computers, C-18, 401–409.
SEBESTYEN, G.S. (1962), Decision-Making Processes in Pattern Recognition, New York: The Macmillan Company.
STEINLEY, D., and BRUSCO, M. (2008), “Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures”, Psychometrika, 73(1), 46–62.
SZEPANNEK, G. (2013), “orclus: ORCLUS Subspace Clustering”, R package version 0.2-5, CRAN.R-project.org/package=orclus.
TORGERSON, W. (1952), “Multidimensional Scaling: I. Theory and Method”, Psychometrika, 17, 713–726.
WARD JR, J.H. (1963), “Hierarchical Grouping to Optimize an Objective Function”, Journal of the American Statistical Association, 58(301), 236–244.
WILLIAMS, G., HUANG, J.Z., CHEN, X., WANG, Q., and XIAO, L. (2014), “wskm: Weighted k-Means Clustering”, R package version 1.4.19, CRAN.R-project.org/package=wskm.
WITTEN, D.M., and TIBSHIRANI, R. (2010), “A Framework for Feature Selection in Clustering”, Journal of the American Statistical Association, 105(2), 713–726.
YOUNG, F., and HOUSEHOLDER, A. (1938), “Discussion of a Set of Points in Terms of Their Mutual Distances”, Psychometrika, 3, 19–22.


Author information

Authors and Affiliations

Mathematical Institute, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
Maarten M. Kampert & Jacqueline J. Meulman
Stanford University, Stanford, CA, USA
Jacqueline J. Meulman & Jerome H. Friedman

Corresponding author

Correspondence to Maarten M. Kampert.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Kampert, M.M., Meulman, J.J. & Friedman, J.H. rCOSA: A Software Package for Clustering Objects on Subsets of Attributes. J Classif 34, 514–547 (2017). https://doi.org/10.1007/s00357-017-9240-z

Published: 03 November 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s00357-017-9240-z
