Abstract
Flexibility of cluster analysis is sometimes understood as the robustness of the final partition of objects to changes in the list of diagnostic variables, that is, to deleting some variables from the list or adding others. In this paper, we propose a procedure that makes it possible to calculate a distance matrix on the basis of different subsets of variables, while the selection of variables still follows a single, unified rule. The procedure starts with the classical standardization of each variable. Before calculating the distance between two objects, we eliminate the variable with the largest absolute value in the first object and the variable with the largest absolute value in the second object. If the same variable happens to be selected for elimination for both objects, the variable with the next largest absolute value (for both objects) is eliminated. With this procedure, each element of the distance matrix is based on the same number of variables, but the variables themselves can differ from one pair of objects to another. As an example, a data set of 17 variables describing human smart society characteristics for the 28 European Union countries is used.
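The elimination rule described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the distance measure, so Euclidean distance is assumed; "classical standardization" is taken to mean z-scores with the sample standard deviation; and when both objects point to the same variable, the "next" variable is assumed to be the one with the largest absolute value across the two objects among the remaining variables. The function names `standardize` and `flexible_distance_matrix` are hypothetical.

```python
import numpy as np


def standardize(X):
    """Z-score each column (assumed meaning of 'classical standardization')."""
    X = np.asarray(X, dtype=float)
    # Assumes no column is constant; a zero standard deviation would divide by zero.
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def flexible_distance_matrix(X):
    """Pairwise distances where each pair drops exactly two variables.

    Assumptions (not stated in the source): Euclidean distance, and the
    tie rule picks the largest remaining |z| over both objects.
    Requires at least three variables.
    """
    Z = standardize(X)
    A = np.abs(Z)
    n, m = Z.shape
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            keep = np.ones(m, dtype=bool)
            vi = int(np.argmax(A[i]))  # most extreme variable for object i
            vj = int(np.argmax(A[j]))  # most extreme variable for object j
            keep[vi] = False
            if vj != vi:
                keep[vj] = False
            else:
                # Same variable selected twice: also drop the next most
                # extreme variable, judged over both objects together.
                rest = np.maximum(A[i], A[j])
                rest[vi] = -np.inf
                keep[int(np.argmax(rest))] = False
            diff = Z[i, keep] - Z[j, keep]
            D[i, j] = D[j, i] = np.sqrt(np.sum(diff ** 2))
    return D
```

Calling `flexible_distance_matrix(X)` on an array with one row per object and one column per variable returns a symmetric matrix in which every entry is computed from the same number of variables (two fewer than the full list), although the retained variables may differ from pair to pair. The resulting matrix could then be passed to any clustering routine that accepts precomputed distances.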
Acknowledgements
The project is financed by the Ministry of Science and Higher Education in Poland under the programme “Regional Initiative of Excellence” 2019–2022, project number 015/RID/2018/19, total funding amount 10 721 040.00 PLN, and by a research fund granted to the Faculty of Management at Cracow University of Economics.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sokołowski, A., Markowska, M. (2021). Flexible Clustering. In: Chadjipadelis, T., Lausen, B., Markos, A., Lee, T.R., Montanari, A., Nugent, R. (eds) Data Analysis and Rationality in a Complex World. IFCS 2019. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-60104-1_28
DOI: https://doi.org/10.1007/978-3-030-60104-1_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60103-4
Online ISBN: 978-3-030-60104-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)