Abstract
Many recently proposed subspace clustering methods suffer from two severe problems. First, the algorithms typically scale exponentially with the data dimensionality or with the subspace dimensionality of the clusters. Second, the clustering results are often sensitive to input parameters. In this paper, a fast subspace clustering algorithm based on attribute clustering is proposed to overcome these limitations. The algorithm first filters out redundant attributes by computing the Gini coefficient. To evaluate the correlation between every pair of non-redundant attributes, a relation matrix of the non-redundant attributes is constructed from a relation function of two-dimensional united Gini coefficients. Applying an overlapping clustering algorithm to the relation matrix yields the candidate set of interesting subspaces. Finally, all subspace clusters are derived by clustering on the interesting subspaces. Experiments on both synthetic and real datasets show that the new algorithm not only achieves significant gains in runtime and in the quality of the subspace clusters found, but is also insensitive to input parameters.
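The first two steps of the pipeline (attribute filtering and the relation matrix) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the Gini coefficient is taken to be the Gini impurity of an equal-width histogram, the "united" Gini coefficient is read as the impurity of the joint two-dimensional histogram, and `relation`, `GINI_THRESHOLD`, the bin count, and the toy data are hypothetical choices rather than the authors' definitions. The overlapping clustering step is not specified in the abstract and is left out.

```python
import random


def gini(counts):
    """Gini impurity 1 - sum(p_i^2) of a histogram given as a list of counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bin_index(value, bins):
    """Map a value in [0, 1] to one of `bins` equal-width cells."""
    return min(int(value * bins), bins - 1)


def attribute_gini(column, bins=10):
    """Assumed one-dimensional Gini coefficient of a single attribute."""
    counts = [0] * bins
    for v in column:
        counts[bin_index(v, bins)] += 1
    return gini(counts)


def joint_gini(col_a, col_b, bins=10):
    """Assumed two-dimensional ("united") Gini coefficient of an attribute pair."""
    counts = [0] * (bins * bins)
    for a, b in zip(col_a, col_b):
        counts[bin_index(a, bins) * bins + bin_index(b, bins)] += 1
    return gini(counts)


def relation(col_a, col_b, bins=10):
    """Hypothetical relation score: how far the joint Gini falls below the
    value it would take if the two attributes were independent."""
    ga = attribute_gini(col_a, bins)
    gb = attribute_gini(col_b, bins)
    independent = 1.0 - (1.0 - ga) * (1.0 - gb)
    return independent - joint_gini(col_a, col_b, bins)


def relation_matrix(columns, kept, bins=10):
    """Upper triangle of the relation matrix over the retained attributes."""
    return {
        (i, j): relation(columns[i], columns[j], bins)
        for i in kept
        for j in kept
        if i < j
    }


# Toy data: x0 and x1 share a two-cluster structure, x2 is uniform noise.
rng = random.Random(0)
n = 1000
centers = (0.25, 0.75)
x0 = [centers[i % 2] + rng.uniform(-0.04, 0.04) for i in range(n)]
x1 = [centers[(i + 1) % 2] + rng.uniform(-0.04, 0.04) for i in range(n)]
x2 = [rng.random() for _ in range(n)]
columns = [x0, x1, x2]

# Step 1 (assumed rule): attributes whose Gini is close to the uniform
# maximum carry no cluster structure and are treated as redundant.
GINI_THRESHOLD = 0.85  # hypothetical cut-off
kept = [i for i, col in enumerate(columns) if attribute_gini(col) < GINI_THRESHOLD]

# Step 2: pairwise relation scores among the non-redundant attributes.
matrix = relation_matrix(columns, kept)
```

Under these assumptions the noise attribute is filtered out and the two clustered attributes receive a clearly positive relation score, which is the kind of signal an overlapping clustering of attributes could then group into a candidate subspace.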
Additional information
Translated from Journal of Beijing University of Posts and Telecommunications, 2007, 30(3): 1–5 [译自: 北京邮电大学学报]
About this article
Cite this article
Niu, K., Zhang, S. & Chen, J. Subspace clustering through attribute clustering. Front. Electr. Electron. Eng. China 3, 44–48 (2008). https://doi.org/10.1007/s11460-008-0010-x
DOI: https://doi.org/10.1007/s11460-008-0010-x