ISPA 2005: Parallel and Distributed Processing and Applications, pp 655-661
Clustering Mixed Type Attributes in Large Dataset
Abstract
Clustering is a widely used technique in data mining. Many clustering algorithms exist, but most either handle only a single attribute type or handle mixed attribute types but are inefficient on large data sets. Few algorithms do both well. In this paper, we propose a clustering algorithm, CFIKP, that can handle large datasets with mixed-type attributes. We first use a CF*-tree to pre-cluster the dataset. Once the dense regions are stored in the leaf nodes, we treat each dense region as a single point and use an improved k-prototype algorithm to cluster these dense regions. Experiments show that the CFIKP algorithm is very efficient in clustering large datasets with mixed-type attributes.
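The two-stage idea can be illustrated with a minimal Python sketch. It is not the CFIKP implementation: the abstract does not specify the CF*-tree or the improvements to k-prototype, so the sketch assumes the standard k-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches) and a hypothetical region summary made of a size, a numeric mean, and a per-attribute categorical mode. The names `mixed_distance`, `summarize_region`, and `assign` are illustrative only.

```python
from collections import Counter


def mixed_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Standard k-prototypes dissimilarity (assumed here, not taken from the paper):
    squared Euclidean distance on numeric attributes plus gamma times the
    number of mismatching categorical attributes."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


def summarize_region(points):
    """Collapse a dense region into one summary point (hypothetical layout):
    (size, numeric mean, categorical mode per attribute).
    Each input point is a pair (numeric_values, categorical_values)."""
    n = len(points)
    num_dim = len(points[0][0])
    cat_dim = len(points[0][1])
    mean = [sum(p[0][j] for p in points) / n for j in range(num_dim)]
    mode = [Counter(p[1][j] for p in points).most_common(1)[0][0]
            for j in range(cat_dim)]
    return n, mean, mode


def assign(regions, prototypes, gamma=1.0):
    """Return, for each region summary, the index of its nearest prototype.
    Each prototype is a pair (numeric_values, categorical_values)."""
    labels = []
    for _size, mean, mode in regions:
        nearest = min(
            range(len(prototypes)),
            key=lambda k: mixed_distance(
                mean, mode, prototypes[k][0], prototypes[k][1], gamma
            ),
        )
        labels.append(nearest)
    return labels
```

Because the second stage operates on region summaries rather than raw records, its cost depends on the number of dense regions instead of the number of records, which is presumably where the efficiency on large datasets comes from. A fuller version would also weight each region by its size when updating prototypes.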
Keywords
Data Mining, Cluster Algorithm, Large Dataset, Leaf Node, Mixed Type