Building a Decision Cluster Forest Model to Classify High Dimensional Data with Multi-classes
In this paper, a decision cluster forest classification model is proposed for high-dimensional data with multiple classes. A decision cluster forest (DCF) consists of a set of decision cluster trees. The leaves of each tree are clusters that all carry the same class label, and a new object is assigned the class of the cluster it falls in. A decision cluster tree is generated by recursively calling a variable weighting k-means algorithm on a subset of the training data that contains only the objects of one class. The m decision cluster trees grown from the subsets of the m classes constitute the decision cluster forest. The Anderson-Darling test is used to determine the stopping condition of tree growing. A DCF classification (DCFC) model is then selected from the leaves of the m decision cluster trees in the forest. A series of experiments on both synthetic and real data sets has shown that the DCFC model performs better in accuracy and scalability than the single decision cluster tree method and than k-NN, decision tree, and SVM classifiers. The new model is particularly suitable for large, high-dimensional data with many classes.
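The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several labeled assumptions: plain 2-means stands in for the variable weighting k-means (W-k-means) algorithm, the Anderson-Darling test is applied to the points projected onto the line joining the two child centers (one common way to use the test as a stopping rule, not necessarily the paper's), every leaf is kept instead of selecting a subset, and a new object takes the class of the nearest leaf center. All function names and the `min_size` parameter are hypothetical.

```python
import math

def dist2(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def two_means(points, iters=20):
    # Assumption: plain 2-means, not the W-k-means used in the paper.
    # Deterministic start: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    for _ in range(iters):
        left = [p for p in points if dist2(p, c0) <= dist2(p, c1)]
        right = [p for p in points if dist2(p, c0) > dist2(p, c1)]
        if not left or not right:
            break
        c0, c1 = mean(left), mean(right)
    return left, right, c0, c1

def looks_gaussian(values, critical=0.752):
    # Anderson-Darling normality test with estimated mean and variance.
    # 0.752 is the 5% critical value for the small-sample-corrected statistic.
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / (n - 1)
    if var == 0.0:
        return True
    sd = math.sqrt(var)
    cdf = sorted(
        min(max(0.5 * (1 + math.erf((v - mu) / (sd * math.sqrt(2)))), 1e-12), 1 - 1e-12)
        for v in values
    )
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = (-n - s / n) * (1 + 0.75 / n + 2.25 / n ** 2)
    return a2 < critical

def grow_leaves(points, min_size=10):
    # Recursively split one class; return the centers of the leaf clusters.
    if len(points) < min_size:
        return [mean(points)]
    left, right, c0, c1 = two_means(points)
    if not left or not right:
        return [mean(points)]
    # Assumption: test the 1-D projection onto the line joining the centers.
    direction = [b - a for a, b in zip(c0, c1)]
    proj = [sum(d * x for d, x in zip(direction, p)) for p in points]
    if looks_gaussian(proj):
        return [mean(points)]  # stop: the cluster is not worth splitting
    return grow_leaves(left, min_size) + grow_leaves(right, min_size)

def build_forest(X, y, min_size=10):
    # One tree per class; the forest is the list of (leaf center, class) pairs.
    forest = []
    for label in sorted(set(y)):
        subset = [x for x, lab in zip(X, y) if lab == label]
        forest += [(c, label) for c in grow_leaves(subset, min_size)]
    return forest

def classify(forest, x):
    # Assumption: assign the class of the nearest leaf center.
    return min(forest, key=lambda leaf: dist2(leaf[0], x))[1]
```

Calling `build_forest(X, y)` on lists of feature vectors and labels, then `classify(forest, x)` on a new vector, exercises the whole pipeline. The point of growing one tree per class is that a class made of several separate groups ends up with several leaves, so a single class is no longer summarized by one center that may sit in another class's region.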
Keywords: Clustering, classification, W-k-means, forest