
Building a Decision Cluster Forest Model to Classify High Dimensional Data with Multi-classes

  • Yan Li
  • Edward Hung
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5828)

Abstract

This paper proposes a decision cluster forest classification model for high dimensional data with multiple classes. A decision cluster forest (DCF) consists of a set of decision cluster trees. The leaves of each tree are clusters labeled with the same class, and that label determines the class of new objects falling in those clusters. A decision cluster tree is generated by recursively calling a variable weighting k-means algorithm on a subset of the training data that contains only the objects of one class. The m decision cluster trees grown from the subsets of the m classes constitute the decision cluster forest. The Anderson-Darling test is used to determine the stopping condition of tree growing. A DCF classification (DCFC) model is selected from all leaves of the m decision cluster trees in the forest. A series of experiments on both synthetic and real data sets has shown that the DCFC model performed better in accuracy and scalability than the single decision cluster tree method and than the k-NN, decision tree and SVM methods. The new model is particularly suitable for large, high dimensional data with many classes.
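The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: plain 2-means stands in for the variable weighting k-means (W-k-means) algorithm, the Anderson-Darling normality test is applied to the objects projected onto the line joining the two child centers, every leaf is kept rather than selected, and a new object takes the class of its nearest leaf center. The function names (`two_means`, `looks_gaussian`, `grow_leaves`, `fit_forest`, `predict`) and the limits `max_depth` and `min_size` are hypothetical.

```python
import math
import numpy as np

def two_means(X, rng, iters=20):
    """Plain 2-means; an assumed stand-in for W-k-means (no variable weights)."""
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(2):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def looks_gaussian(x, crit=0.752):
    """Anderson-Darling normality test, mean and variance estimated.
    crit=0.752 is the usual 5% critical value for the small-sample
    corrected statistic computed below."""
    n = len(x)
    s = x.std(ddof=1)
    if n < 8 or s == 0:
        return True  # too little evidence to justify a split
    z = np.sort((x - x.mean()) / s)
    cdf = np.array([0.5 * (1 + math.erf(v / math.sqrt(2))) for v in z])
    cdf = np.clip(cdf, 1e-12, 1 - 1e-12)
    i = np.arange(1, n + 1)
    a2 = -n - np.mean((2 * i - 1) * (np.log(cdf) + np.log(1 - cdf[::-1])))
    a2 *= 1 + 0.75 / n + 2.25 / n ** 2
    return a2 < crit

def grow_leaves(X, rng, depth=0, max_depth=6, min_size=10):
    """Recursively split the objects of ONE class; return the leaf centers."""
    if depth >= max_depth or len(X) < 2 * min_size:
        return [X.mean(axis=0)]
    labels, centers = two_means(X, rng)
    v = centers[0] - centers[1]
    sizes = [(labels == 0).sum(), (labels == 1).sum()]
    if not v.any() or min(sizes) < min_size:
        return [X.mean(axis=0)]
    # Assumed stopping rule: stop when the projected objects look Gaussian.
    if looks_gaussian(X @ v / (v @ v)):
        return [X.mean(axis=0)]
    return (grow_leaves(X[labels == 0], rng, depth + 1, max_depth, min_size)
            + grow_leaves(X[labels == 1], rng, depth + 1, max_depth, min_size))

def fit_forest(X, y, seed=0):
    """Grow one tree per class and collect all leaves as (center, class)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    centers, classes = [], []
    for c in np.unique(y):
        for center in grow_leaves(X[y == c], rng):
            centers.append(center)
            classes.append(c)
    return np.array(centers), np.array(classes)

def predict(model, X):
    """Assign each object the class of its nearest leaf center (assumed rule)."""
    centers, classes = model
    X = np.asarray(X, dtype=float)
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]
```

Because each tree is grown from the objects of a single class, every leaf carries that class by construction. This is why the sketch needs no per-leaf voting step.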

Keywords

Clustering, classification, W-k-means, forest



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Yan Li (1)
  • Edward Hung (1)
  1. Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
