A Creditable Subspace Labeling Method Based on D-S Evidence Theory

  • Yu Zong
  • Xian-Chao Zhang
  • He Jiang
  • Ming-Chu Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5012)

Abstract

Owing to the inherent sparsity, noise, and near-zero difference characteristics of high-dimensional data sets, traditional clustering methods fail to detect meaningful clusters in them. Subspace clustering attempts to find the true distribution inherent in subsets of the original attributes. However, it is usually uncertain which subspace contains the true clustering result. From this point of view, subspace clustering can be regarded as a problem of reasoning under uncertainty. In this paper, we first develop a criterion for evaluating creditable subspaces, that is, subspaces that contain meaningful clustering results, and then propose a creditable subspace labeling method (CSL) based on D-S evidence theory. The creditable subspaces of the original data space can be found by executing the CSL algorithm iteratively. Once the creditable subspaces have been obtained, the true clustering results can be found by running a traditional clustering algorithm on each creditable subspace. Experiments show that CSL can detect the actual creditable subspaces, expressed in the original attributes. In this way, we propose a novel approach that lets traditional clustering algorithms deal with high-dimensional data sets.
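The abstract does not give the mass functions or the labeling criterion that CSL uses, so the following is only a minimal sketch of the general building block of D-S evidence theory, Dempster's rule of combination, and not the authors' algorithm. The two-element frame of discernment (`"creditable"`, `"not_creditable"`), the function name `combine`, and the numeric masses are all hypothetical values chosen for illustration.

```python
from itertools import product
import math


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its mass; the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            # Agreeing evidence supports the intersection of the two sets.
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            # Disjoint focal elements contribute to the conflict mass K.
            conflict += mass_b * mass_c
    if math.isclose(conflict, 1.0):
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Normalise by 1 - K so the combined masses sum to 1 again.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


# Hypothetical frame of discernment for one candidate subspace.
C = frozenset({"creditable"})
N = frozenset({"not_creditable"})
THETA = C | N  # the whole frame, i.e. "don't know"

# Two hypothetical pieces of evidence about the same subspace.
m1 = {C: 0.6, THETA: 0.4}
m2 = {C: 0.5, N: 0.2, THETA: 0.3}

fused = combine(m1, m2)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 4))
```

In this toy setting the two sources mostly agree, so the fused belief in `"creditable"` ends up higher than either source assigned on its own, while the mass left on the whole frame shrinks.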

Keywords

Cluster Algorithm · Subspace Cluster · Evidence Theory · Spatial Data Mining · Original Data Space
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Yu Zong (1)
  • Xian-Chao Zhang (1)
  • He Jiang (1)
  • Ming-Chu Li (1)
  1. School of Software, Dalian University of Technology, Dalian, China
