Space Decomposition in Data Mining: A Clustering Approach

  • Lior Rokach
  • Oded Maimon
  • Inbal Lavi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2871)


Data mining algorithms aim to discover interesting patterns in large amounts of data with manageable complexity and good accuracy, and decomposition methods are used to improve both criteria. Unlike most decomposition methods, which partition the dataset via sampling, this paper presents an accuracy-oriented method that partitions the instance space into mutually exclusive subsets using the K-means clustering algorithm. After employing this divide-and-induce method on several datasets with different classifiers, its error rate is compared to that of the basic learning algorithm. An analysis of the results shows that the proposed method is well suited to datasets with numeric input attributes and that its performance is influenced by dataset size and homogeneity. Finally, a homogeneity threshold is developed that can be used to decide whether or not to decompose a dataset.
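The divide-and-induce scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `KMeans` and a decision tree as a stand-in base classifier, clusters the training instances, trains one classifier per cluster, and routes each test instance to the classifier of its nearest cluster center. The class name `DivideAndInduce` and all parameter choices are hypothetical.

```python
# Sketch of the divide-and-induce idea: partition the instance space
# with K-means, induce one base classifier per mutually exclusive
# subset, and classify new instances with the model of their nearest
# cluster center. Illustrative only, not the paper's implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier


class DivideAndInduce:
    def __init__(self, k=3, base_classifier=DecisionTreeClassifier):
        self.k = k
        self.base_classifier = base_classifier

    def fit(self, X, y):
        # Step 1: partition the instance space into k exclusive subsets.
        self.km = KMeans(n_clusters=self.k, n_init=10, random_state=0).fit(X)
        labels = self.km.labels_
        # Step 2: induce a separate classifier on each subset.
        self.models = {}
        for c in range(self.k):
            mask = labels == c
            clf = self.base_classifier()
            clf.fit(X[mask], y[mask])
            self.models[c] = clf
        return self

    def predict(self, X):
        # Route each instance to the model of its nearest cluster center.
        clusters = self.km.predict(X)
        out = np.empty(len(X), dtype=int)
        for c in np.unique(clusters):
            mask = clusters == c
            out[mask] = self.models[c].predict(X[mask])
        return out
```

Because the subsets are mutually exclusive, each base classifier sees a more homogeneous region of the instance space, which is the mechanism the paper's accuracy analysis examines.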


Keywords: Data Mining, Cluster Center, Unlabeled Data, Validity Index, Data Mining Algorithm





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Lior Rokach (1)
  • Oded Maimon (1)
  • Inbal Lavi (1)
  1. Department of Industrial Engineering, Tel-Aviv University, Tel Aviv, Israel
