EPIC: Efficient Integration of Partitional Clustering Algorithms for Classification

  • Vikas K. Garg
  • M. N. Murty
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6457)

Abstract

Partitional algorithms form an extremely popular class of clustering algorithms. Primarily, these algorithms can be classified into two sub-categories: a) k-means based algorithms, which presume knowledge of a suitable k, and b) algorithms such as Leader, which take a distance threshold value, τ, as an input. In this work, we make the following contributions. We 1) propose a novel technique, EPIC, which is based on both the number of clusters, k, and the distance threshold, τ; 2) demonstrate that the proposed algorithm achieves better performance than the standard k-means algorithm; and 3) present a generic scheme for integrating EPIC into different classification algorithms to reduce their training time complexity.
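
For readers unfamiliar with threshold-based partitional clustering, the following is a minimal sketch of the textbook single-pass Leader scheme mentioned in the abstract. It is not the EPIC algorithm, whose details are not given on this page. The function name `leader_clustering`, the use of Euclidean distance, and the rule of assigning each point to the first leader within τ (rather than the nearest one) are assumptions made for illustration.

```python
import math


def leader_clustering(points, tau):
    """Illustrative single-pass Leader clustering (assumed variant).

    Each point joins the first existing leader within distance tau;
    otherwise it becomes a new leader. Returns (leaders, labels).
    """
    leaders = []  # representative point of each cluster
    labels = []   # cluster index assigned to each input point
    for p in points:
        for idx, leader in enumerate(leaders):
            # Euclidean distance is an assumption; any metric could be used.
            if math.dist(p, leader) <= tau:
                labels.append(idx)
                break
        else:
            # No leader is close enough, so this point starts a new cluster.
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return leaders, labels


points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 4.9), (0.1, 0.3)]
leaders, labels = leader_clustering(points, tau=1.0)
print(leaders, labels)
```

Unlike k-means, the number of clusters here is not fixed in advance: it follows from τ and from the order in which the points are scanned.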

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Vikas K. Garg (1)
  • M. N. Murty (2)
  1. IBM Research, India
  2. Department of Computer Science and Automation (CSA), Indian Institute of Science (IISc), Bangalore, India
