Journal of Zhejiang University-SCIENCE A, Volume 7, Issue 10, pp 1626–1633

An efficient enhanced k-means clustering algorithm

  • Fahim A. M. 
  • Salem A. M. 
  • Torkey F. A. 
  • Ramadan M. A. 

Abstract

In k-means clustering, we are given a set of n data points in d-dimensional space ℝ^d and an integer k, and the problem is to determine a set of k points in ℝ^d, called centers, that minimizes the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. The algorithm is easy to implement, requiring only a simple data structure that keeps some information from each iteration for use in the next. Our experimental results demonstrate that the scheme improves the computational speed of the k-means algorithm, reducing both the total number of distance calculations and the overall computation time.
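
The abstract does not say which information is carried between iterations, so the following Python sketch is only one plausible reading, not the authors' published procedure. It assumes that each point caches its current cluster label and its squared distance to that cluster's center; after the centers are updated, a point whose own center has not moved farther away keeps its label, and only the remaining points are compared against all k centers. The names `cached_kmeans`, `init_centers`, and `full_searches` are hypothetical, and the shortcut is a heuristic that may give different assignments from standard k-means.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centers):
    """Full search: index of the closest center and its squared distance."""
    j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
    return j, sq_dist(p, centers[j])

def mean_point(members):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(members)
    return tuple(sum(coord) / n for coord in zip(*members))

def cached_kmeans(points, init_centers, max_iter=100):
    """k-means with a per-point cache (assumed reading of the abstract)."""
    points = [tuple(p) for p in points]
    centers = [tuple(c) for c in init_centers]
    k = len(centers)

    # The cache: one label and one squared distance per point.
    labels, dists = [], []
    for p in points:
        j, d = nearest(p, centers)
        labels.append(j)
        dists.append(d)
    full_searches = len(points)  # points compared against all k centers

    for _ in range(max_iter):
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean_point(members) if members else centers[j])
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers

        # Assignment step, reusing the cache from the previous iteration.
        for i, p in enumerate(points):
            d = sq_dist(p, centers[labels[i]])
            if d <= dists[i]:
                dists[i] = d  # own center is no farther: keep the label
            else:
                labels[i], dists[i] = nearest(p, centers)
                full_searches += 1
    return centers, labels, full_searches
```

On two well-separated groups of three points each, with one initial center taken from each group, the sketch assigns every point to its own group and performs a full search for only two of the six points after the first pass:

```python
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, full_searches = cached_kmeans(pts, [(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```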

Key words

Clustering algorithms; Cluster analysis; k-means algorithm; Data analysis

CLC number

TP301.6 

Copyright information

© Zhejiang University 2006

Authors and Affiliations

  • Fahim A. M. (1)
  • Salem A. M. (2)
  • Torkey F. A. (3)
  • Ramadan M. A. (4)

  1. Department of Mathematics, Faculty of Education, Suez Canal University, Suez city, Egypt
  2. Department of Computer Science, Faculty of Computers & Information, Ain Shams University, Cairo city, Egypt
  3. Department of Computer Science, Faculty of Computers & Information, Minufiya University, Shbeen El Koom City, Egypt
  4. Department of Mathematics, Faculty of Science, Minufiya University, Shbeen El Koom City, Egypt