AGRID: An Efficient Algorithm for Clustering Large High-Dimensional Datasets

  • Zhao Yanchang
  • Song Junde
Conference paper

DOI: 10.1007/3-540-36175-8_27

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2637)
Cite this paper as:
Yanchang Z., Junde S. (2003) AGRID: An Efficient Algorithm for Clustering Large High-Dimensional Datasets. In: Whang KY., Jeon J., Shim K., Srivastava J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg

Abstract

The clustering algorithm GDILC combines density-based clustering with a grid and is designed to discover clusters of arbitrary shape and to eliminate noise. However, it does not scale to large high-dimensional datasets. In this paper, we improve the algorithm in five important directions. With these improvements, AGRID scales well and can process large high-dimensional datasets. It discovers clusters of various shapes and eliminates noise effectively. In addition, it is insensitive to the order of input and is nonparametric. Our experiments show the high speed and accuracy of the AGRID clustering algorithm.

Keywords

Data mining, clustering, grid, iso-density line, dimensionality

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Zhao Yanchang
    • 1
  • Song Junde
    • 1
  1. Electronic Engineering School, Beijing University of Posts and Telecommunications, Beijing, China
