
A statistical information-based clustering approach in distance space

Journal of Zhejiang University-SCIENCE A

Abstract

Clustering, a powerful data mining technique for discovering interesting data distributions and patterns in an underlying database, is used in many fields, such as statistical data analysis, pattern recognition, image processing, and business applications. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) performs well on spatial data, although many problems remain unsolved. For example, DBSCAN requires a user-specified threshold, and determining it with current methods such as OPTICS (Ankerst et al., 1999) is extremely time-consuming; moreover, the performance of DBSCAN under different norms has yet to be examined. In this paper, we first developed a method that determines the threshold from statistical information of the distance space of the database. Our examination of DBSCAN's performance under different norms then showed that there is a determinable relation between them. Finally, we used two artificial databases to verify the effectiveness and efficiency of the proposed methods.
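The abstract turns on two ingredients: DBSCAN's distance threshold and the norm used to measure distance. The minimal Python sketch below, which assumes a brute-force neighbourhood search, shows where both enter the algorithm. The names `minkowski`, `estimate_eps`, and `dbscan` are hypothetical. `estimate_eps` is a stand-in heuristic (a quantile of k-th-nearest-neighbour distances, in the spirit of the k-dist graph of Ester et al., 1996), not the statistical method developed in this paper.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


def minkowski(a: Point, b: Point, p: float = 2.0) -> float:
    """L_p distance between two points; p = math.inf gives the max (Chebyshev) norm."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def estimate_eps(points: List[Point], k: int = 4, p: float = 2.0, q: float = 0.5) -> float:
    """Hypothetical stand-in heuristic, NOT the paper's statistical method:
    return the q-quantile of every point's distance to its k-th nearest neighbour."""
    kdists = []
    for i, a in enumerate(points):
        ds = sorted(minkowski(a, b, p) for j, b in enumerate(points) if j != i)
        kdists.append(ds[k - 1])  # requires len(points) > k
    kdists.sort()
    return kdists[min(int(q * len(kdists)), len(kdists) - 1)]


def dbscan(points: List[Point], eps: float, min_pts: int = 4, p: float = 2.0) -> List[int]:
    """Brute-force DBSCAN under the L_p norm.
    Returns one label per point: -1 for noise, 0, 1, ... for cluster ids."""
    NOISE = -1
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def region(i: int) -> List[int]:
        # Eps-neighbourhood of point i (includes i itself), O(n) per query.
        return [j for j in range(n) if minkowski(points[i], points[j], p) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is a core point: keep growing
    # Every point has been labelled by now; the fallback only satisfies the type.
    return [lab if lab is not None else NOISE for lab in labels]


if __name__ == "__main__":
    # Toy data: two tight 2-D blobs plus one far outlier.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
           (50, 50)]
    eps = estimate_eps(pts, k=4)
    print(round(eps, 4))  # 1.4142
    for p in (1.0, 2.0, math.inf):
        print(p, dbscan(pts, eps, min_pts=4, p=p))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

On this toy data, all three norms give the same labels because the blobs are far apart. The sketch does not reproduce the paper's norm result. One standard fact may help in reading that result: for any x in R^d, ‖x‖∞ ≤ ‖x‖₂ ≤ ‖x‖₁ ≤ d‖x‖∞. An Eps-neighbourhood under one norm is therefore sandwiched between neighbourhoods under the others with rescaled radii. The abstract alone does not confirm whether this is the relation the authors derive. The O(n²) region query keeps the sketch short; practical implementations answer it with a spatial index such as an R*-tree.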

References

  • Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P., 1998. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, p. 94–105.

  • Ankerst, M., Breunig, M., Kriegel, H.P., Sander, J., 1999. OPTICS: Ordering Points to Identify the Clustering Structure. Proc. 1999 ACM SIGMOD Int. Conf. on Management of Data, Philadelphia, PA, p. 49–60.

  • Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B., 1990. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. Proc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, p. 322–331.

  • Ester, M., Kriegel, H.P., Sander, J., Xu, X., 1996. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc. of 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, p. 226–231.

  • Guha, S., Rastogi, R., Shim, K., 1998. CURE: An Efficient Clustering Algorithm for Large Databases. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, p. 73–84.

  • Han, J., Kamber, M., 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, USA, p. 242–266.

  • Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2002. Clustering validity checking methods: part II. SIGMOD Record, 31(4):51–62.

  • Karypis, G., Han, E.H., Kumar, V., 1999. CHAMELEON: A hierarchical clustering algorithm using dynamic modeling. Computer, 32(8):68–75.

  • Nakamura, E., Kehtarnavaz, N., 1998. Determining number of clusters and prototype locations via multi-scale clustering. Pattern Recognition Letters, 19(3):1265–1283.

  • Sheikholeslami, G., Chatterjee, S., Zhang, A., 1998. WaveCluster: A Multi-resolution Clustering Approach for Very Large Spatial Databases. Proc. of 24th VLDB Conf., New York, p. 428–439.

  • Yue, S.H., Li, P., Guo, J.D., Zhou, S.G., 2004. Using Greedy algorithm: DBSCAN revisited II. J Zhejiang Univ SCI, 5(11):1405–1412.

  • Wang, W., Yang, J., Muntz, R., 1997. STING: A Statistical Information Grid Approach to Spatial Data Mining. Proc. of 23rd VLDB Conf., Athens, Greece, p. 186–195.

Additional information

Project (No. 2002AA412010-12) supported by the Hi-Tech Research and Development Program (863) of China

About this article

Cite this article

Shi-hong, Y., Ping, L., Ji-dong, G. et al. A statistical information-based clustering approach in distance space. J. Zhejiang Univ.-Sci. A 6, 71–78 (2005). https://doi.org/10.1631/BF02842480

  • DOI: https://doi.org/10.1631/BF02842480