Data Mining and Knowledge Discovery, Volume 2, Issue 2, pp 169–194

Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications

  • Jörg Sander
  • Martin Ester
  • Hans-Peter Kriegel
  • Xiaowei Xu

Abstract

The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we generalize this algorithm in two important directions. The generalized algorithm, called GDBSCAN, can cluster point objects as well as spatially extended objects according to both their spatial and their nonspatial attributes. In addition, four applications using 2D points (astronomy), 3D points (biology), 5D points (earth science) and 2D polygons (geography) are presented, demonstrating the applicability of GDBSCAN to real-world problems.

Keywords: clustering algorithms, spatial databases, efficiency, applications
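
To make the idea in the abstract concrete, the following is a minimal sketch of density-based clustering in which the neighborhood test and the density measure are supplied by the caller, so that nonspatial attributes can take part in the clustering. It is an illustration under stated assumptions, not the authors' implementation: the names `gdbscan_sketch`, `is_neighbor`, `weight` and `min_weight` are hypothetical, the neighborhood predicate is assumed to be symmetric and reflexive, and neighborhoods are found by a linear scan where a real system would likely use a spatial index.

```python
import math
from collections import deque

NOISE = -1           # label for objects that belong to no cluster
UNCLASSIFIED = None  # label for objects not yet visited


def gdbscan_sketch(objects, is_neighbor, weight, min_weight):
    """Return one label per object: a cluster id (0, 1, ...) or NOISE.

    is_neighbor(a, b) -> bool : assumed symmetric and reflexive predicate
    weight(a) -> number       : contribution of an object to the density
    min_weight                : threshold a neighborhood must reach
    """
    n = len(objects)
    labels = [UNCLASSIFIED] * n

    def neighborhood(i):
        # Linear scan over all objects (illustrative; O(n) per query).
        return [j for j in range(n) if is_neighbor(objects[i], objects[j])]

    def is_core(neigh):
        # An object is a core object if its neighborhood is heavy enough.
        return sum(weight(objects[j]) for j in neigh) >= min_weight

    cluster_id = 0
    for i in range(n):
        if labels[i] is not UNCLASSIFIED:
            continue
        neigh = neighborhood(i)
        if not is_core(neigh):
            labels[i] = NOISE  # may be relabeled later as a border object
            continue
        labels[i] = cluster_id
        queue = deque(neigh)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border object: joins, not expanded
                continue
            if labels[j] is not UNCLASSIFIED:
                continue
            labels[j] = cluster_id
            neigh_j = neighborhood(j)
            if is_core(neigh_j):
                queue.extend(neigh_j)  # grow the cluster through core objects
        cluster_id += 1
    return labels


# Made-up 2D points with a nonspatial category attribute (x, y, category).
points = [
    (0, 0, "a"), (0, 1, "a"), (1, 0, "a"), (1, 1, "a"),
    (0.5, 0.5, "b"),
    (10, 10, "b"), (10, 11, "b"), (11, 10, "b"),
    (50, 50, "a"),
]


def close_and_same_category(p, q, eps=1.5):
    # Spatial condition (Euclidean distance) plus a nonspatial condition.
    return math.dist(p[:2], q[:2]) <= eps and p[2] == q[2]


labels = gdbscan_sketch(points, close_and_same_category,
                        weight=lambda p: 1, min_weight=3)
print(labels)
```

With `weight` returning 1 for every object and a purely distance-based predicate, the sketch reduces to the familiar point-counting form of density-based clustering. In the made-up data, the point at (0.5, 0.5) lies inside the first group spatially but is left as noise because its category differs, which is the kind of combined spatial and nonspatial criterion the abstract describes.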

Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Jörg Sander (1)
  • Martin Ester (1)
  • Hans-Peter Kriegel (1)
  • Xiaowei Xu (1)
  1. Institute for Computer Science, University of Munich, München, Germany
