Abstract
The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we generalize this algorithm in two important directions. The generalized algorithm, called GDBSCAN, can cluster point objects as well as spatially extended objects according to both their spatial and their nonspatial attributes. In addition, four applications using 2D points (astronomy), 3D points (biology), 5D points (earth science) and 2D polygons (geography) are presented, demonstrating the applicability of GDBSCAN to real-world problems.
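The abstract does not spell out how the generalization works, so the following is only a minimal sketch under stated assumptions: the distance-based neighborhood of DBSCAN is replaced by an arbitrary symmetric and reflexive predicate on pairs of objects, and the point count inside a neighborhood is replaced by a sum of per-object weights compared against a threshold. The names `gdbscan_sketch`, `neighbor_pred`, `weight`, and `min_weight` are hypothetical and chosen for illustration; the neighborhood query is a linear scan rather than a spatial index.

```python
import math
from collections import deque

NOISE = -1  # label for objects that belong to no cluster


def gdbscan_sketch(objects, neighbor_pred, weight, min_weight):
    """Label each object with a cluster id (0, 1, ...) or NOISE.

    neighbor_pred(a, b) -> bool is assumed symmetric and reflexive.
    weight(obj) -> number generalizes "count one per point".
    """
    n = len(objects)
    labels = [None] * n  # None means "not yet classified"

    def neighborhood(i):
        # Linear scan; a real system would use a spatial index here.
        return [j for j in range(n) if neighbor_pred(objects[i], objects[j])]

    def is_core(neigh):
        return sum(weight(objects[j]) for j in neigh) >= min_weight

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighborhood(i)
        if not is_core(seeds):
            labels[i] = NOISE  # may later be relabeled as a border object
            continue
        labels[i] = cluster_id
        queue = deque(j for j in seeds if j != i)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border object: joins, does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            neigh_j = neighborhood(j)
            if is_core(neigh_j):
                queue.extend(neigh_j)
        cluster_id += 1
    return labels


# Illustrative data: 2D points carrying one nonspatial attribute ("kind").
points = [
    (0.0, 0.0, "a"), (0.0, 1.0, "a"), (1.0, 0.0, "a"),
    (0.5, 0.5, "b"),
    (10.0, 10.0, "a"), (10.0, 11.0, "a"), (11.0, 10.0, "a"),
    (50.0, 50.0, "a"),
]


def near_and_same_kind(p, q, eps=1.5):
    # Combines a spatial condition with a nonspatial one.
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= eps and p[2] == q[2]


labels = gdbscan_sketch(points, near_and_same_kind, lambda obj: 1, min_weight=3)
print(labels)  # [0, 0, 0, -1, 1, 1, 1, -1]
```

With unit weights and a pure distance predicate this reduces to ordinary DBSCAN. In the example, the point of kind "b" lies spatially inside the first group but is reported as noise because the predicate also requires matching nonspatial attributes.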
Cite this article
Sander, J., Ester, M., Kriegel, HP. et al. Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications. Data Mining and Knowledge Discovery 2, 169–194 (1998). https://doi.org/10.1023/A:1009745219419
DOI: https://doi.org/10.1023/A:1009745219419