Journal of Intelligent Information Systems, Volume 24, Issue 1, pp 5–27

Clustering in Dynamic Spatial Databases

Article

Abstract

Efficient clustering in dynamic spatial databases is currently an open problem with many potential applications. Most traditional spatial clustering algorithms are inadequate because they lack efficient support for incremental clustering. In this paper, we propose DClust, a novel clustering technique for dynamic spatial databases. DClust provides a multi-resolution view of the clusters, generates clusters of arbitrary shape in the presence of noise, generates clusters that are insensitive to the ordering of the input data, and supports incremental clustering efficiently. DClust uses a density criterion that captures arbitrary cluster shapes and sizes to select a number of representative points, and builds the Minimum Spanning Tree (MST) of these representative points, called the R-MST. After the initial clustering, a summary of the cluster structure is built. This summary enables quick localization of the effect of data updates on the current set of clusters. Our experimental results show that DClust outperforms existing spatial clustering methods such as DBSCAN, C2P, DENCLUE, Incremental DBSCAN and BIRCH in terms of clustering time and accuracy of the clusters found.
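
The idea of selecting representative points by a density criterion and then building an MST over them can be illustrated with a small sketch. This is not the DClust algorithm itself, whose details the abstract does not give. The grid-cell density test, the use of cell centroids as representatives, and the names `representatives`, `mst_edges`, `clusters_from_mst`, `cell`, `min_pts` and `max_edge` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def representatives(points, cell, min_pts):
    """Return one representative (the centroid) per dense grid cell.

    Assumed density criterion: a cell is dense if it holds >= min_pts points.
    Points in sparse cells are treated as noise and ignored.
    """
    cells = defaultdict(list)
    for x, y in points:
        cells[(math.floor(x / cell), math.floor(y / cell))].append((x, y))
    reps = []
    for key in sorted(cells):  # sorted keys: result ignores input order
        members = cells[key]
        if len(members) >= min_pts:
            cx = sum(p[0] for p in members) / len(members)
            cy = sum(p[1] for p in members) / len(members)
            reps.append((cx, cy))
    return reps


def mst_edges(reps):
    """Prim's algorithm on the complete Euclidean graph over reps."""
    n = len(reps)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(reps[u], reps[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def clusters_from_mst(n, edges, max_edge):
    """Keep MST edges of length <= max_edge; components are the clusters."""
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for u, v, w in edges:
        if w <= max_edge:
            root[find(u)] = find(v)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Two dense regions plus one isolated noise point (made-up data).
points = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.1, 0.2), (1.3, 0.4), (1.2, 0.1),
    (8.1, 8.2), (8.3, 8.4), (8.2, 8.1), (9.1, 8.3), (9.2, 8.2), (9.4, 8.4),
    (5.0, 0.5),
]
reps = representatives(points, cell=1.0, min_pts=3)
edges = mst_edges(reps)
print(clusters_from_mst(len(reps), edges, max_edge=2.0))
```

In this sketch, varying `max_edge` over the same tree gives coarser or finer groupings without rebuilding anything, which is one simple way to read the multi-resolution view mentioned above. How DClust actually cuts the R-MST and maintains it under updates is described in the paper, not here.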

Keywords

spatial databases, data mining, multi-resolution clustering, incremental clustering, Minimum Spanning Tree

References

  1. Berchtold, S., Keim, D.A., and Kriegel, H.-P. (1996). The X-tree: An Index Structure for High-Dimensional Data. In Proc. 22nd International Conference on Very Large Data Bases (VLDB’96) (pp. 28–39). Mumbai, India.
  2. Can, F. (1993). Incremental Clustering for Dynamic Information Processing. ACM Transactions on Information Systems, 11(2), 143–164.
  3. Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proc. 2nd International Conference on Knowledge Discovery and Data Mining (KDD’96) (pp. 226–231). Portland, USA.
  4. Ester, M., Kriegel, H.-P., Sander, J., Wimmer, M., and Xu, X. (1998). Incremental Clustering for Mining in a Data Warehouse Environment. In Proc. 24th International Conference on Very Large Data Bases (VLDB’98) (pp. 323–333). New York, USA.
  5. Fisher, D.H. (1987). Knowledge Acquisition via Incremental Conceptual Clustering. Machine Learning, 2(2), 139–172.
  6. Ganti, V., Gehrke, J., and Ramakrishnan, R. (2001). DEMON: Mining and Monitoring Evolving Data. IEEE Transactions on Knowledge and Data Engineering, 13(1).
  7. Ganti, V., Ramakrishnan, R., Gehrke, J., Powell, A., and French, J. (1999). Clustering Large Datasets in Arbitrary Metric Spaces. In Proc. 15th International Conference on Data Engineering (ICDE’99) (pp. 502–511). Sydney, Australia.
  8. Guha, S., Rastogi, R., and Shim, K. (1998). CURE: An Efficient Clustering Algorithm for Large Databases. In Proc. 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD’98) (pp. 73–84). Seattle, WA, USA.
  9. Hinneburg, A. and Keim, D.A. (1998). An Efficient Approach to Clustering in Large Multimedia Databases with Noise. In Proc. 4th International Conference on Knowledge Discovery and Data Mining (KDD’98) (pp. 58–65). New York City, USA.
  10. MacQueen, J. (1967). Some Methods for Classification and Analysis of Multivariate Observations. In Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1 (pp. 281–297).
  11. Nanopoulos, A., Theodoridis, Y., and Manolopoulos, Y. (2001). C2P: Clustering Based on Closest Pairs. In Proc. 27th International Conference on Very Large Data Bases (VLDB’01) (pp. 331–340). Roma, Italy.
  12. Ng, R. and Han, J. (1994). Efficient and Effective Clustering Methods for Spatial Data Mining. In Proc. 20th International Conference on Very Large Data Bases (VLDB’94) (pp. 144–155). Santiago, Chile.
  13. O’Callaghan, L., Mishra, N., Meyerson, A., Guha, S., and Motwani, R. (2002). Streaming-Data Algorithms for High-Quality Clustering. In Proc. 18th International Conference on Data Engineering (ICDE’02) (pp. 685–694). San Jose, California, USA.
  14. Sheikholeslami, G., Chatterjee, S., and Zhang, A. (1999). WaveCluster: A Wavelet-Based Clustering Approach for Spatial Data in Very Large Databases. VLDB Journal, 8(3/4), 289–304.
  15. Utgoff, P.E. (1989). Incremental Induction of Decision Trees. Machine Learning, 4, 161–186.
  16. Wang, W., Yang, J., and Muntz, R. (1997). STING: A Statistical Information Grid Approach to Spatial Data Mining. In Proc. 23rd International Conference on Very Large Data Bases (VLDB’97) (pp. 186–195). Athens, Greece.
  17. Zhang, T., Ramakrishnan, R., and Livny, M. (1996). BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proc. 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD’96) (pp. 103–114). Montreal, Canada.

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. School of Computing, National University of Singapore, Singapore
