Abstract
Clustering is a powerful data mining technique for discovering interesting data distributions and patterns in an underlying database, and it is used in many fields, such as statistical data analysis, pattern recognition, image processing, and business applications. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) performs well on spatial data, but it leaves several problems open. For example, DBSCAN requires a user-specified threshold, and determining that threshold with current methods such as OPTICS (Ankerst et al., 1999) is extremely time-consuming; in addition, the performance of DBSCAN under different norms has yet to be examined. In this paper, we first developed a method that determines the required threshold from statistical information about the distance space of the database. We then examined the performance of DBSCAN under different norms and showed that there is a determinable relation between them. Finally, we used two artificial databases to verify the effectiveness and efficiency of the proposed methods.
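To make the idea of a threshold derived from distance statistics concrete, the following is a minimal sketch and not the method developed in the paper. It assumes a stand-in statistic, the mean distance from each point to its k-th nearest neighbour, and evaluates it under the L1, L2, and L-infinity norms. The names `minkowski`, `kth_neighbor_distances`, `estimate_eps`, and `make_blobs`, as well as the choice k = 4 and the synthetic two-blob data, are all hypothetical.

```python
import math
import random


def minkowski(a, b, p):
    """Distance between points a and b under the L_p norm (p may be math.inf)."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def kth_neighbor_distances(points, k, p):
    """For each point, the distance to its k-th nearest other point."""
    result = []
    for i, a in enumerate(points):
        dists = sorted(minkowski(a, b, p)
                       for j, b in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def estimate_eps(points, k=4, p=2):
    """Assumed threshold statistic: mean k-th neighbour distance.

    This is an illustrative stand-in, not the statistic used in the paper.
    """
    d = kth_neighbor_distances(points, k, p)
    return sum(d) / len(d)


def make_blobs(seed=0):
    """Small artificial data set: two Gaussian blobs of 30 points each."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(30)]


points = make_blobs()
eps_by_norm = {name: estimate_eps(points, k=4, p=p)
               for name, p in [("L1", 1), ("L2", 2), ("Linf", math.inf)]}
```

Because the L1 distance between two points is never smaller than their L2 distance, which in turn is never smaller than their L-infinity distance, the thresholds in `eps_by_norm` come out in the same order. This is one simple sense in which a threshold chosen under one norm constrains the threshold under another.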
References
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P., 1998. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, p. 73–84.
Ankerst, M., Breunig, M., Kriegel, H.P., Sander, J., 1999. OPTICS: Ordering Points to Identify the Clustering Structure. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Philadelphia, PA, p. 49–60.
Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B., 1990. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, p. 322–331.
Ester, M., Kriegel, H.P., Sander, J., Xu, X., 1996. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc. of 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, p. 226–231.
Guha, S., Rastogi, R., Shim, K., 1998. CURE: An Efficient Clustering Algorithm for Large Databases. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, p. 73–84.
Han, J., 2001. Data Mining. Morgan Kaufmann Publishers, USA, p. 242–266.
Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2002. Clustering validity checking methods: part II. SIGMOD Record, 31(4):51–62.
Karypis, G., Han, E.H., Kumar, V., 1999. CHAMELEON: A hierarchical clustering algorithm using dynamic modeling. Computer, 32(8):68–75.
Nakamura, E., Kehtarnavaz, N., 1998. Determining number of clusters and prototype locations via multi-scale clustering. Pattern Recognition Letters, 19(3):1265–1283.
Sheikholeslami, G., Chatterjee, S., Zhang, A., 1998. WaveCluster: A Multi-resolution Clustering Approach for Very Large Spatial Databases. Proc. of 24th VLDB Conf., New York, p. 428–439.
Yue, S.H., Li, P., Guo, J.D., Zhou, S.G., 2004. Using Greedy algorithm: DBSCAN revisited II. J Zhejiang Univ SCI, 5(11):1405–1412.
Zhang, W., Yang, Y., Muntz, R., 1997. STING: A Statistical Information Grid Approach to Spatial Data Mining. Proc. of 23rd VLDB Conf., Seattle, WA, p. 186–195.
Additional information
Project (No. 2002AA412010-12) supported by the Hi-Tech Research and Development Program (863) of China
Cite this article
Shi-hong, Y., Ping, L., Ji-dong, G. et al. A statistical information-based clustering approach in distance space. J. Zhejiang Univ.-Sci. A 6, 71–78 (2005). https://doi.org/10.1631/BF02842480