A statistical information-based clustering approach in distance space

Yue Shi-hong; Li Ping; Guo Ji-dong; Zhou Shui-geng

doi:10.1631/BF02842480

Published: 01 August 2005

Volume 6, pages 71–78 (2005)
Journal of Zhejiang University-SCIENCE A

Abstract

Clustering, a powerful data mining technique for discovering interesting data distributions and patterns in the underlying database, is used in many fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) performs well on spatial data, but it leaves many problems unsolved. For example, DBSCAN requires a user-specified threshold, and determining this threshold with current methods such as OPTICS (Ankerst et al., 1999) is extremely time-consuming; moreover, the performance of DBSCAN under different norms has yet to be examined. In this paper, we first developed a method that determines the necessary threshold from statistical information about the distance space of the database. Our subsequent examination of DBSCAN's performance under different norms showed that there is a determinable relation between them. Finally, we used two artificial databases to verify the effectiveness and efficiency of the proposed methods.
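The abstract does not give the authors' threshold formula, so the following is only an illustrative sketch of the general idea: a textbook DBSCAN run under a Minkowski L_p norm, with the radius Eps taken from a simple statistic of the distance space. The helper names (`minkowski`, `estimate_eps`, `dbscan`) and the choice of statistic (the median of each point's k-th nearest-neighbour distance) are assumptions for illustration, not the method proposed in the paper.

```python
import math
from collections import deque
from typing import List, Optional, Sequence

Point = Sequence[float]


def minkowski(a: Point, b: Point, p: float = 2.0) -> float:
    """L_p distance; p=1 is Manhattan, p=2 Euclidean, p=math.inf Chebyshev."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def estimate_eps(points: List[Point], k: int, p: float = 2.0) -> float:
    """Hypothetical heuristic (not the paper's): median k-th nearest-neighbour distance."""
    kth = []
    for i, a in enumerate(points):
        dists = sorted(minkowski(a, b, p) for j, b in enumerate(points) if j != i)
        kth.append(dists[k - 1])
    kth.sort()
    return kth[len(kth) // 2]


def dbscan(points: List[Point], eps: float, min_pts: int, p: float = 2.0) -> List[int]:
    """Textbook DBSCAN; returns a cluster id per point, -1 for noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = unvisited

    def region(i: int) -> List[int]:
        # Eps-neighbourhood, including the point itself (as in Ester et al.)
        return [j for j in range(n) if minkowski(points[i], points[j], p) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbours)
    return labels  # type: ignore[return-value]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (5, 20)]
    for p in (1.0, 2.0, math.inf):
        eps = estimate_eps(pts, k=3, p=p)
        print(p, eps, dbscan(pts, eps, min_pts=4, p=p))
```

On this toy data the estimated Eps comes out as 2, √2, and 1 for p = 1, 2, and ∞, and all three norms yield the same two clusters plus one noise point. This is consistent with the standard inequality ‖x‖∞ ≤ ‖x‖₂ ≤ ‖x‖₁ but is not a reproduction of the relation the authors report.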

References

Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P., 1998. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, p. 73–84.
Ankerst, M., Breunig, M., Kriegel, H.P., Sander, J., 1999. OPTICS: Ordering Points to Identify the Clustering Structure. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Philadelphia, PA, p. 49–60.
Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B., 1990. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, p. 322–331.
Ester, M., Kriegel, H.P., Sander, J., Xu, X., 1996. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc. of 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, p. 226–231.
Guha, S., Rastogi, R., Shim, K., 1998. CURE: An Efficient Clustering Algorithm for Large Databases. Proc. of ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, p. 73–84.
Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2002. Clustering validity checking methods: part II. SIGMOD Record, 31(4):51–62.
Han, J., 2001. Data Mining. Morgan Kaufmann Publishers, USA, p. 242–266.
Karypis, G., Han, E.H., Kumar, V., 1999. CHAMELEON: A hierarchical clustering algorithm using dynamic modeling. Computer, 32(8):68–75.
Nakamura, E., Kehtarnavaz, N., 1998. Determining number of clusters and prototype locations via multi-scale clustering. Pattern Recognition Letters, 19(3):1265–1283.
Sheikholeslami, G., Chatterjee, S., Zhang, A., 1998. WaveCluster: A Multi-resolution Clustering Approach for Very Large Spatial Databases. Proc. of 24th VLDB Conf., New York, p. 428–439.
Wang, W., Yang, J., Muntz, R., 1997. STING: A Statistical Information Grid Approach to Spatial Data Mining. Proc. of 23rd VLDB Conf., Athens, Greece, p. 186–195.
Yue, S.H., Li, P., Guo, J.D., Zhou, S.G., 2004. Using Greedy algorithm: DBSCAN revisited II. J Zhejiang Univ SCI, 5(11):1405–1412.

Author information

Authors and Affiliations

Institute of Industrial Process Control, Zhejiang University, 310027, Hangzhou, China
Yue Shi-hong, Li Ping & Zhou Shui-geng
Department of Mathematics, Yili Teacher's College, 835000, Yining, China
Guo Ji-dong

Additional information

Project (No. 2002AA412010-12) supported by the Hi-Tech Research and Development Program (863) of China

About this article

Cite this article

Yue, S.H., Li, P., Guo, J.D. et al. A statistical information-based clustering approach in distance space. J. Zhejiang Univ.-Sci. A 6, 71–78 (2005). https://doi.org/10.1631/BF02842480

Received: 18 June 2003
Accepted: 12 October 2003
Issue Date: August 2005
DOI: https://doi.org/10.1631/BF02842480

Document code: A

CLC number: TP391.41
