Abstract
Clustering has long been, and remains, a central task in data analysis. The de facto standard algorithm for density-based clustering today is DBSCAN. Its main drawback is the need to tune two parameters, ε and minPts. In this paper we explore the possibilities and limits of two novel clustering algorithms, each of which requires only a single DBSCAN-like parameter, yet both perform well on benchmark data sets. Our first approach uses only a parameter similar to DBSCAN's minPts to incrementally find protoclusters, which are eventually merged while those that are too sparse are discarded. Our second approach uses only a local density, with no minimum number of points to be specified; it estimates clusters by viewing the data points from spectators watching them at different angles. Both algorithms produce results comparable to DBSCAN: the first yields similar results while additionally being able to assign multiple cluster labels to a point, and the second runs significantly faster than DBSCAN.
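For context, the two-parameter tuning burden that motivates the chapter can be seen with scikit-learn's standard DBSCAN implementation. This is a sketch, not the chapter's own method; the `eps` and `min_samples` values below are illustrative choices, not taken from the paper:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: a standard benchmark where density-based
# methods succeed and centroid-based methods like k-means fail.
X, _ = make_moons(n_samples=200, noise=0.03, random_state=42)

# DBSCAN requires both parameters to be chosen jointly: eps (~ ε) sets the
# neighbourhood radius, min_samples (~ minPts) the density threshold for
# a core point. A poor combination over- or under-segments the data.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Label -1 marks noise; the remaining labels are cluster ids.
n_clusters = len(set(labels) - {-1})
```

With a suitable (eps, min_samples) pair the two moons are recovered as two clusters; the chapter's algorithms aim to reach comparable results with only one such parameter to tune.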
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this chapter
Braune, C., Besecke, S., Kruse, R. (2015). Density Based Clustering: Alternatives to DBSCAN. In: Celebi, M. (eds) Partitional Clustering Algorithms. Springer, Cham. https://doi.org/10.1007/978-3-319-09259-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09258-4
Online ISBN: 978-3-319-09259-1
eBook Packages: Engineering