Abstract
Mean shift is a nonparametric clustering technique that does not require the number of clusters as input and can find clusters of arbitrary shapes. While appealing, the performance of the mean shift algorithm is sensitive to the selection of the bandwidth, and it can fail to capture the correct clustering structure when multiple modes exist in one cluster. DBSCAN is an efficient density-based clustering algorithm, but it is also sensitive to its parameters and typically merges overlapping clusters. In this paper we propose Boosted Mean Shift Clustering (BMSC) to address these issues. BMSC partitions the data across a grid and applies mean shift locally on the cells of the grid, each providing a number of intermediate modes (iModes). A mode-boosting technique is proposed to iteratively select points in denser regions, and DBSCAN is utilized to partition the obtained iModes at each iteration. Our proposed BMSC can overcome the limitations of mean shift and DBSCAN, while preserving their desirable properties. Complexity analysis shows its potential to deal with large-scale data, and extensive experimental results on both synthetic and real benchmark data demonstrate its effectiveness and robustness to parameter settings.
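The pipeline the abstract describes (grid partition, local mean shift producing iModes, DBSCAN over the iModes) can be sketched in pure Python. This is a minimal illustration on toy data with hypothetical parameter choices (grid split at 2.5, bandwidth 1.0, eps 1.0), not the authors' implementation; in particular, the iterative mode-boosting step over successive generations is omitted here.

```python
import math
import random

def mean_shift_modes(points, bandwidth, max_iter=50, tol=1e-4):
    """Flat-kernel mean shift: shift each point to the mean of its
    bandwidth-neighbourhood until it stops moving; the converged
    positions are the density modes."""
    modes = set()
    for p in points:
        x = p
        for _ in range(max_iter):
            neigh = [q for q in points if math.dist(x, q) <= bandwidth]
            new_x = tuple(sum(c) / len(neigh) for c in zip(*neigh))
            if math.dist(x, new_x) < tol:
                break
            x = new_x
        modes.add(tuple(round(c, 3) for c in new_x))
    return modes

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: grow clusters from core points; -1 marks noise."""
    labels = [None] * len(points)
    def neighbours(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:   # j is a core point: keep expanding
                queue.extend(nb)
    return labels

# --- toy BMSC-style run on three tight 2-D blobs ------------------------
random.seed(0)
centres = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
data = [(random.gauss(cx, 0.05), random.gauss(cy, 0.05))
        for cx, cy in centres for _ in range(30)]

# 1. partition the data across a 2x2 grid of cells
cells = {}
for p in data:
    cells.setdefault((p[0] > 2.5, p[1] > 2.5), []).append(p)

# 2. run mean shift locally in each cell, collecting intermediate modes
imodes = sorted(m for cell in cells.values()
                for m in mean_shift_modes(cell, bandwidth=1.0))

# 3. let DBSCAN partition the iModes into the final clusters
labels = dbscan(imodes, eps=1.0, min_pts=1)
print(len(imodes), max(labels) + 1)  # one iMode per blob, three clusters
```

Because the iModes are far fewer than the raw points, the final DBSCAN step operates on a heavily condensed summary of the data, which is what makes the combination cheap and less sensitive to DBSCAN's parameters.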
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Ren, Y., Kamath, U., Domeniconi, C., Zhang, G. (2014). Boosted Mean Shift Clustering. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science(), vol 8725. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44851-9_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44850-2
Online ISBN: 978-3-662-44851-9