Cell-Based DBSCAN Algorithm Using Minimum Bounding Rectangle Criteria
The density-based spatial clustering of applications with noise (DBSCAN) algorithm has been well studied in database domains for clustering multi-dimensional data to extract arbitrary shape clusters. Recently, with the growing interest in big data and increasing diversification of data, the typical size and volume of databases have increased and data have increasingly become high-dimensional. Therefore, a large number of speed-up techniques for DBSCAN algorithms including exact and approximate approaches have been proposed. The fastest DBSCAN algorithm is the cell-based algorithm, which divides the whole data set into small cells. In this paper, we propose a novel exact version cell-based DBSCAN algorithm using minimum bounding rectangle (MBR) criteria. The connecting cells step is the most time-consuming step of the cell-based algorithm. The proposed algorithm can process the connecting cells step at high speed by using MBR criteria. We implemented the proposed cell-based DBSCAN algorithm and show that it outperforms the conventional one in high dimensions.
KeywordsDBSCAN Density-based clustering Cell-based DBSCAN algorithm Minimum bounding rectangle
This work was supported by JSPS KAKENHI Grant Number JP16J05403 and JP26330139, and the MIC/SCOPE #162308002.
- 1.Borah, B., Bhattacharyya, D.K.: An improved sampling-based DBSCAN for large spatial databases. In: 2004 Proceedings of International Conference on Intelligent Sensing and Information Processing, pp. 92–96 (2004)Google Scholar
- 2.Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining KDD-1996, pp. 226–231 (1996)Google Scholar
- 3.Gan, J., Tao, Y.: DBSCAN revisited: mis-claim, un-fixability, and approximation. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data SIGMOD 2015, pp. 519–530 (2015)Google Scholar
- 4.Gunawan, A.: A faster algorithm for DBSCAN. Master’s thesis, Technische University Eindhoven (2013)Google Scholar
- 5.Liria, A.L.C.: Algorithms for processing of spatial queries using R-trees. The closest pairs query and its application on spatial databases. Ph.D. thesis, Department of Languages and Computation, University of Almeria (2002)Google Scholar
- 6.Liu, B.: A fast density-based clustering algorithm for large databases. In: 2006 International Conference on Machine Learning and Cybernetics, pp. 996–1000 (2006)Google Scholar
- 7.Mahran, S., Mahar, K.: Using grid for accelerating density-based clustering. In: 2008 8th IEEE International Conference on Computer and Information Technology, pp. 35–40 (2008)Google Scholar
- 8.Mai, S.T., Assent, I., Storgaard, M.: AnyDBC: an efficient anytime density-based clustering algorithm for very large complex datasets. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD 2016, pp. 1025–1034 (2016)Google Scholar
- 10.Tsai, C.F., Wu, C.T.: GF-DBSCAN: a new efficient and effective data clustering technique for large databases. In: Proceedings of the 9th WSEAS International Conference on Multimedia Systems Signal Processing MUSP 2009, pp. 231–236 (2009)Google Scholar