Cell-Based DBSCAN Algorithm Using Minimum Bounding Rectangle Criteria

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10179)

Abstract

The density-based spatial clustering of applications with noise (DBSCAN) algorithm has been well studied in the database domain as a method for clustering multi-dimensional data and extracting arbitrarily shaped clusters. With the growing interest in big data and the increasing diversification of data, databases have grown in size and data have become increasingly high-dimensional. Consequently, many speed-up techniques for DBSCAN, both exact and approximate, have been proposed. The fastest of these is the cell-based algorithm, which divides the whole data set into small cells. In this paper, we propose a novel exact cell-based DBSCAN algorithm that uses minimum bounding rectangle (MBR) criteria. The most time-consuming step of the cell-based algorithm is the step that connects cells, and the proposed algorithm uses the MBR criteria to process this step at high speed. We implemented the proposed algorithm and show that it outperforms the conventional cell-based algorithm in high dimensions.
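
The abstract does not spell out the MBR criteria, so the following Python sketch is only an illustration of how bounding rectangles could shortcut the connecting cells step; it is not the authors' implementation. It assumes a grid with cell side eps / sqrt(d) (a common choice in cell-based DBSCAN) and assumes the criteria are the minimum and maximum distances between two MBRs. The function names `build_cells`, `mbr`, `min_dist`, `max_dist`, and `cells_connected` are hypothetical.

```python
import math
from collections import defaultdict


def build_cells(points, eps):
    """Assign each point to a grid cell of side eps / sqrt(d) (assumed grid size)."""
    d = len(points[0])
    side = eps / math.sqrt(d)
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / side) for x in p)
        cells[key].append(p)
    return cells


def mbr(pts):
    """Return the minimum bounding rectangle of pts as (lower corner, upper corner)."""
    lo = tuple(min(c) for c in zip(*pts))
    hi = tuple(max(c) for c in zip(*pts))
    return lo, hi


def min_dist(a, b):
    """Smallest possible distance between a point in MBR a and a point in MBR b."""
    (alo, ahi), (blo, bhi) = a, b
    return math.sqrt(sum(max(bl - ah, al - bh, 0.0) ** 2
                         for al, ah, bl, bh in zip(alo, ahi, blo, bhi)))


def max_dist(a, b):
    """Largest possible distance between a point in MBR a and a point in MBR b."""
    (alo, ahi), (blo, bhi) = a, b
    return math.sqrt(sum(max(bh - al, ah - bl) ** 2
                         for al, ah, bl, bh in zip(alo, ahi, blo, bhi)))


def cells_connected(pts_a, pts_b, eps):
    """Check whether some pair (p in pts_a, q in pts_b) lies within eps.

    pts_a and pts_b are assumed to be the core points of two cells.
    The two MBR tests below are assumed criteria, not taken from the paper.
    """
    ra, rb = mbr(pts_a), mbr(pts_b)
    if min_dist(ra, rb) > eps:
        return False  # even the closest corners are too far apart: prune
    if max_dist(ra, rb) <= eps:
        return True   # even the farthest corners are within eps: accept
    # Undecided by the rectangles alone: fall back to exact pairwise checks.
    return any(math.dist(p, q) <= eps for p in pts_a for q in pts_b)
```

Because the fallback compares actual point pairs, the sketch returns the same answer as an exhaustive comparison; the rectangles only save work when they can settle the question on their own.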

Keywords

DBSCAN; Density-based clustering; Cell-based DBSCAN algorithm; Minimum bounding rectangle

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Tatsuhiro Sakai (1, 2)
  • Keiichi Tamura (1)
  • Hajime Kitakami (1)
  1. Graduate School of Information Sciences, Hiroshima City University, Hiroshima, Japan
  2. JSPS, Tokyo, Japan
