Hashing-Based Approximate DBSCAN

  • Tianrun Li
  • Thomas Heinis
  • Wayne Luk
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9809)

Abstract

Analyzing massive amounts of data and extracting value from it has become key across different disciplines. As the amounts of data grow rapidly, however, current approaches for data analysis struggle. This is particularly true for clustering algorithms where distance calculations between pairs of points dominate overall time.

Data analysis and clustering are rarely straightforward, however: parameters typically have to be determined over several iterations. Entirely accurate results are thus rarely needed, and we can sacrifice some precision in the final result to accelerate the computation. In this paper we develop ADvaNCE, a new approach to approximating DBSCAN. ADvaNCE uses two measures to reduce distance calculation overhead: (1) locality-sensitive hashing to approximate and speed up distance calculations and (2) representative point selection to reduce the number of distance calculations. Our experiments show that our approach is generally one order of magnitude faster than the state of the art, and up to 30x faster in our experiments.
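
To make the first measure concrete, the following is a minimal sketch of how locality-sensitive hashing can cut down distance calculations in a range query. It is not the ADvaNCE implementation: the hash family (random Gaussian projections quantized into cells of a fixed width), the function names, and the parameters `width` and `num_hashes` are all assumptions chosen for illustration. Points are placed in hash buckets, and exact distances are computed only for points that share a bucket with the query, so some true neighbors may be missed.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, rng):
    """Return one hash h(p) = floor((a . p + b) / width).

    a is a random Gaussian direction and b a random offset; both the
    family and the parameter names are illustrative assumptions.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(point):
        projection = sum(ai * xi for ai, xi in zip(a, point))
        return math.floor((projection + b) / width)

    return h


def build_table(points, dim, width, num_hashes, seed=0):
    """Hash every point into a bucket keyed by num_hashes hash values."""
    rng = random.Random(seed)
    hashes = [make_hash(dim, width, rng) for _ in range(num_hashes)]
    table = defaultdict(list)
    for index, point in enumerate(points):
        key = tuple(h(point) for h in hashes)
        table[key].append(index)
    return hashes, table


def approx_range_query(query, points, hashes, table, eps):
    """Return indices within eps of query, checking only its own bucket.

    The result is a subset of the exact answer: neighbors that landed
    in a different bucket are not found.
    """
    key = tuple(h(query) for h in hashes)
    candidates = table.get(key, [])
    return [i for i in candidates if math.dist(query, points[i]) <= eps]


points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0)]
hashes, table = build_table(points, dim=2, width=4.0, num_hashes=2)
neighbors = approx_range_query(points[0], points, hashes, table, eps=0.5)
```

In this sketch a larger `width` or a smaller `num_hashes` makes buckets coarser, which finds more true neighbors at the cost of more exact distance checks; using several independent tables is a common way to recover missed neighbors.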

Keywords

Execution Time, Distance Calculation, Range Query, Query Point, Cell Width
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Tsinghua University, Beijing, China
  2. Imperial College London, London, UK
