
Distributed DBSCAN Algorithm – Concept and Experimental Evaluation

  • Conference paper
Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017 (CORES 2017)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 578))

Abstract

One of the most popular clustering algorithms is DBSCAN, which is known to be efficient and highly resistant to noise. In this paper we propose a distributed implementation of it. Distributed computing is a rapidly growing approach to solving problems on big datasets using a multinode cluster rather than parallelization on a single computer. Using its features properly can lead to higher performance and, probably more importantly, higher scalability. To show the added value of this way of designing and implementing algorithms, we compare our results with GPU parallelization. On the basis of the obtained results, we formulate proposals for improving our solution.

Author information

Corresponding author

Correspondence to Adam Merk.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Merk, A., Cal, P., Woźniak, M. (2018). Distributed DBSCAN Algorithm – Concept and Experimental Evaluation. In: Kurzynski, M., Wozniak, M., Burduk, R. (eds) Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017. CORES 2017. Advances in Intelligent Systems and Computing, vol 578. Springer, Cham. https://doi.org/10.1007/978-3-319-59162-9_49

  • DOI: https://doi.org/10.1007/978-3-319-59162-9_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59161-2

  • Online ISBN: 978-3-319-59162-9

  • eBook Packages: Engineering, Engineering (R0)
