Distributed DBSCAN Algorithm – Concept and Experimental Evaluation

  • Adam Merk
  • Piotr Cal
  • Michał Woźniak
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 578)

Abstract

DBSCAN is one of the most popular clustering algorithms, known to be efficient and highly resistant to noise. In this paper we propose a distributed implementation of it. Distributed computing is a fast-growing approach to solving problems on big datasets that uses a multinode cluster rather than parallelization within a single computer. Used properly, its features can lead to higher performance and, probably more importantly, higher scalability. To show the added value of designing and implementing algorithms this way, we compare our results with GPU parallelization. On the basis of the obtained results, we formulate proposals for improving our solution.
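
For readers unfamiliar with the baseline, the following is a minimal sketch of the classic single-node DBSCAN procedure, not the distributed implementation proposed in the paper. The function names (`region_query`, `dbscan`), the brute-force neighbour search, and the sample points are assumptions chosen for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is itself a core point
                queue.extend(j_neighbours)
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point receives the noise label, which illustrates the resistance to noise mentioned above. The brute-force neighbour search makes this sketch quadratic in the number of points, which is one reason why parallel and distributed variants are of interest.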

Keywords

Distributed computing; Clustering; Unsupervised learning; Big data

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Department of Systems and Computer Networks, Wrocław University of Science and Technology, Wrocław, Poland
