
VDMR-DBSCAN: Varied Density MapReduce DBSCAN

  • Conference paper
Big Data Analytics (BDA 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9498)

Abstract

DBSCAN is a well-known density-based clustering algorithm that can discover clusters of different shapes and sizes along with outliers. However, it suffers from major drawbacks: high computational cost, inability to find clusters of varied density, and dependence on user-provided input density parameters. To address these issues, we propose VDMR-DBSCAN (Varied Density MapReduce DBSCAN), a scalable DBSCAN algorithm built on MapReduce that detects varied-density clusters and computes its input density parameters automatically. VDMR-DBSCAN divides the data into small partitions, which are processed in parallel on the Hadoop platform. Density variations within each partition are then analyzed statistically to divide the data into groups of similar density called density level sets (DLS). Input density parameters are estimated for each DLS, and DBSCAN is then applied to each DLS using its corresponding parameters. Most importantly, we propose a novel merging technique that merges clusters of similar density lying in different partitions and produces meaningful, compact clusters of varied density. Experiments on large and small synthetic datasets confirm the efficacy of our algorithm in terms of scalability and its ability to find varied-density clusters.
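As a rough illustration of the per-partition steps described above (group points by density, estimate a parameter per group, run DBSCAN per group), the following single-machine Python sketch may help. It is not the paper's implementation: the abstract does not say how density variation is analyzed or how the parameters are estimated, so the sketch assumes a k-nearest-neighbour distance as the density measure, a hypothetical `jump` ratio to split density level sets, and the largest k-distance in a set as its eps. The MapReduce partitioning and the cross-partition merging step are omitted, and all function names are illustrative.

```python
import math

def k_dist(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    out = []
    for p in points:
        d = sorted(math.dist(p, q) for q in points)
        out.append(d[k])  # d[0] is the point itself at distance 0
    return out

def density_level_sets(kd, jump=2.0):
    """Group point indices by similar k-dist.

    Assumption: a new level starts where the sorted k-dist grows by more
    than `jump` times relative to the previous value.
    """
    order = sorted(range(len(kd)), key=kd.__getitem__)
    levels = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if kd[prev] > 0 and kd[cur] / kd[prev] > jump:
            levels.append([])
        levels[-1].append(cur)
    return levels

def dbscan(points, idxs, eps, min_pts):
    """Plain DBSCAN restricted to the points listed in `idxs`."""
    labels = {}
    cluster = 0

    def neighbours(i):
        return [j for j in idxs if math.dist(points[i], points[j]) <= eps]

    for i in idxs:
        if i in labels:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cluster  # former noise becomes a border point
            if j in labels:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb_j)
    return labels

def varied_density_dbscan(points, k=3, jump=2.0):
    """Run DBSCAN once per density level set, each with its own eps."""
    kd = k_dist(points, k)
    labels = [-1] * len(points)
    next_id = 0
    for level in density_level_sets(kd, jump):
        eps = max(kd[i] for i in level)  # assumed per-level eps estimate
        local = dbscan(points, level, eps, min_pts=k + 1)
        for i, c in local.items():
            if c != -1:
                labels[i] = next_id + c - 1
        next_id += max(0, max(local.values()))
    return labels

# Two 5x5 grids: one tightly spaced, one ten times sparser.
dense = [(x, y) for x in range(5) for y in range(5)]
sparse = [(100 + 10 * x, 100 + 10 * y) for x in range(5) for y in range(5)]
labels = varied_density_dbscan(dense + sparse)
print(sorted(set(labels)))  # [0, 1]
```

With a single global eps, a value small enough to suit the dense grid would mark every sparse point as noise, while a value large enough for the sparse grid could not separate nearby dense clusters. Estimating eps per density level avoids that trade-off in this toy case.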



Author information


Corresponding author

Correspondence to Subrat Kumar Dash.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bhardwaj, S., Dash, S.K. (2015). VDMR-DBSCAN: Varied Density MapReduce DBSCAN. In: Kumar, N., Bhatnagar, V. (eds) Big Data Analytics. BDA 2015. Lecture Notes in Computer Science, vol 9498. Springer, Cham. https://doi.org/10.1007/978-3-319-27057-9_10


  • DOI: https://doi.org/10.1007/978-3-319-27057-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27056-2

  • Online ISBN: 978-3-319-27057-9

  • eBook Packages: Computer Science, Computer Science (R0)
