Ability Study of Proximity Measure for Big Data Mining Context on Clustering

  • Conference paper
Proceedings of International Conference on Communication and Computational Technologies

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Proximity measures are used throughout data mining, in tasks such as classification, cluster construction, regression, statistical analysis, and the analysis and validation of mining results. In clustering, proximity measures underpin both cluster construction and cluster validation. The growth of digital and communication technology is changing the nature of data from traditional data to big data. When a traditional taxonomy is applied to big data mining, it faces various challenges arising from the dimensions of big data. The choice of proximity measure is one of the challenging issues for clustering under big data criteria such as Volume, Variety, and Velocity. From theoretical, practical, and existing-research perspectives, this paper studies various proximity measures in the Minkowski, L(1), L(2), Inner product, Shannon’s entropy, Combination, Intersection, and Fidelity families and identifies proximity measures suited to the Volume (dataset size), Variety (data type), and Velocity (time complexity) big data criteria. The study also identifies how to use these proximity measures for cluster construction in Partition, Hierarchical, Density, Grid, Model, Fuzzy, and Graph-based clustering.
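As an illustration of the first family named in the abstract, the following is a minimal Python sketch of the standard Minkowski (L_p) distance, with city block (p = 1), Euclidean (p = 2), and Chebyshev (p → ∞) as special cases. The function name `minkowski_distance` and the sample vectors are assumptions made for this example and are not taken from the paper, which may define or group these measures differently.

```python
import math
from typing import Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Minkowski (L_p) distance between two equal-length numeric vectors.

    Hypothetical helper for illustration only; not the paper's implementation.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    if math.isinf(p):
        # Limit p -> infinity: Chebyshev distance (largest coordinate gap).
        return max((abs(a - b) for a, b in zip(x, y)), default=0.0)
    # General case: p-th root of the summed p-th powers of coordinate gaps.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a = [0.0, 0.0]
b = [3.0, 4.0]
print(minkowski_distance(a, b, p=1))         # city block (Manhattan)
print(minkowski_distance(a, b, p=2))         # Euclidean
print(minkowski_distance(a, b, p=math.inf))  # Chebyshev
```

Each call makes a single pass over the d coordinates, so one pairwise comparison costs O(d) time. That per-pair cost is the kind of quantity the Velocity (time complexity) criterion is concerned with when a measure is applied across a large dataset.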

Author information

Corresponding author

Correspondence to Kamlesh Kumar Pandey.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Pandey, K.K., Shukla, D. (2021). Ability Study of Proximity Measure for Big Data Mining Context on Clustering. In: Purohit, S., Singh Jat, D., Poonia, R., Kumar, S., Hiranwal, S. (eds) Proceedings of International Conference on Communication and Computational Technologies. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-5077-5_1
