Ability Study of Proximity Measure for Big Data Mining Context on Clustering

Pandey, Kamlesh Kumar; Shukla, Diwakar

doi:10.1007/978-981-15-5077-5_1


Part of the book series: Algorithms for Intelligent Systems (AIS)


Abstract

Proximity measures are used throughout data mining: in classification, cluster construction, regression, statistical analysis, and the analysis and validation of mining results. In clustering, the proximity measure underlies both cluster construction and cluster validation. The growth of digital and communication technology is changing the nature of data from traditional data to big data. When a traditional taxonomy is applied to big data mining, it faces various challenges arising from the dimensions of big data. The proximity measure is one of the challenging issues for clustering under big data criteria such as Volume, Variety, and Velocity. From theoretical, practical, and existing-research perspectives, this paper studies various proximity measures in the Minkowski, L(1), L(2), Inner product, Shannon’s entropy, Combination, Intersection, and Fidelity families and identifies suitable proximity measures for the Volume (dataset size), Variety (data type), and Velocity (time complexity) big data criteria. The study also identifies how to use these proximity measures for cluster construction under Partition, Hierarchical, Density, Grid, Model, Fuzzy, and Graph-based clustering.
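As an illustration of two of the families named in the abstract, the following minimal Python sketch computes a Minkowski-family distance and an inner-product-family similarity (cosine) between two numeric vectors. The function names `minkowski` and `cosine_similarity` are hypothetical and chosen for this example only; the sketch is not taken from the chapter and does not reproduce the authors' implementation or evaluation.

```python
import math


def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length numeric vectors.

    p=1 gives the city-block distance, p=2 the Euclidean distance, and
    p=math.inf the Chebyshev distance (the limiting case).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    if math.isinf(p):
        # Limiting case: largest absolute coordinate difference.
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine_similarity(x, y):
    """Cosine similarity: inner product divided by the product of the norms."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    u, v = [0.0, 0.0], [3.0, 4.0]
    print(minkowski(u, v, p=1))         # city-block distance
    print(minkowski(u, v, p=2))         # Euclidean distance
    print(minkowski(u, v, p=math.inf))  # Chebyshev distance
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

Each call visits every coordinate once, so the cost of a single comparison grows linearly with the number of attributes. A clustering method that compares all pairs of objects repeats this work a number of times that grows quadratically with the dataset size.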

Author information

Authors and Affiliations

Department of Computer Science & Applications, Dr. Hari Singh Gour Vishwavidyalaya, Sagar, MP, India
Kamlesh Kumar Pandey & Diwakar Shukla

Corresponding author

Correspondence to Kamlesh Kumar Pandey.

Editor information

Editors and Affiliations

Rajasthan Technical University, Kota, Rajasthan, India
Sunil Dutt Purohit
Department of Computer Science, University of Science and Technology, Namibia, Windhoek, Namibia
Dharm Singh Jat
Department of ICT and Natural Sciences, Norwegian University of Science and Technology, Alesund, Norway
Ramesh Chandra Poonia
Department of Computer Science, Amity University, Jaipur, Rajasthan, India
Sandeep Kumar
Rajasthan Institute of Engineering and Technology, Jaipur, Rajasthan, India
Saroj Hiranwal

About this paper

Cite this paper

Pandey, K.K., Shukla, D. (2021). Ability Study of Proximity Measure for Big Data Mining Context on Clustering. In: Purohit, S., Singh Jat, D., Poonia, R., Kumar, S., Hiranwal, S. (eds) Proceedings of International Conference on Communication and Computational Technologies. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-5077-5_1

DOI: https://doi.org/10.1007/978-981-15-5077-5_1
Published: 28 August 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5076-8
Online ISBN: 978-981-15-5077-5
eBook Packages: Intelligent Technologies and Robotics (R0)
