Abstract
Proximity measures are used throughout data mining, for example in classification, cluster construction, regression, statistical analysis, and the analysis and validation of mining results. In clustering, proximity measures underpin both cluster construction and cluster validation. The growth of digital and communication technology is changing the nature of data from traditional data to big data. When a traditional taxonomy is applied to big data mining, it faces various challenges arising from the dimensions of big data. Choosing a proximity measure is one of the challenging issues for clustering under big data criteria such as Volume, Variety, and Velocity. From theoretical, practical, and existing-research perspectives, this paper studies various proximity measures in the Minkowski, L(1), L(2), Inner product, Shannon's entropy, Combination, Intersection, and Fidelity families, and identifies proximity measures suited to the Volume (dataset size), Variety (data type), and Velocity (time complexity) big data criteria. The study also identifies how to use these proximity measures for cluster construction under Partition, Hierarchical, Density, Grid, Model, Fuzzy, and Graph-based clustering.
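As an illustration of what a proximity measure computes, the following minimal Python sketch implements the general Minkowski distance (whose orders p = 1 and p = 2 give the city-block and Euclidean distances), the Chebyshev distance as its limiting case, and cosine similarity as one inner-product-based measure. The function names are hypothetical and chosen for this example only; the abstract does not state which specific measures the paper evaluates or how they are implemented.

```python
import math
from typing import Sequence


def minkowski(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Minkowski distance of order p (p >= 1) between equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x: Sequence[float], y: Sequence[float]) -> float:
    """Limit of the Minkowski distance as p grows: largest coordinate gap."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return max(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of x and y divided by the product of their norms."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    u, v = [0.0, 0.0], [3.0, 4.0]
    print(minkowski(u, v, 1))   # city-block distance
    print(minkowski(u, v, 2))   # Euclidean distance
    print(chebyshev(u, v))      # largest coordinate gap
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

Each call costs time linear in the number of dimensions, so in pairwise clustering the choice of measure mainly affects the constant factor and the data types it can handle, which is one way to read the Velocity and Variety criteria above.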
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pandey, K.K., Shukla, D. (2021). Ability Study of Proximity Measure for Big Data Mining Context on Clustering. In: Purohit, S., Singh Jat, D., Poonia, R., Kumar, S., Hiranwal, S. (eds) Proceedings of International Conference on Communication and Computational Technologies. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-5077-5_1
DOI: https://doi.org/10.1007/978-981-15-5077-5_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5076-8
Online ISBN: 978-981-15-5077-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)