Abstract
In this paper, we propose a novel density-grid-based method for clustering k-dimensional data. KIDS, an acronym for K-dimensional Ink Drop Spread, detects densely connected pieces of data in k-dimensional grids. It simultaneously exploits the advantages of fuzzy logic, density-based clustering, and grid-based clustering. In the proposed method, the k-dimensional data space is divided into cells, and the input data records are mapped to these cells. Each data point is then spread over the neighboring k-dimensional cells, much as an ink drop spreads in water, so that the cells adjacent to a data cell also represent the data. Finally, the impacts of all data grid cells are condensed and compared with a threshold to compute the final clusters. The experimental results show that the method has superior quality and efficiency in both low and high dimensions. In addition, the method is robust to noise and capable of finding clusters of arbitrary shapes.
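The pipeline outlined in the abstract (map points to cells, spread each point into adjacent cells, threshold the accumulated values, and link the surviving cells) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ink_drop_cluster`, the parameters `cell_size`, `radius`, and `threshold`, the linear decay of the ink, and the use of Chebyshev distance between cells are all assumptions made for illustration only.

```python
from collections import defaultdict, deque
from itertools import product


def ink_drop_cluster(points, cell_size=1.0, radius=1, threshold=1.5):
    """Illustrative sketch only; every name and formula here is hypothetical."""
    k = len(points[0])
    # All cell offsets within Chebyshev distance `radius` of a point's own cell.
    spread_offsets = list(product(range(-radius, radius + 1), repeat=k))

    ink = defaultdict(float)  # sparse grid: cell index tuple -> accumulated ink
    home_cells = []           # the cell each input point was mapped to
    for p in points:
        cell = tuple(int(x // cell_size) for x in p)
        home_cells.append(cell)
        for off in spread_offsets:
            dist = max(abs(o) for o in off)
            # Assumed linear decay: full weight at the home cell, less farther away.
            weight = 1.0 - dist / (radius + 1)
            ink[tuple(c + o for c, o in zip(cell, off))] += weight

    # Keep only the cells whose accumulated ink reaches the threshold.
    dense = {c for c, v in ink.items() if v >= threshold}

    # Link adjacent dense cells into clusters with a breadth-first search.
    adjacent = [o for o in product((-1, 0, 1), repeat=k) if any(o)]
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for off in adjacent:
                nb = tuple(c + o for c, o in zip(cur, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Points whose home cell is not dense are reported as noise (-1).
    return [cell_label.get(c, -1) for c in home_cells]


points = [(0.1, 0.1), (0.9, 0.2), (1.2, 0.3),
          (10.1, 10.1), (10.5, 10.9), (11.2, 10.4),
          (30.0, 30.0)]
labels = ink_drop_cluster(points)
print(labels)
```

In this toy input the two groups of three nearby points receive two different cluster labels, while the isolated point stays below the threshold and is marked as noise. The sparse dictionary keeps memory proportional to the number of touched cells, but the number of offsets visited per point grows as (2·radius + 1)^k, so this naive spreading is only practical for small k.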
Notes
The probability of data being distributed randomly while we mistakenly consider the data as groups that form clusters.
Acknowledgements
The authors would like to thank Nasim Bagheri for her generous contribution to the English editing and proofreading of the paper. This work was supported by the INSF (Iran National Science Foundation) under Grant number 98011279.
About this article
Cite this article
Kashani, E.S., Bagheri Shouraki, S., Norouzi, Y. et al. A density-grid-based method for clustering k-dimensional data. Appl Intell 53, 10559–10573 (2023). https://doi.org/10.1007/s10489-022-03711-0
DOI: https://doi.org/10.1007/s10489-022-03711-0