Abstract
This research optimizes a reversible cellular automata (RCA) based clustering technique for high-dimensional datasets. The reversible rules are characterized by the cycle structure properties of each rule in order to identify the rules that are effective for clustering. This reduces the rule search space for a given neighborhood size. A novel encoding technique (BiNCE Encoding), which encodes any dataset into binary form without significant data loss, is also introduced for our algorithm. Finally, the algorithm and its implementation are released as a package that can be applied to various datasets, split sizes and cluster sizes, for ease of accessibility and reproducibility. When compared against state-of-the-art methods using benchmark clustering metrics, our algorithm is on par with them, and for certain datasets and settings it achieves better scores.
This work is partially supported by the Start-up Research Grant (File number: SRG/2022/002098), SERB, Govt. of India.
Notes
1. Henceforth in this article, RCA will refer to a 1-dimensional, 5-neighborhood, binary reversible cellular automaton with null boundary conditions.
2. The split size is unrelated to the dataset size; it is related to the computational time. The value of \(split\_size\) can remain the same for larger datasets, which results in a larger number of splits that can be run in parallel.
3. GitHub repository: https://github.com/Viswonathan06/Reversible-Cellular-Automata-Clustering.
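The RCA definition in the first note, together with the cycle structure mentioned in the abstract, can be made concrete with a minimal sketch. It is not taken from the authors' package: the function names (`make_rule_table`, `step`, `cycle_structure`) and the example rule are hypothetical and chosen only for illustration. The sketch assumes that a 5-neighborhood binary rule is stored as a 32-entry lookup table, that cells outside the lattice are fixed to 0 (null boundary), and that a rule counts as reversible for a lattice of size n when its global map is a bijection on all 2^n configurations. Under that assumption the configurations split into disjoint cycles, and configurations on the same cycle are natural candidates for the same cluster.

```python
from itertools import product


def make_rule_table(local_fn):
    """Build the 32-entry lookup table of a 5-neighborhood binary rule.

    local_fn takes (x[i-2], x[i-1], x[i], x[i+1], x[i+2]) and returns 0 or 1.
    """
    return [local_fn(*bits) & 1 for bits in product((0, 1), repeat=5)]


def step(config, table):
    """Apply one synchronous update with null boundary (outside cells are 0)."""
    padded = (0, 0) + tuple(config) + (0, 0)
    nxt = []
    for i in range(len(config)):
        a, b, c, d, e = padded[i:i + 5]  # neighborhood of cell i
        nxt.append(table[(a << 4) | (b << 3) | (c << 2) | (d << 1) | e])
    return tuple(nxt)


def cycle_structure(n, table):
    """Return the cycles of the global map on all 2**n configurations.

    Returns None when the map is not a bijection, i.e. the rule is not
    reversible for this lattice size.
    """
    configs = list(product((0, 1), repeat=n))
    image = {c: step(c, table) for c in configs}
    if len(set(image.values())) != len(configs):
        return None
    seen, cycles = set(), []
    for start in configs:
        if start in seen:
            continue
        cycle, cur = [], start
        while cur not in seen:
            seen.add(cur)
            cycle.append(cur)
            cur = image[cur]
        cycles.append(cycle)
    return cycles


# Illustrative rule (an assumption, not a rule from the paper):
# x'[i] = x[i] XOR (x[i-2] AND x[i-1]). Each cell depends on itself and
# on cells to its left only, so the global map can be inverted cell by cell.
reversible_table = make_rule_table(lambda a, b, c, d, e: c ^ (a & b))
cycles = cycle_structure(4, reversible_table)
print(sorted(len(cycle) for cycle in cycles))

# A left shift, x'[i] = x[i+1], loses the leftmost cell under null boundary,
# so it is not reversible and cycle_structure returns None.
shift_table = make_rule_table(lambda a, b, c, d, e: d)
print(cycle_structure(4, shift_table))
```

Exhaustive enumeration is only feasible for small n, which is one reason to keep the automaton size small and fixed, independent of how many objects the dataset contains.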
Copyright information
© 2023 IFIP International Federation for Information Processing
About this paper
Cite this paper
Manoranjan, V., Sneha Rao, G., Vaidhianathan, S.V., Bhattacharjee, K. (2023). Optimized Reversible Cellular Automata Based Clustering. In: Manzoni, L., Mariot, L., Roy Chowdhury, D. (eds) Cellular Automata and Discrete Complex Systems. AUTOMATA 2023. Lecture Notes in Computer Science, vol 14152. Springer, Cham. https://doi.org/10.1007/978-3-031-42250-8_6
DOI: https://doi.org/10.1007/978-3-031-42250-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42249-2
Online ISBN: 978-3-031-42250-8
eBook Packages: Computer Science, Computer Science (R0)