Abstract
This paper proposes an Automatic Clustering algorithm for Interval data using the Genetic algorithm (ACIG). In this algorithm, the overlapped distance between intervals is used to determine a suitable number of clusters. To optimize the clustering, we modify the Davies and Bouldin index and improve the crossover, mutation, and selection operators of the original genetic algorithm. The convergence of ACIG is proved theoretically and illustrated with numerical examples. ACIG can be run effectively through the Matlab procedure established for it. In experiments on data sets with different characteristics, the proposed algorithm shows clear advantages over existing algorithms. An application to image recognition indicates the potential of this research for real applications.
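As a point of reference for the cluster-validity index named above, the following is a minimal sketch of the original Davies and Bouldin (1979) index, not the modified version proposed in the paper, which the abstract does not specify. It assumes, purely for illustration, that each interval [a, b] is treated as the point (a, b) under Euclidean distance; the function names `centroid` and `davies_bouldin` are hypothetical and are not taken from the authors' Matlab code.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Original Davies-Bouldin index; lower values mean better-separated clusters.

    `clusters` is a list of at least two non-empty lists of points.
    Assumption: an interval [a, b] is encoded as the point (a, b).
    Raises ZeroDivisionError if two cluster centroids coincide.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter of a cluster: mean distance of its members to its centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster j.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two partitions of the same four intervals, each written as (lower, upper).
compact = [[(0.0, 1.0), (0.2, 1.2)], [(10.0, 11.0), (10.2, 11.2)]]
mixed = [[(0.0, 1.0), (10.0, 11.0)], [(0.2, 1.2), (10.2, 11.2)]]
print(davies_bouldin(compact) < davies_bouldin(mixed))
```

The partition that groups nearby intervals together scores lower than the one that mixes distant intervals, which is the property an index-driven search such as a genetic algorithm can exploit as a fitness criterion.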
References
Agustín-Blas, L. E., Salcedo-Sanz, S., Jiménez-Fernández, S., Carro-Calvo, L., Del Ser, J., & Portilla-Figueras, J. A. (2012). A new grouping genetic algorithm for clustering problems. Expert Systems with Applications, 39(10), 9695–9703.
Cabanes, G., Bennani, Y., Destenay, R., & Hardy, A. (2013). A new topological clustering algorithm for interval data. Pattern Recognition, 46(11), 3030–3039.
Chen, J., Chang, Y., & Hung, W. (2018). A robust automatic clustering algorithm for probability density functions with application to categorizing color images. Communications in Statistics-Simulation and Computation, 47(7), 2152–2168.
Chen, J. H., & Hung, W. L. (2015). An automatic clustering algorithm for probability density functions. Journal of Statistical Computation and Simulation, 85(15), 3047–3063.
Chen, C., & Quadrianto, N. (2016). Clustering high dimensional categorical data via topographical features. JMLR, 48, 2732–2740.
Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2), 224–227.
De Carvalho, F. d. A. T., Pimentel, J. T., & Bezerra, L. X. T. (2007). Clustering of symbolic interval data based on a single adaptive L1 distance. IEEE (pp. 224–229).
De Souza, R. M., De Carvalho, F. d. A., & Silva, F. C. (2004). Clustering of interval-valued data using adaptive squared Euclidean distances. Springer (pp. 775–780).
Goh, A., & Vidal, R. (2008). Clustering and dimensionality reduction on Riemannian manifolds. In CVPR 2008 IEEE conference on computer vision and pattern recognition (pp. 377–392).
Grogan, M., & Dahyot, R. (2019). \(L_2\) divergence for robust colour transfer. Computer Vision and Image Understanding. https://doi.org/10.1016/j.cviu.2019.02.002
Hajjar, C., & Hamdan, H. (2011). Self-organizing map based on Hausdorff distance for interval-valued data. IEEE (pp. 1747–1752).
Hajjar, C., & Hamdan, H. (2013). Interval data clustering using self-organizing maps based on adaptive Mahalanobis distances. Neural Networks, 46, 124–132.
Holland, J. H. (1973). Genetic algorithms and the optimal allocation of trials. SIAM Journal on Computing, 2(2), 88–105.
Höppner, F., & Böttcher, M. (2007). Matching partitions over time to reliably capture local clusters in noisy domains. Springer, Berlin, Heidelberg (pp. 479–486).
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
Hung, W., Yang, J., & Shen, K. F. (2016). Self-updating clustering algorithm for interval-valued data. In IEEE international conference on fuzzy systems (pp. 1494–1500).
Izakian, Z., Saadi Mesgari, M., & Abraham, A. (2016). Automated clustering of trajectory data using a particle swarm optimization. Computers, Environment and Urban Systems, 55, 55–65.
Jain, M., & Vayada, M. G. (2017). Non-cognitive color and texture based image segmentation amalgamation with evidence theory of crop images. IEEE (pp. 160–165).
Kabir, S., Wagner, C., Havens, T. C., Anderson, D. T., & Aickelin, U. (2017). Novel similarity measure for interval-valued data based on overlapping ratio. IEEE (pp. 1–6).
Kao, C. H., Nakano, J., Shieh, S. H., Tien, Y. J., Wu, H. M., Yang, C., et al. (2014). Exploratory data analysis of interval-valued symbolic data with matrix visualization. Computational Statistics & Data Analysis, 79, 14–29.
Kim, K., & Ahn, H. (2008). A recommender system using GA K-means clustering in an online shopping market. Expert Systems with Applications, 34(2), 1200–1209.
Lai, C. C. (2005). A novel clustering approach using hierarchical genetic algorithms. Intelligent Automation & Soft Computing, 11(3), 143–153.
Liu, Y., Wu, X., & Shen, Y. (2011). Automatic clustering using genetic algorithms. Applied Mathematics and Computation, 218(4), 1267–1279.
Masson, M. H., & Denœux, T. (2004). Clustering interval-valued proximity data using belief functions. Pattern Recognition Letters, 25(2), 163–171.
NguyenTrang, T., & VoVan, T. (2017). A new approach for determining the prior probabilities in the classification problem by Bayesian method. Advances in Data Analysis and Classification, 11(3), 629–643.
NguyenTrang, T., & Vovan, T. (2017). Fuzzy clustering of probability density functions. Journal of Applied Statistics, 44(4), 583–601.
Parag, C. P., & James, A. R. (2004). An empirical study of impact of crossover operators on the performance of non-binary genetic algorithm based neural approaches for classification. Computers & Operations Research, 31, 481–498.
Peng, W., & Li, T. (2006). Interval data clustering with applications. IEEE (pp. 355–362).
PhamGia, T., Turkkan, N., & VoVan, T. (2008). Statistical discrimination analysis using the maximum function. Communications in Statistics-Simulation and Computation, 37(2), 320–336.
Ren, Y., Liu, Y.H., Rong, J., & Dew, R. (2009). Clustering interval-valued data using an overlapped interval divergence. Australian Computer Society, Inc (pp. 35–42).
Sato-Ilic, M. (2011). Symbolic clustering with interval-valued data. Procedia Computer Science, 6, 358–363.
Şeref, O., Fan, Y. J., Borenstein, E., & Chaovalitwongse, W. A. (2018). Information-theoretic feature selection with discrete k-median clustering. Annals of Operations Research, 263(1), 93–118.
Souza, R. M. C. R., & Carvalho, F. A. T. (2004). Clustering of interval data based on city-block distances. Pattern Recognition Letters, 25(3), 353–365.
Vovan, T., Phamtoan, D., & Tranthituy, D. (2019). Automatic genetic algorithm in clustering for discrete elements. Communications in Statistics-Simulation and Computation. https://doi.org/10.1080/03610918.2019.1588305
Vovan, T. (2017). \(L^1\)-distance and classification problem by Bayesian method. Journal of Applied Statistics, 44(3), 385–401.
VoVan, T., NguyenThoi, T., VoDuy, T., HoHuu, V., & NguyenTrang, T. (2017). Modified genetic algorithm-based clustering for probability density functions. Journal of Statistical Computation and Simulation, 87(10), 1964–1979.
VoVan, T., & NguyenTrang, T. (2018a). Similar coefficient for cluster of probability density functions. Communications in Statistics-Theory and Methods, 47(8), 1792–1811.
VoVan, T., & NguyenTrang, T. (2018b). Similar coefficient of cluster for discrete elements. Sankhya B, 80(01), 19–36.
VoVan, T., NguyenTrang, T., & CheNgoc, H. (2016). Clustering for probability density functions based on Genetic Algorithm. Boca Raton: CRC Press.
VoVan, T., & PhamGia, T. (2010). Clustering probability distributions. Journal of Applied Statistics, 37(11), 1891–1910.
Xu, X., Li, X., Liu, X., Shen, H., & Shi, Q. (2016). Multimodal registration of remotely sensed images based on Jeffrey’s divergence. ISPRS Journal of Photogrammetry and Remote Sensing, 122, 97–115.
Acknowledgements
For Le Hoang Tuan, this research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under Grant Number C2018-26-05.
Cite this article
Vovan, T., Phamtoan, D., Tuan, L.H. et al. An automatic clustering for interval data using the genetic algorithm. Ann Oper Res 303, 359–380 (2021). https://doi.org/10.1007/s10479-020-03606-8
DOI: https://doi.org/10.1007/s10479-020-03606-8