An automatic clustering for interval data using the genetic algorithm

  • S.I.: Data Mining and Decision Analytics
  • Annals of Operations Research

Abstract

This paper proposes ACIG, an Automatic Clustering algorithm for Interval data using the Genetic algorithm. The algorithm uses the overlapped distance between intervals to determine a suitable number of clusters. To optimize the clustering, we modify the Davies & Bouldin index and improve the crossover, mutation, and selection operators of the original genetic algorithm. The convergence of ACIG is proved theoretically and illustrated with numerical examples. ACIG can be implemented effectively using the Matlab procedure we have established. In experiments on data sets with different characteristics, the proposed algorithm shows clear advantages over existing ones. An application to image recognition indicates the potential of this research in real applications.
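For readers unfamiliar with the Davies & Bouldin index mentioned above, the following is a minimal sketch of the original (unmodified) index applied to interval data. It is not the modified index or the overlapped distance of ACIG, neither of which is specified in this abstract. As an assumption for illustration only, each interval is a `(lower, upper)` pair, the distance between two intervals is the Euclidean distance between their endpoints, and the function names are hypothetical.

```python
import math


def interval_distance(a, b):
    """Euclidean distance between endpoint pairs.

    Assumption: a simple stand-in, NOT the paper's overlapped distance.
    """
    return math.hypot(a[0] - b[0], a[1] - b[1])


def centroid(cluster):
    """Representative interval: mean lower bound and mean upper bound."""
    n = len(cluster)
    mean_lo = sum(lo for lo, _ in cluster) / n
    mean_hi = sum(hi for _, hi in cluster) / n
    return (mean_lo, mean_hi)


def davies_bouldin(clusters):
    """Original Davies & Bouldin index of a partition; lower is better.

    `clusters` is a list of non-empty lists of (lower, upper) intervals.
    Cluster centroids are assumed to be pairwise distinct.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("the index needs at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Within-cluster scatter: mean distance of members to their centroid.
    scatter = [
        sum(interval_distance(x, m) for x in c) / len(c)
        for c, m in zip(clusters, centers)
    ]
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / interval_distance(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two candidate partitions of the same four intervals.
good = [[(0, 1), (0.1, 1.1)], [(10, 11), (10.1, 11.1)]]
bad = [[(0, 1), (10, 11)], [(0.1, 1.1), (10.1, 11.1)]]
print(davies_bouldin(good), davies_bouldin(bad))
```

A partition with compact, well-separated clusters scores lower than one that mixes distant intervals, which is why an index of this kind can serve as the objective a genetic algorithm minimizes when searching over partitions.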

Acknowledgements

The work of Le Hoang Tuan on this research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under Grant Number C2018-26-05.

Author information

Corresponding author

Correspondence to Thao Nguyentrang.

About this article

Cite this article

Vovan, T., Phamtoan, D., Tuan, L.H. et al. An automatic clustering for interval data using the genetic algorithm. Ann Oper Res 303, 359–380 (2021). https://doi.org/10.1007/s10479-020-03606-8

  • DOI: https://doi.org/10.1007/s10479-020-03606-8
