Abstract
Massive amounts of data are being collected in almost all sectors of life owing to recent technological advances. Data mining tools, including clustering, are often applied to such large data sets to extract hidden and previously unknown information that can support future decision-making. Clustering is an unsupervised technique that partitions data points into homogeneous groups. The seed point, also called the core of a cluster, is an important feature of a clustering technique, and the performance of a seed-based clustering technique depends on the choice of the initial cluster centers. Selecting the initial seed points is challenging because this choice governs both the quality of the resulting partition and how rapidly the algorithm converges. In the present research we propose a seed point selection algorithm based on the maximization of Shannon's entropy with a distance restriction criterion, and we apply it to color image data (using the RGB features of the image) as well as to 2D data. Our seed point selection algorithm converges in few steps and forms better clusters. We have applied the algorithm to different image data sets and to discrete data, and the results appear satisfactory. We have also compared it with other seed selection methods, each used to initialize the K-Means algorithm, in terms of the number of iterations and CPU time.
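The abstract does not spell out the exact entropy criterion, so the following Python sketch is only one hypothetical reading of "maximization of Shannon's entropy with distance restriction", not the authors' algorithm. Seeds are added greedily so that the Shannon entropy H = -Σ p_k ln p_k of the cluster-size proportions (each point assigned to its nearest seed) is as large as possible, while every new seed must lie at least `d_min` away from the seeds already chosen. The first-seed rule (the data point nearest the global mean), the tie-breaking, and the names `entropy_seeds`, `partition_entropy` and `d_min` are all assumptions introduced for illustration. The selected seeds then initialize a plain K-Means (Lloyd) loop that reports the number of passes and the CPU time, the two quantities the abstract uses for comparison.

```python
import math
import time
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the closest center (the lowest index wins ties)."""
    return min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))


def partition_entropy(points, centers) -> float:
    """Shannon entropy (natural log) of the cluster-size distribution obtained
    by assigning every point to its nearest center."""
    counts = [0] * len(centers)
    for p in points:
        counts[nearest(p, centers)] += 1
    n = len(points)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def entropy_seeds(points: List[Point], k: int, d_min: float) -> List[int]:
    """Hypothetical greedy seed selection: maximize partition entropy subject
    to each new seed lying at least d_min from every seed already chosen."""
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    # Assumption: the first seed is the data point closest to the global mean.
    seeds = [nearest(mean, points)]
    while len(seeds) < k:
        best, best_h = None, -1.0
        for i, c in enumerate(points):
            if i in seeds:
                continue
            # Distance restriction: skip candidates too close to any seed.
            if min(sq_dist(c, points[s]) for s in seeds) < d_min ** 2:
                continue
            h = partition_entropy(points, [points[s] for s in seeds] + [c])
            if h > best_h:  # strict '>' keeps the first candidate on ties
                best, best_h = i, h
        if best is None:
            raise ValueError("no candidate satisfies the distance restriction")
        seeds.append(best)
    return seeds


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations; returns (centers, labels, passes)."""
    centers = [list(c) for c in centers]
    labels = None
    for it in range(1, max_iter + 1):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            return centers, labels, it
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels, max_iter


# Usage on a toy 2D data set with three well-separated groups (illustrative).
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (0, 10), (0, 11), (1, 10), (1, 11)]
start = time.process_time()  # CPU time, as in the abstract's comparison
seeds = entropy_seeds(data, k=3, d_min=3.0)
centers, labels, passes = kmeans(data, [data[s] for s in seeds])
cpu_seconds = time.process_time() - start
print("seeds:", seeds, "passes:", passes, "CPU s:", cpu_seconds)
```

On this toy set the greedy rule places one seed in each group, so K-Means stops after the second assignment pass. For RGB image data, each pixel would simply become a 3-tuple `(r, g, b)`; the quadratic candidate scan above is meant for readability, not for full-size images.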
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chowdhury, K., Chaudhuri, D., Pal, A.K. (2018). Seed Point Selection Algorithm in Clustering of Image Data. In: Sa, P., Sahoo, M., Murugappan, M., Wu, Y., Majhi, B. (eds) Progress in Intelligent Computing Techniques: Theory, Practice, and Applications. Advances in Intelligent Systems and Computing, vol 719. Springer, Singapore. https://doi.org/10.1007/978-981-10-3376-6_13
DOI: https://doi.org/10.1007/978-981-10-3376-6_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3375-9
Online ISBN: 978-981-10-3376-6
eBook Packages: Engineering, Engineering (R0)