Abstract
Clustering is a form of exploratory data analysis in which similar objects are grouped into subsets (clusters) according to shared properties. Determining the number of clusters is an important problem in clustering: k-means, for example, gives poor results when the user supplies an incorrect value of k. Visual assessment of cluster tendency (VAT) is a widely used technique for discovering the number of clusters. Bezdek et al. have extended VAT with variants such as SpecVAT and iVAT. SpecVAT uses a spectral approach and produces more accurate clustering results than VAT, but it cannot solve the clustering tendency problem for path-based clustered data; iVAT addresses this limitation. These techniques all compute the dissimilarity matrix in Euclidean space. In this paper, we instead use a multi-viewpoint based similarity (MVS) cosine metric to obtain more robust results. We propose two methods, cSpecVAT and GMMMVS-VAT. cSpecVAT combines the cosine metric with spectral concepts and yields effective clustering results across a comprehensive range of datasets: synthetic, real, genetic and image. For audio datasets we propose GMMMVS-VAT, which models the speech data with a Gaussian mixture model (GMM) and then applies MVS to extract similarity features with reference to multiple viewpoints; it therefore works more effectively on speech datasets. Because MVS uses several viewpoints as references, it is more robust than a single-viewpoint approach.
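To make the idea of VAT on a cosine dissimilarity matrix concrete, the following is a minimal sketch, not the cSpecVAT or GMMMVS-VAT method of the paper. It assumes the standard VAT reordering (start from an object involved in the largest dissimilarity, then repeatedly append the unselected object closest to the selected set) and uses 1 minus cosine similarity as the dissimilarity. The function names `cosine_dissimilarity` and `vat_order` and the toy data are illustrative assumptions, and every row of the input is assumed to be non-zero.

```python
import numpy as np


def cosine_dissimilarity(X):
    """Pairwise cosine dissimilarity, 1 - cos(x_i, x_j).

    Assumes every row of X is a non-zero vector.
    """
    X = np.asarray(X, dtype=float)
    U = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    D = 1.0 - U @ U.T
    np.fill_diagonal(D, 0.0)      # remove rounding noise on the diagonal
    return np.clip(D, 0.0, 2.0)   # keep values in the valid range


def vat_order(D):
    """Return the VAT ordering of the objects for dissimilarity matrix D."""
    n = D.shape[0]
    # Start from a row that takes part in the largest dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = [k for k in range(n) if k != i]
    while remaining:
        # Distance from each remaining object to its nearest selected object.
        nearest = D[np.ix_(order, remaining)].min(axis=0)
        j = remaining[int(np.argmin(nearest))]
        order.append(j)
        remaining.remove(j)
    return np.array(order)


# Toy data (hypothetical): two groups of directions, interleaved on purpose.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1],
              [0.1, 1.0], [1.0, 0.2], [0.2, 1.0]])
labels = np.array([0, 1, 0, 1, 0, 1])

D = cosine_dissimilarity(X)
order = vat_order(D)
D_reordered = D[np.ix_(order, order)]   # matrix that would be shown as the VAT image
labels_reordered = labels[order]
```

Displaying `D_reordered` as a grey-scale image would show dark blocks along the diagonal, one per apparent cluster; counting those blocks is how a VAT image suggests a value for k.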
References
Bezdek J (2002) VAT: a tool for visual assessment of cluster tendency. Proc Int Joint Conf Neural Netw 3:2225–2230
Bezdek JC, Pal NR (1998) Some new indexes of clustering validity. IEEE Trans Syst Man Cybernet 28(3):301–315
Bolshakova N, Azuaje F (2003) Cluster validation techniques for genome expression data. Sig Process 83:825–833
Cai D, He X, Han J (2005) Document clustering using locality preserving indexing. IEEE Trans Knowl Data Eng 17(2):1624–1637
Cheng Y (1995) Mean shift, mode seeking, and clustering. IEEE Trans Pattern Anal Mach Intell 17(8):790–799
Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
Dehak N, Dehak R, Glass J, Reynolds D, Kenny P (2010) Cosine similarity scoring without score normalization techniques. In: Proceedings of IEEE Odyssey workshop, Brno
Duda RO, Hart PE, Stork DG (2000) Pattern Classification, 2nd edn. Wiley, New York
Eswara Reddy B, Rajendra Prasad K (2012) Reducing runtime values in minimum spanning tree based clustering by visual access tendency. Int J Data Min Knowl Manag Process 2(3):11–22
Fukunaga K, Hostetler L (1975) The estimation of the gradient of a density function with applications in pattern recognition. IEEE Trans Inf Theory 21(1):32–40
Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validity techniques. J Intell Inform Syst 17(2):107–145
Havens TC, Bezdek JC (2010) An efficient formulation of the improved visual assessment of cluster tendency (iVAT) algorithm. IEEE Trans Knowl Data Eng 22(10):1401–1413
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):266–320
Kenny P, Boulianne G (2007) Speaker and session variability in GMM based speaker verification. IEEE Trans Audio Speech Lang Process 15(4):1448–1460
Lovasz L, Plummer M (1986) Matching theory. Akadémiai Kiadó, Budapest
Nguyen DT (2012) Clustering with multi-viewpoint based similarity measure. IEEE Trans Knowl Data Eng 24(6):988–1001
Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybernet 9(1):62–66
Pekalska E, Harol A, Duin RPW, Spillmann B, Bunke H (2006) Non-Euclidean or non-metric measures can be informative. In: Yeung D-Y et al (eds) SSPR & SPR 2006. LNCS, vol 4109. Springer, Heidelberg, pp 871–880
Popescu M, Bezdek JC, Havens TC, Keller JM (2013) A clustering validity framework based on induced partition dissimilarity. IEEE Trans Cybern 43(1):308–320
Ramze Rezaee M, Lelieveldt BPF (1998) A new cluster validity index for the fuzzy c-mean. Pattern Recognit Lett 19:237–246
Reynolds DA (1995) Speaker identification and verification using Gaussian mixture speaker models. Speech Commun 17:91–108
Reynolds D, Quatieri T, Dunn R (2000) Speaker verification using adapted Gaussian mixture models. Digit Signal Process 10(3):19–41
Senoussaoui M, Kenny P, Stafylakis T, Dumouchel P (2013) Efficient iterative mean shift based cosine dissimilarity for multi-recording speaker clustering. In: Proceedings of ICASSP, pp 7712–7715
Senoussaoui M, Kenny P, Stafylakis T, Dumouchel P (2014) A study of the cosine distance-based mean shift for telephone speech diarization. IEEE/ACM Trans Audio Speech Lang Process 22(1):217–227
Tang H, Chu SM (2012) Partially supervised speaker clustering. IEEE Trans Pattern Anal Mach Intell 34(5):959–971
Wang L, Bezdek J (2009) Automatically determining the number of clusters in unlabeled datasets. IEEE Trans Knowl Data Eng 21(3):335–349
Wang L, Bezdek J (2010) Enhanced visual analysis for cluster tendency assessment and data partitioning. IEEE Trans Knowl Data Eng 22(10):1401–1413
Wang X, Wang X, Wilkes DM (2009) A divide-and-conquer approach for minimum spanning tree-based clustering. IEEE Trans Knowl Data Eng 21(7):945–958
Georghiades A et al (2001) Yale face database. http://vision.ucsd.edu/leekc/ExtYaleDatabase/ExtYaleB.html
(2012) http://www.exploredata.net/Downloads/Gene-Expression-Data-Set
Yan Y, Chen L, Nguyen DT (2012) Semi-supervised clustering with multi-viewpoint based similarity measure. In: WCCI 2012 IEEE world congress on computational intelligence, Brisbane, pp 1–8
Cite this article
Eswara Reddy, B., Rajendra Prasad, K. Improving the performance of visualized clustering method. Int J Syst Assur Eng Manag 7 (Suppl 1), 102–111 (2016). https://doi.org/10.1007/s13198-015-0342-x
DOI: https://doi.org/10.1007/s13198-015-0342-x