Abstract
This paper proposes an instance-similarity-based evaluation metric that reduces the search space over which clustering must be performed. An aggregate global score is calculated for each instance using the novel idea of the Fibonacci series. Fibonacci numbers separate the instances effectively; hence, intra-cluster similarity increases and inter-cluster similarity decreases during clustering. The proposed FIBCLUS algorithm handles datasets with numerical attributes, categorical attributes, or a mix of both. Results obtained with FIBCLUS are compared with those of existing algorithms widely used to cluster numeric, categorical and mixed data, namely k-means, x-means, expectation maximization (EM) and hierarchical clustering. Empirical analysis shows that FIBCLUS produces better clustering solutions than these algorithms in terms of entropy, purity and F-score.
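The abstract does not give the scoring formula, so the following Python sketch is only a hypothetical illustration of the general idea, not the FIBCLUS method itself. It assumes that numeric attributes are min-max normalized, that categorical values are mapped to evenly spaced codes in [0, 1] in order of first appearance, and that attribute j is weighted by the j-th distinct Fibonacci number (1, 2, 3, 5, 8, ...); the weighted sum is the instance's aggregate score. The helper names `fibonacci_weights` and `aggregate_scores` are invented for this sketch. The remaining functions compute purity, entropy and F-score under their common textbook definitions (entropy in bits, unnormalized), which may differ in detail from the variants used in the paper's experiments.

```python
import math
from collections import Counter


def fibonacci_weights(n):
    """Return n distinct Fibonacci numbers 1, 2, 3, 5, 8, ... (hypothetical weighting)."""
    fibs = [1, 2]
    while len(fibs) < n:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:n]


def aggregate_scores(rows, numeric_cols):
    """Hypothetical aggregate score: Fibonacci-weighted sum of attributes scaled to [0, 1]."""
    n_attrs = len(rows[0])
    weights = fibonacci_weights(n_attrs)
    scales = {}
    for j in range(n_attrs):
        column = [row[j] for row in rows]
        if j in numeric_cols:
            scales[j] = (min(column), max(column))  # min-max range of a numeric attribute
        else:
            # Assumption: categorical values get evenly spaced codes by first appearance.
            distinct = list(dict.fromkeys(column))
            step = max(len(distinct) - 1, 1)
            scales[j] = {v: i / step for i, v in enumerate(distinct)}
    scores = []
    for row in rows:
        total = 0.0
        for j, w in enumerate(weights):
            if j in numeric_cols:
                lo, hi = scales[j]
                x = (row[j] - lo) / (hi - lo) if hi > lo else 0.0
            else:
                x = scales[j][row[j]]
            total += w * x
        scores.append(total)
    return scores


def purity(clusters, classes):
    """Fraction of instances belonging to the majority class of their cluster."""
    best = Counter()
    for (k, _c), count in Counter(zip(clusters, classes)).items():
        best[k] = max(best[k], count)
    return sum(best.values()) / len(classes)


def entropy(clusters, classes):
    """Size-weighted average of per-cluster class entropy, in bits."""
    n = len(classes)
    result = 0.0
    for k, size in Counter(clusters).items():
        members = Counter(c for kk, c in zip(clusters, classes) if kk == k)
        h = -sum((m / size) * math.log2(m / size) for m in members.values())
        result += (size / n) * h
    return result


def f_score(clusters, classes):
    """Class-size-weighted best F-measure of each class over all clusters."""
    n = len(classes)
    table = Counter(zip(clusters, classes))
    cluster_sizes, class_sizes = Counter(clusters), Counter(classes)
    result = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_kc = table[(k, c)]  # Counter returns 0 for missing pairs
            if n_kc:
                p, r = n_kc / n_k, n_kc / n_c
                best = max(best, 2 * p * r / (p + r))
        result += (n_c / n) * best
    return result


rows = [(1.0, "x"), (3.0, "y"), (2.0, "x")]
print(aggregate_scores(rows, numeric_cols={0}))  # [0.0, 3.0, 0.5]
print(purity([0, 0, 0, 0], ["a", "a", "b", "b"]))  # 0.5
```

In this sketch each instance collapses to a single number, so two instances can be compared by the absolute difference of their scores instead of attribute by attribute. For the metrics, higher purity and F-score and lower entropy indicate a clustering that agrees more closely with the known classes.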
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rawat, R., Nayak, R., Li, Y., Alsaleh, S. (2011). Aggregate Distance Based Clustering Using Fibonacci Series-FIBCLUS. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_6
DOI: https://doi.org/10.1007/978-3-642-20291-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20290-2
Online ISBN: 978-3-642-20291-9
eBook Packages: Computer Science, Computer Science (R0)