Aggregate Distance Based Clustering Using Fibonacci Series-FIBCLUS

Rawat, Rakesh; Nayak, Richi; Li, Yuefeng; Alsaleh, Slah

doi:10.1007/978-3-642-20291-9_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6612)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

This paper proposes an innovative instance-similarity-based evaluation metric that reduces the search map for clustering. An aggregate global score is calculated for each instance using the Fibonacci series. The use of Fibonacci numbers separates the instances effectively; hence, intra-cluster similarity is increased and inter-cluster similarity is decreased during clustering. The proposed FIBCLUS algorithm handles datasets with numerical attributes, categorical attributes, or a mix of both. Results obtained with FIBCLUS are compared with those of existing algorithms, such as k-means, x-means, expectation maximization and hierarchical algorithms, that are widely used to cluster numeric, categorical and mixed data types. Empirical analysis shows that FIBCLUS produces better clustering solutions in terms of entropy, purity and F-score than these existing algorithms.
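
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea under stated assumptions: each attribute of an instance is weighted by a successive Fibonacci number and the weighted values are summed into one aggregate score. The helper names (`fibonacci`, `aggregate_scores`) and the weighting scheme are hypothetical and are not taken from the paper. The `purity` and `entropy` functions use the standard textbook definitions of those two clustering quality measures.

```python
from collections import Counter, defaultdict
from math import log2


def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 2, 3, 5, ..."""
    seq, a, b = [], 1, 2
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq


def aggregate_scores(rows):
    """Hypothetical aggregate score (an assumption, not the paper's formula):
    weight the i-th numeric attribute by the i-th Fibonacci number and sum."""
    weights = fibonacci(len(rows[0]))
    return [sum(w * v for w, v in zip(weights, row)) for row in rows]


def _group_by_cluster(labels, clusters):
    """Map each cluster id to the list of true class labels assigned to it."""
    groups = defaultdict(list)
    for label, cluster in zip(labels, clusters):
        groups[cluster].append(label)
    return groups


def purity(labels, clusters):
    """Fraction of instances that belong to the majority class of their cluster."""
    groups = _group_by_cluster(labels, clusters)
    majority = sum(max(Counter(g).values()) for g in groups.values())
    return majority / len(labels)


def entropy(labels, clusters):
    """Size-weighted average of the class entropy (in bits) inside each cluster."""
    groups = _group_by_cluster(labels, clusters)
    total = 0.0
    for g in groups.values():
        probs = [count / len(g) for count in Counter(g).values()]
        total += (len(g) / len(labels)) * -sum(p * log2(p) for p in probs)
    return total


if __name__ == "__main__":
    rows = [[1, 0, 2], [0, 1, 1]]          # toy numeric instances
    print(aggregate_scores(rows))
    labels = ["a", "b", "b", "b"]          # true classes
    clusters = [0, 0, 1, 1]                # cluster assignments
    print(purity(labels, clusters), entropy(labels, clusters))
```

Higher purity and lower entropy indicate a clustering that agrees more closely with the true classes, which is the sense in which the abstract reports "better" solutions.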

Author information

Authors and Affiliations

Faculty of Science and Technology, Queensland University of Technology, Brisbane, Australia
Rakesh Rawat, Richi Nayak, Yuefeng Li & Slah Alsaleh

Editor information

Editors and Affiliations

School of Information, Renmin University of China, 100872, Beijing, China
Xiaoyong Du
LFCS, School of Informatics, University of Edinburgh, 10 Crichton Street, EH8 9AB, Edinburgh, Scotland, UK
Wenfei Fan
School of Software, Tsinghua University, Room 819, Main Building, 100084, Beijing, China
Jianmin Wang
Computer School, Wuhan University, Luojiashan Road, 430072, Wuhan, Hubei, China
Zhiyong Peng
School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, St. Lucia, Australia
Mohamed A. Sharaf

About this paper

Cite this paper

Rawat, R., Nayak, R., Li, Y., Alsaleh, S. (2011). Aggregate Distance Based Clustering Using Fibonacci Series-FIBCLUS. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_6

DOI: https://doi.org/10.1007/978-3-642-20291-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20290-2
Online ISBN: 978-3-642-20291-9
eBook Packages: Computer Science, Computer Science (R0)
