GA based Dimension Reduction for enhancing performance of k-Means and Fuzzy k-Means: A Case Study for Categorization of Medical Dataset

Gowda Karegowda, Asha; Jayaram, M.A.; Manjunath, A.S.; Vidya, T.; Shama

doi:10.1007/978-81-322-1038-2_15

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 201)

Abstract

Medical data mining is the process of extracting hidden patterns from medical data. Among the many clustering algorithms, k-means is one of the most extensively used, along with fuzzy k-means. The performance of both k-means and fuzzy k-means clustering depends on the initial cluster centers, and both may converge to a local optimum. In addition, the performance of any data mining algorithm depends on the subset of significant features it is given. This paper attempts to improve the performance of both k-means and fuzzy k-means clustering in two stages. In the first stage, it investigates a wrapper approach to feature selection for clustering, in which a genetic algorithm (GA) serves as a random search technique for subset generation, wrapped with k-means clustering. In the second stage of the proposed work, GA and entropy-based fuzzy clustering (EFC) are used to find the initial centroids for both k-means and fuzzy k-means clustering. Experiments were conducted on a standard medical dataset, the Pima Indians Diabetes Dataset (PIDD). The results show a marked decline of almost 7% in the classification error of both k-means and fuzzy k-means clustering when GA-selected significant features and GA-identified initial centroids are used, compared with randomly selected centroids and all features.
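The first stage described in the abstract (a GA searching over feature subsets, with each candidate scored by running k-means on it) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the toy dataset (two informative features plus two noise features, not PIDD), the fitness measure (share of points that disagree with their cluster's majority label), and the GA settings (truncation selection, one-point crossover, single-bit mutation). The helper names `make_toy_data`, `fitness`, and `ga_select` are hypothetical.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd-style k-means; returns one cluster index per point."""
    centers = rng.sample(points, k)  # random initial centroids
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def cluster_error(assign, labels, k):
    """Fraction of points that disagree with their cluster's majority label."""
    wrong = 0
    for c in range(k):
        member_labels = [l for a, l in zip(assign, labels) if a == c]
        if member_labels:
            majority = max(set(member_labels), key=member_labels.count)
            wrong += sum(l != majority for l in member_labels)
    return wrong / len(labels)

def fitness(mask, data, labels, k, seed):
    """Wrapper evaluation (assumed): run k-means on the masked features only."""
    if not any(mask):
        return 1.0  # an empty feature subset is treated as the worst case
    projected = [[x for x, keep in zip(row, mask) if keep] for row in data]
    return cluster_error(kmeans(projected, k, random.Random(seed)), labels, k)

def ga_select(data, labels, k=2, pop_size=12, generations=15, seed=0):
    """Search over boolean feature masks; lower fitness is better."""
    rng = random.Random(seed)
    n = len(data[0])
    score = lambda m: fitness(m, data, labels, k, seed)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [True] * n  # seed the population with the all-features baseline
    for _ in range(generations):
        elite = sorted(pop, key=score)[:pop_size // 2]  # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:             # occasional single-bit mutation
                i = rng.randrange(n)
                child[i] = not child[i]
            children.append(child)
        pop = elite + children                 # elites survive unchanged
    return min(pop, key=score)

def make_toy_data(n_per_class=30, seed=1):
    """Hypothetical data: features 0-1 separate the classes, 2-3 are noise."""
    rng = random.Random(seed)
    data, labels = [], []
    for label, mean in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            informative = [rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)]
            noise = [rng.uniform(-20, 20), rng.uniform(-20, 20)]
            data.append(informative + noise)
            labels.append(label)
    return data, labels

data, labels = make_toy_data()
best_mask = ga_select(data, labels)
print("selected features:", [i for i, keep in enumerate(best_mask) if keep])
print("error with selected features:", fitness(best_mask, data, labels, 2, 0))
print("error with all features:", fitness([True] * 4, data, labels, 2, 0))
```

Because the all-features mask is placed in the initial population and the best half of each generation is carried over unchanged, the returned subset never scores worse than the all-features baseline under this fitness. The second stage (GA or EFC choosing the initial centroids) is not covered by this sketch; `kmeans` here still starts from randomly sampled points.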

Author information

Authors and Affiliations

Siddaganga Institute of Technology, Tumkur, India
Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, T. Vidya & Shama

Corresponding author

Correspondence to Asha Gowda Karegowda.

Editor information

Editors and Affiliations

South Asian University, Chankya Puri, New Delhi, 110021, India
Jagdish Chand Bansal
ABV - IIITM, Gwalior, Gwalior, 474015, Madhya Pradesh, India
Pramod Kumar Singh
Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, 247667, India
Kusum Deep
Department of Paper Technology, Indian Institute of Technology Roorkee, Saharanpur Campus, Roorkee, India
Millie Pant
Department of Computer Science, Liverpool Hope University, Office: FML 412, Liverpool, Liverpool, L16 9JD, United Kingdom
Atulya K. Nagar

About this paper

Cite this paper

Gowda Karegowda, A., Jayaram, M., Manjunath, A., Vidya, T., Shama (2013). GA based Dimension Reduction for enhancing performance of k-Means and Fuzzy k-Means: A Case Study for Categorization of Medical Dataset. In: Bansal, J., Singh, P., Deep, K., Pant, M., Nagar, A. (eds) Proceedings of Seventh International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA 2012). Advances in Intelligent Systems and Computing, vol 201. Springer, India. https://doi.org/10.1007/978-81-322-1038-2_15

DOI: https://doi.org/10.1007/978-81-322-1038-2_15
Published: 04 December 2012
Publisher Name: Springer, India
Print ISBN: 978-81-322-1037-5
Online ISBN: 978-81-322-1038-2
eBook Packages: Engineering, Engineering (R0)
