Abstract
Gene datasets from microarray comprise large number of genes. Clustering is a widely used approach for grouping similar kind of genes. The main objective of this paper is to identify the optimal subset of genes from the leukemia dataset in order to classify the leukemia cancer. Different clustering approaches such as K-means (KM) clustering, fuzzy C-means (FCM) clustering, and modified K-means (MKM) clustering have been adopted in this research. The clusters obtained from these methods are further clustered using K-means sample-wise (by omitting class values), and the results are compared with ground truth value to evaluate the performance of the different clustering methods. The highly correlated genes are selected from the cluster that produces more accurate classification results. It is observed that the FCM (gene-wise clustering) with K-means (sample-wise clustering) produces better accuracy, and the resultant genes have been identified.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Stanislav Busygin, Gerrit Jacobsen, and Ewald Kramer. Double conjugated clustering applied to leukemia microarray data. In Proceedings of the 2nd SIAM International Conference on Data Mining, Workshop on Clustering High Dimensional Data, 2002.
Aik Choon Tan and David Gilbert, Ensemble machine learning on gene expression data for cancer classification: Applied Bioinformatics 2003:2 (3 Suppl) S75–S83.
Cherie H. Dunphy (2006) Gene Expression Profiling Data in Lymphoma and Leukemia: Review of the Literature and Extrapolation of Pertinent Clinical Applications. Archives of Pathology & Laboratory Medicine: April 2006, Vol. 130, No. 4, pp. 483–520.
Yoo CK, Vanrolleghem PA. Interpreting patterns and analysis of acute leukemia gene expression data by multivariate statistical analysis. In: Barbosa Povoa A, Matos H, editors. Computer-Aided Chemical Engineering. Elsevier Science; 2004. pp. 1165–70.
Wei Li, Modified K-means clustering algorithm, Congress on Image & Signal Processing, IEEE, 2008, pp. 618–621.
T.R. Golub et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 1999, Vol. 286, pp. 531–537.
Palanisamy, P.; Perumal; Thangavel, K.; Manavalan, R., “A novel approach to select significant genes of leukemia cancer data using K-Means clustering,” Pattern Recognition, Informatics and Medical Engineering (PRIME), 2013 International Conference on, pp. 104, 108, 21–22 Feb. 2013.
Acknowledgments
The third author gratefully acknowledges the UGC, New Delhi, for partial financial assistance under UGC-SAP (DRS) Grant No. F3-50/2011.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer India
About this paper
Cite this paper
Prasath, P., Perumal, K., Thangavel, K., Manavalan, R. (2014). A Novel Approach to Gene Selection of Leukemia Dataset Using Different Clustering Methods. In: Krishnan, G., Anitha, R., Lekshmi, R., Kumar, M., Bonato, A., Graña, M. (eds) Computational Intelligence, Cyber Security and Computational Models. Advances in Intelligent Systems and Computing, vol 246. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1680-3_7
Download citation
DOI: https://doi.org/10.1007/978-81-322-1680-3_7
Published:
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1679-7
Online ISBN: 978-81-322-1680-3
eBook Packages: EngineeringEngineering (R0)