A New Locally Weighted K-Means for Cancer-Aided Microarray Data Analysis
Cancer has been identified as the leading cause of death. It is predicted that around 20–26 million people will be diagnosed with cancer by 2020. Given this alarming rate, there is an urgent need for more effective methods to understand, prevent and cure cancer. Microarray technology provides a useful basis for achieving this goal: cluster analysis of gene expression data supports the discrimination of patients, the identification of possible tumor subtypes and individualized treatment. Among clustering techniques, k-means is commonly chosen for its simplicity and efficiency. However, it treats all data attributes as equally important. This paper presents a new locally weighted extension of k-means, which has proven more accurate across many published datasets than the original k-means and other extensions found in the literature.
Keywords: Subspace clustering, Attribute weighting, Cancer, Microarray data analysis
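To make the idea of attribute weighting concrete, here is a minimal sketch of a k-means variant in which each cluster keeps its own weight per attribute. It is not the algorithm proposed in the paper, whose details are not given on this page. The exponential weighting rule, the bandwidth parameter `h`, the function names and the toy data are all assumptions chosen for illustration.

```python
import math


def weighted_sq_dist(x, center, weights):
    """Squared Euclidean distance with one weight per attribute."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, center, weights))


def local_weights(members, center, h=1.0):
    """Per-cluster attribute weights (illustrative assumption, not the paper's rule).

    Attributes with low within-cluster dispersion get a higher weight;
    the weights are normalised so that they sum to 1.
    """
    dims = len(center)
    dispersion = [
        sum((p[j] - center[j]) ** 2 for p in members) / len(members)
        for j in range(dims)
    ]
    raw = [math.exp(-d / h) for d in dispersion]
    total = sum(raw)
    return [r / total for r in raw]


def locally_weighted_kmeans(points, init_centers, h=1.0, n_iter=20):
    """Alternate assignment, centroid update and per-cluster weight update."""
    k, dims = len(init_centers), len(points[0])
    centers = [list(c) for c in init_centers]
    weights = [[1.0 / dims] * dims for _ in range(k)]  # start unweighted
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to the cluster that is closest under that
        # cluster's own attribute weights.
        labels = [
            min(range(k), key=lambda c: weighted_sq_dist(p, centers[c], weights[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                continue  # keep the previous centre and weights for an empty cluster
            centers[c] = [sum(p[j] for p in members) / len(members) for j in range(dims)]
            weights[c] = local_weights(members, centers[c], h)
    return labels, centers, weights


# Toy data: attribute 0 separates the two groups, attribute 1 is mostly noise.
points = [(0.0, 5.0), (0.2, -4.0), (0.1, 0.5), (3.0, 4.5), (3.1, -5.0), (2.9, 0.0)]
labels, centers, weights = locally_weighted_kmeans(points, [(0.0, 0.0), (3.0, 0.0)])
```

On this toy data, each cluster ends up putting almost all of its weight on attribute 0, the attribute that actually separates the groups. Plain k-means corresponds to keeping every weight fixed at its initial uniform value.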
The authors would like to thank X. Z. Fern and C. E. Brodley for the source code of HBGF, and C. Domeniconi for the implementation of LAC.
Conflict of Interest: The authors declare that they have no conflict of interest.