Journal of Medical Systems, Volume 36, Supplement 1, pp 43–49

A New Locally Weighted K-Means for Cancer-Aided Microarray Data Analysis

  • Natthakan Iam-On
  • Tossapon Boongoen
Original Paper


Cancer has been identified as the leading cause of death. It is predicted that around 20–26 million people will be diagnosed with cancer by 2020. Given this alarming rate, there is an urgent need for more effective methods to understand, prevent and cure cancer. Microarray technology provides a useful basis for achieving this goal: cluster analysis of gene expression data supports the discrimination of patients, the identification of possible tumor subtypes and individualized treatment. Among clustering techniques, k-means is commonly chosen for its simplicity and efficiency. However, it does not account for differences in the importance of data attributes. This paper presents a new locally weighted extension of k-means, which has proven more accurate across many published datasets than the original algorithm and other extensions found in the literature.


Keywords: Subspace clustering; Attribute weighting; Cancer; Microarray data analysis



The authors would like to thank X. Z. Fern and C. E. Brodley for the source code of HBGF, and C. Domeniconi for the implementation of LAC.

Conflict of Interest: The authors declare that they have no conflict of interest.



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  1. School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand
  2. Department of Mathematics and Computer Science, Royal Thai Air Force Academy, Bangkok, Thailand
