FCM for Gene Expression Bioinformatics Data

  • Kumar Dhiraj
  • Santanu Kumar Rath
  • Korra Sathya Babu
Part of the Communications in Computer and Information Science book series (CCIS, volume 40)

Abstract

Cluster analysis of data from DNA microarray hybridization studies is essential for identifying and evaluating biologically significant co-expressed genes. The K-means algorithm is one of the most widely used clustering techniques. It addresses the clustering problem by assigning each gene to exactly one cluster. In practice, however, and especially for bioinformatics data, a gene may belong to several clusters simultaneously. To address this, the Fuzzy C-means (FCM) clustering algorithm is applied to microarray data. Two pattern recognition data sets (IRIS and WBCD) and thirteen microarray data sets are used to evaluate the performance of K-means and Fuzzy C-means. FCM improves clustering accuracy by approximately 30 percent compared with K-means. Extensive simulation results show that FCM provides higher accuracy and better generalization than K-means.
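
The soft assignment described above can be illustrated with a minimal sketch of the standard fuzzy c-means updates. This is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the stopping rule, and the toy points are all assumptions chosen for illustration. Each point receives a membership degree in every cluster rather than a single label.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, memberships). memberships[i][k] is the degree to which
    point i belongs to cluster k; each row sums to 1. All parameter names and
    defaults are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of all points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update from squared Euclidean distances to each center.
        shift = 0.0
        for i, p in enumerate(points):
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            if min(d2) == 0.0:
                # The point coincides with a center: full membership there.
                zero = d2.index(0.0)
                new_row = [1.0 if k == zero else 0.0 for k in range(n_clusters)]
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    # Toy data (hypothetical): two tight groups plus one point midway between them.
    toy = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (5.5, 5.5)]
    _, memberships = fuzzy_c_means(toy, n_clusters=2)
    for point, row in zip(toy, memberships):
        print(point, [round(v, 3) for v in row])
```

Taking the largest membership in each row gives a hard, K-means-style label, while a point lying between two groups keeps a substantial membership in both. That is the behaviour the abstract attributes to genes that take part in more than one cluster.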

Keywords

K-means clustering; Fuzzy c-means; Microarray; Cancer data; Gene Expression Analysis; Bioinformatics

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Kumar Dhiraj (1)
  • Santanu Kumar Rath (1)
  • Korra Sathya Babu (1)
  1. Dept. of Computer Science and Engineering, National Institute of Technology Rourkela, India
