FCM for Gene Expression Bioinformatics Data

  • Conference paper
Contemporary Computing (IC3 2009)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 40)

Abstract

Clustering analysis of data from DNA microarray hybridization studies is essential for identifying and evaluating biologically significant co-expressed genes. The K-means algorithm is one of the most widely used clustering techniques. It addresses the clustering problem by assigning each gene to exactly one cluster. In practice, however, and especially for bioinformatics data, one gene can belong to several clusters simultaneously. To address this problem, the Fuzzy C-means (FCM) clustering algorithm is applied to microarray data. Two pattern recognition data sets (IRIS and WBCD) and thirteen microarray data sets are used to evaluate the performance of K-means and Fuzzy C-means. FCM improves clustering accuracy by approximately 30 percent compared with the K-means algorithm. Extensive simulation results show that the FCM clustering algorithm provided higher accuracy and better generalization than the K-means clustering algorithm.
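The contrast drawn in the abstract between hard assignment (K-means) and graded membership (FCM) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard Bezdek-style FCM updates with Euclidean distance and fuzzifier m = 2, and the function names (`memberships`, `fuzzy_c_means`), the explicit `init_centers` argument, and the toy 2-D points are all hypothetical choices made for illustration.

```python
def memberships(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for p in points:
        # Squared Euclidean distance from this point to each center.
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        if 0.0 in d2:
            # The point sits exactly on a center: share full membership there.
            hits = d2.count(0.0)
            rows.append([1.0 / hits if d == 0.0 else 0.0 for d in d2])
            continue
        # d2 is squared, so the usual exponent -2/(m-1) becomes -1/(m-1).
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(points, init_centers, m=2.0, max_iter=100, tol=1e-9):
    """Alternate membership and center updates until the centers stop moving."""
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(len(centers)):
            # Each center is the mean of all points weighted by membership**m.
            w = [row[k] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / total
                 for d in range(dim)]
            )
        shift = max(abs(a - b)
                    for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


# Invented toy data: two tight groups plus one point roughly between them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (5.3, 5.4)]
centers, u = fuzzy_c_means(points, init_centers=[(0.0, 0.0), (10.0, 10.0)])

# A K-means-style hard label keeps only the largest membership per point.
hard = [row.index(max(row)) for row in u]

for p, row, label in zip(points, u, hard):
    print(p, [round(v, 3) for v in row], "hard label:", label)
```

In this sketch the last point receives a membership close to one half in each cluster, information that the hard label discards. That is the situation the abstract describes for a gene that takes part in more than one co-expression group.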

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dhiraj, K., Rath, S.K., Babu, K.S. (2009). FCM for Gene Expression Bioinformatics Data. In: Ranka, S., et al. Contemporary Computing. IC3 2009. Communications in Computer and Information Science, vol 40. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03547-0_50

  • DOI: https://doi.org/10.1007/978-3-642-03547-0_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03546-3

  • Online ISBN: 978-3-642-03547-0

  • eBook Packages: Computer Science (R0)
