Cluster Analysis of Microarray Data

Part of the book series: Methods in Molecular Biology (MIMB, volume 1986)

Abstract

Cluster analysis has been widely applied by researchers in many scientific fields over the last decades. Advances in the understanding of biological phenomena have renewed interest in cluster analysis, due in part to the large amount of microarray data now available. Traditional clustering algorithms, besides requiring user-defined parameters, show clear limitations in handling microarray data because of its inherent characteristics: it is high-dimensional with small sample sizes, highly redundant, and noisy. This has motivated the study of clustering algorithms tailored to microarray data, which continue to be developed and adapted. This chapter reviews clustering methods that follow different cluster analysis approaches in the challenging context of microarray data. The validation of clustering results is also briefly discussed by means of validity indexes, which are used to assess the quality of the chosen number of clusters and of the induced cluster assignments.
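
As an illustration of what a validity index does, the sketch below computes the mean silhouette width, one widely used internal index, for two candidate partitions of a tiny made-up expression matrix. It is a minimal example under stated assumptions and not the chapter's own procedure: the function name `mean_silhouette`, the toy profiles, and both label vectors are hypothetical, and Euclidean distance is assumed although other distances are common for expression data.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean_silhouette(profiles, labels):
    """Mean silhouette width of a hard partition (higher is better)."""
    n = len(profiles)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    widths = []
    for i in range(n):
        # Average distance from profile i to the members of each cluster,
        # leaving profile i itself out of its own cluster.
        mean_to = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                total = sum(dist(profiles[i], profiles[j]) for j in members)
                mean_to[c] = total / len(members)
        own = labels[i]
        if own not in mean_to:
            # Singleton cluster: the width is conventionally set to 0.
            widths.append(0.0)
            continue
        a = mean_to[own]                                     # cohesion
        b = min(v for c, v in mean_to.items() if c != own)   # separation
        scale = max(a, b)
        widths.append((b - a) / scale if scale > 0 else 0.0)
    return sum(widths) / n


# Hypothetical toy data: six gene profiles measured under three conditions.
profiles = [
    [1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1],   # rising pattern
    [3.0, 2.0, 1.0], [2.9, 2.1, 1.1], [3.1, 1.9, 0.9],   # falling pattern
]
two_groups = [0, 0, 0, 1, 1, 1]     # candidate partition with 2 clusters
three_groups = [0, 0, 2, 1, 1, 1]   # same data, one cluster split in two

score_two = mean_silhouette(profiles, two_groups)
score_three = mean_silhouette(profiles, three_groups)
print(f"2 clusters: {score_two:.3f}")
print(f"3 clusters: {score_three:.3f}")
```

On this toy matrix the two-cluster partition scores higher than the three-cluster one, which is how such an index can guide the choice of the number of clusters. With real microarray data the pairwise loop above would be slow, so a vectorized implementation would normally be preferred.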

Acknowledgements

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (ERDF) under grants TIN2014-53749-C2-2R and TIN2017-85949-C2-1-R.

Author information

Corresponding author

Correspondence to Juana-María Vivo.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Franco, M., Vivo, JM. (2019). Cluster Analysis of Microarray Data. In: Bolón-Canedo, V., Alonso-Betanzos, A. (eds) Microarray Bioinformatics. Methods in Molecular Biology, vol 1986. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9442-7_7

  • DOI: https://doi.org/10.1007/978-1-4939-9442-7_7

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9441-0

  • Online ISBN: 978-1-4939-9442-7

  • eBook Packages: Springer Protocols
