Classification and Clustering on Microarray Data for Gene Functional Prediction Using R

  • Liliana López Kleine
  • Rosa Montaño
  • Francisco Torres-Avilés
Part of the Methods in Molecular Biology book series (MIMB, volume 1375)


Gene expression data (microarray and RNA-sequencing data), as well as other kinds of genomic data, can be retrieved from publicly available genomic databases. Here, we explain how to apply multivariate clustering and classification methods to gene expression data in order to predict the participation of gene products in a specific functional category of interest. These methods have become very popular and are implemented in freely available software. Given the availability of both the data and the methods, every biological study should apply them to gain knowledge about the organism studied and the functional category of interest. Special emphasis is placed on nonlinear kernel classification methods.
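
As a rough illustration of the kind of workflow summarized above, the following minimal sketch clusters and then classifies genes in base R. It is not the chapter's own code. The expression matrix, the labels, the helper `gaussian_kernel`, and the values of `sigma` and `lambda` are all assumptions made up for this example. The classifier is a simple Gaussian-kernel least-squares classifier, used here as a stand-in for the nonlinear kernel methods the abstract mentions, so that no add-on package is needed.

```r
# Synthetic expression matrix: 60 genes (rows) x 10 conditions (columns).
# Data and labels are simulated and purely illustrative.
set.seed(1)
n_per_group <- 30
in_category  <- matrix(rnorm(n_per_group * 10, mean = 2), nrow = n_per_group)
out_category <- matrix(rnorm(n_per_group * 10, mean = 0), nrow = n_per_group)
expr <- rbind(in_category, out_category)
rownames(expr) <- paste0("gene", seq_len(nrow(expr)))
label <- rep(c(1, -1), each = n_per_group)  # +1 = in the functional category

# Clustering: average-linkage hierarchical clustering on Euclidean distances,
# cut into two groups and compared with the known labels.
hc <- hclust(dist(expr), method = "average")
clusters <- cutree(hc, k = 2)
print(table(cluster = clusters, label = label))

# Classification: Gaussian-kernel least-squares classifier (hypothetical helper).
gaussian_kernel <- function(a, b, sigma = 3) {
  # Squared Euclidean distances between every row of a and every row of b
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  exp(-d2 / (2 * sigma^2))
}

train <- c(1:20, 31:50)                      # 20 genes from each group
test  <- setdiff(seq_len(nrow(expr)), train) # remaining 20 genes
lambda <- 0.1                                # ridge penalty (assumed value)

# Fit: solve (K + lambda * I) alpha = y on the training genes
K_train <- gaussian_kernel(expr[train, ], expr[train, ])
alpha <- solve(K_train + lambda * diag(length(train)), label[train])

# Predict: sign of the kernel-weighted score for each held-out gene
K_test <- gaussian_kernel(expr[test, ], expr[train, ])
predicted <- ifelse(as.vector(K_test %*% alpha) >= 0, 1, -1)
accuracy <- mean(predicted == label[test])
print(accuracy)
```

In a real analysis the matrix `expr` would hold normalized expression values and `label` would mark genes already known to belong to the functional category. A dedicated support vector machine implementation would normally replace the hand-written classifier.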


Microarrays; Functional prediction; Multivariate data analysis; Clustering; Classification



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Liliana López Kleine (1, corresponding author)
  • Rosa Montaño (2)
  • Francisco Torres-Avilés (2)
  1. Departamento de Estadística, Universidad Nacional de Colombia, Bogotá, Colombia
  2. Departamento de Matemática y Ciencia de la Computación, Universidad de Santiago de Chile, Santiago, Chile
