Classification and Clustering on Microarray Data for Gene Functional Prediction Using R

Kleine, Liliana López; Montaño, Rosa; Torres-Avilés, Francisco

doi:10.1007/7651_2015_240

Part of the book series: Methods in Molecular Biology (MIMB, volume 1375)

Abstract

Gene expression data (from microarrays and RNA sequencing), as well as other kinds of genomic data, can be obtained from publicly available genomic repositories. Here, we explain how to apply multivariate clustering and classification methods to gene expression data. These methods have become very popular, are implemented in freely available software, and can be used to predict whether a gene product participates in a specific functional category of interest. Given the availability of both the data and the methods, biological studies should apply them to gain knowledge about the organism under study and the functional category of interest. Special emphasis is placed on nonlinear kernel classification methods.

An erratum to the original chapter can be found under DOI 10.1007/7651_2015_256.

Author information

Authors and Affiliations

Departamento de Estadística, Universidad Nacional de Colombia, Edificio 404, Oficina 342, Carrera 45 No 26-28, Bogotá, DC, Colombia
Liliana López Kleine
Departamento de Matemática y Ciencia de la Computación, Universidad de Santiago de Chile, Santiago, Chile
Rosa Montaño & Francisco Torres-Avilés

Corresponding author

Correspondence to Liliana López Kleine.

Editor information

Editors and Affiliations

Department of Surgical and Medical Sciences, University “Magna Græcia” of Catanzaro, Catanzaro, Italy
Pietro Hiram Guzzi

About this protocol

Cite this protocol

Kleine, L.L., Montaño, R., Torres-Avilés, F. (2015). Classification and Clustering on Microarray Data for Gene Functional Prediction Using R. In: Guzzi, P. (eds) Microarray Data Analysis. Methods in Molecular Biology, vol 1375. Humana Press, New York, NY. https://doi.org/10.1007/7651_2015_240

DOI: https://doi.org/10.1007/7651_2015_240
Published: 12 March 2015
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3172-9
Online ISBN: 978-1-4939-3173-6
eBook Packages: Springer Protocols
