Abstract
In microarray gene expression data obtained by hybridizing different cancer tissue samples, or samples from cancer and normal cellular conditions, it is of interest to identify differentially expressed genes for follow-up validation studies. One approach to analyzing such data is to first reduce the dimensionality using partial least squares (PLS), which has proved useful in various cancer microarray data applications (Nguyen and Rocke, Bioinformatics 18:39–50, 2002a). PLS reduces the dimensionality of the gene expression data matrix by taking linear combinations of the genes/predictors (referred to as PLS components). However, the weights assigned to each gene at each dimension are nonlinear functions of both the genes and the outcome/response variable, which makes analytical studies difficult. In this paper, we propose a new measure for identifying relevant genes based on PLS weights, called random augmented variance influence on projection (RA-VIP). We compare the sensitivity and specificity of RA-VIP for identifying truly informative (differentially expressed) genes with those of two previously suggested heuristic measures, both based on PLS weights: the variable influence on the projection (VIP) and the PLS regression B-coefficient (denoted B-PLS). The methods are compared using simulation studies. We further illustrate the proposed RA-VIP measure on two microarray cancer genomics data sets, one involving acute leukemia samples and the other colon cancer and normal tissues.
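As a rough illustration of how a PLS weight-based measure ranks genes, the sketch below fits a single-response PLS model with the NIPALS iteration and computes VIP scores in their commonly used form: the squared unit-norm weights of each gene, averaged over components in proportion to the response variation each component explains. This is not the chapter's RA-VIP procedure, which the abstract does not specify; the function name `pls1_vip`, the toy data, and the choice of two components are assumptions made for illustration only.

```python
import math
import random


def pls1_vip(X, y, n_components=2):
    """Return one VIP score per column (gene) of X from a PLS1 NIPALS fit."""
    n, p = len(X), len(X[0])
    # Center each gene (column) and the response.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]

    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: covariance of each gene with the response, unit length.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # PLS component: a linear combination of the genes.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        # Loadings for X and y, then deflate both before the next component.
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)  # response sum of squares explained

    total = sum(explained)
    return [
        math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


# Hypothetical toy data: 30 samples, 6 genes, only gene 0 differs by class.
rng = random.Random(0)
n_samples, n_genes = 30, 6
y = [i % 2 for i in range(n_samples)]
X = [
    [rng.gauss(0, 1) + (4.0 * y[i] if j == 0 else 0.0) for j in range(n_genes)]
    for i in range(n_samples)
]
vip = pls1_vip(X, y)
for j, score in enumerate(vip):
    print(f"gene {j}: VIP = {score:.3f}")
```

Because each weight vector has unit length, the squared VIP scores sum to the number of genes, so a score above 1 marks a gene with above-average influence. That threshold is a widely used heuristic rather than a result from this chapter.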
References
Alon U, Barkai N, Notterman DA et al (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96:6745–6750
Boulesteix A (2004) PLS dimension reduction for classification with microarray data. Stat Appl Genet Mol Biol 3:Article 33
Boulesteix A, Strimmer K (2006) Partial least squares: A versatile tool for the analysis of high-dimensional genomic data. Brief Bioinform 8:33–44
Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286:531–537
Helland IS (1988) On the structure of partial least squares. Commun Stat Simul Comput 17:581–607
Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4:249–264
Li C, Wong WH (2001) Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc Natl Acad Sci USA 98:31–36
Nguyen DV, Rocke DM (2002a) Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18:39–50
Nguyen DV, Rocke DM (2002b) Multiclass cancer classification via partial least squares with gene expression profiles. Bioinformatics 18:1216–1226
Nguyen DV, Rocke DM (2004) On partial least squares dimension reduction for microarray-based classification: A simulation study. Comput Stat Data Anal 46:407–425
SAS Institute, Inc. (1999) SAS/STAT User’s Guide. SAS Institute, Cary, NC
Wold H (1966) Estimation of principal components and related models by iterative least squares. In: Krishnaiah PR (ed) Multivariate analysis. Academic, New York
Wold S, Johansson E, Cocchi M (1993) PLS-partial least-squares projections to latent structures. In: Kubinyi H (ed) 3D QSAR in drug design: Theory, methods and applications. ESCOM, Leiden, Holland
Zhou L, Rocke DM (2005) An expression index for Affymetrix GeneChips based on generalized logarithm. Bioinformatics 21:3983–3989
Acknowledgments
Support for this work includes the National Institutes of Health (NIH) grants UL1DE019583, RL1AG032119, RL1AG032115, HD036071, and UL1RR024146. This publication was also made possible by Grant Number UL1 RR024146 from the National Center for Research Resources (NCRR). Its contents are solely the responsibility of the authors and do not necessarily represent the official view of NCRR or NIH.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Chen, Y., Nguyen, D.V. (2009). Identification of Relevant Genes from Microarray Experiments based on Partial Least Squares Weights: Application to Cancer Genomics. In: Pham, T. (ed) Computational Biology. Applied Bioinformatics and Biostatistics in Cancer Research. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-0811-7_1
DOI: https://doi.org/10.1007/978-1-4419-0811-7_1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-0810-0
Online ISBN: 978-1-4419-0811-7
eBook Packages: Biomedical and Life Sciences (R0)