Identification of Relevant Genes from Microarray Experiments based on Partial Least Squares Weights: Application to Cancer Genomics

Computational Biology

Abstract

In microarray gene expression data obtained by hybridizing different cancer tissue samples, or samples from cancerous and normal cellular conditions, it is of interest to identify differentially expressed genes for follow-up validation studies. One approach to the analysis of such data is to first reduce the dimensionality using partial least squares (PLS), which has been useful in various cancer microarray data applications (Nguyen and Rocke, Bioinformatics 18:39–50, 2002a). PLS reduces the dimensionality of the gene expression data matrix by taking linear combinations of the genes/predictors (referred to as PLS components). However, the weights assigned to each gene at each dimension are nonlinear functions of both the genes and the outcome/response variable, making analytical studies difficult. In this paper, we propose a new measure for identifying relevant genes based on PLS weights, called random augmented variance influence on projection (RA-VIP). We compare RA-VIP, in terms of its sensitivity and specificity for identifying truly informative (differentially expressed) genes, with two previously suggested heuristic measures, both based on PLS weights: the variable influence on the projection (VIP) and the PLS regression B-coefficient (denoted B-PLS). The methods are compared using simulation studies. We further illustrate the proposed RA-VIP measure on two microarray cancer genomics data sets, one involving acute leukemia samples and the other colon cancer and normal tissues.
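
To make the idea of PLS weights and a weight-based gene score concrete, the following is a minimal sketch, not the chapter's implementation. It assumes a single response (PLS1) fitted by the NIPALS algorithm and uses the commonly cited VIP formula, VIP_j = sqrt(p · Σ_a SS_a w_ja² / Σ_a SS_a), where w_ja is the unit-norm weight of gene j on component a and SS_a is the response sum of squares explained by component a. The abstract does not specify how RA-VIP is constructed, so it is not reproduced here. The function name `pls1_vip`, the simulated data, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def pls1_vip(X, y, n_components=2):
    """Fit PLS1 by NIPALS; return the weight matrix W (genes x components)
    and one VIP score per gene (standard VIP formula, an assumption here)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xk = X - X.mean(axis=0)          # column-centered expression matrix
    yk = y - y.mean()                # centered response
    p = Xk.shape[1]
    W = np.zeros((p, n_components))
    ss = np.zeros(n_components)      # response sum of squares explained per component
    for a in range(n_components):
        w = Xk.T @ yk                # weights depend on both the genes and the response
        w /= np.linalg.norm(w)       # unit-norm weight vector
        t = Xk @ w                   # PLS component: a linear combination of genes
        tt = t @ t
        q = (yk @ t) / tt            # regression of the response on the component
        load = (Xk.T @ t) / tt       # X loadings, used for deflation
        Xk = Xk - np.outer(t, load)  # deflate X
        yk = yk - q * t              # deflate y
        W[:, a] = w
        ss[a] = q ** 2 * tt
    vip = np.sqrt(p * ((W ** 2) @ ss) / ss.sum())
    return W, vip

# Simulated example (hypothetical): 40 samples, 200 genes,
# of which the first 10 are shifted in the second class.
rng = np.random.default_rng(0)
n, p, k = 40, 200, 10
y = np.repeat([0.0, 1.0], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :k] += 2.0

W, vip = pls1_vip(X, y, n_components=2)
top_genes = np.argsort(vip)[::-1][:k]   # indices of the k highest-scoring genes
```

Because each weight vector has unit norm, the squared VIP scores average to one across genes, which is why a threshold near 1 is often used as a rough cut-off for this score. Ranking genes by `vip` and comparing the top of the list with the genes known to be informative is one simple way to estimate sensitivity and specificity in a simulation of this kind.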

References

  • Alon U, Barkai N, Notterman DA et al (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96:6745–6750

  • Boulesteix A (2004) PLS dimension reduction for classification with microarray data. Stat Appl Genet Mol Biol 3:Article 33

  • Boulesteix A, Strimmer K (2006) Partial least squares: A versatile tool for the analysis of high-dimensional genomic data. Brief Bioinform 8:33–44

  • Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286:531–537

  • Helland IS (1988) On the structure of partial least squares. Commun Stat Simul Comput 17:581–607

  • Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4:249–264

  • Li C, Wong WH (2001) Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc Natl Acad Sci USA 98:31–36

  • Nguyen DV, Rocke DM (2002a) Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18:39–50

  • Nguyen DV, Rocke DM (2002b) Multiclass cancer classification via partial least squares with gene expression profiles. Bioinformatics 18:1216–1226

  • Nguyen DV, Rocke DM (2004) On partial least squares dimension reduction for microarray-based classification: A simulation study. Comput Stat Data Anal 46:407–425

  • SAS Institute Inc (1999) SAS/STAT User’s Guide. SAS Institute, Cary, NC

  • Wold H (1966) Estimation of principal components and related models by iterative least squares. In: Krishnaiah PR (ed) Multivariate analysis. Academic, New York

  • Wold S, Johansson E, Cocchi M (1993) PLS-partial least-squares projections to latent structures. In: Kubinyi H (ed) 3D QSAR in drug design: Theory, methods and applications. ESCOM, Leiden, Holland

  • Zhou L, Rocke DM (2005) An expression index for Affymetrix GeneChips based on generalized logarithm. Bioinformatics 21:3983–3989

Acknowledgments

Support for this work includes National Institutes of Health (NIH) grants UL1DE019583, RL1AG032119, RL1AG032115, HD036071, and UL1RR024146. This publication was also made possible by Grant Number UL1 RR024146 from the National Center for Research Resources (NCRR). Its contents are solely the responsibility of the authors and do not necessarily represent the official view of NCRR or NIH.

Author information

Corresponding author

Correspondence to Danh V. Nguyen.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Chen, Y., Nguyen, D.V. (2009). Identification of Relevant Genes from Microarray Experiments based on Partial Least Squares Weights: Application to Cancer Genomics. In: Pham, T. (eds) Computational Biology. Applied Bioinformatics and Biostatistics in Cancer Research. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-0811-7_1
