A Weighted Principal Component Analysis, WPCA1; Application to Gene Expression Data

Rankings and Preferences

Part of the book series: SpringerBriefs in Statistics ((BRIEFSSTATIST))

Abstract

This chapter has two parts. The first describes new developments in Principal Component Analysis (PCA) (Jolliffe, Principal Component Analysis, 2002, [42]); the second presents a new method for selecting variables. The focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. Such data contain the expression levels of a large number of genes (variables), measured simultaneously on a much smaller number of tissue samples, and they present many statistical challenges. The usual PCA is not appropriate for this kind of problem. In this context, we propose a weighted correlation coefficient as an alternative to Pearson’s, which leads to a so-called weighted PCA (WPCA1). To illustrate the features of WPCA1 and compare it with the usual PCA, we analyze gene expression datasets. In the second part of the chapter, we propose a new PCA-based algorithm that iteratively selects the most important genes in a microarray dataset. We show that this algorithm produces better results when WPCA1 is used instead of the usual PCA. Furthermore, using Support Vector Machines, a well-known supervised classification method, we show that this algorithm can also compete with the supervised Significance Analysis of Microarrays (SAM) algorithm (Tibshirani et al., Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS, 10, 1999, [97]; Tusher et al., Proc Nat Acad Sci, 98:5116–5121, 2001, [98]).
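The abstract does not give the formula for the weighted correlation coefficient, so the following is only a minimal sketch of the general idea: replace the Pearson correlation matrix with a weighted one and eigen-decompose it. The per-sample weight vector `w`, the helper names `weighted_corr_matrix` and `pca_from_corr`, and the toy data are all assumptions made for illustration; the chapter's actual WPCA1 coefficient may be defined quite differently.

```python
import numpy as np

def weighted_corr_matrix(X, w):
    """Weighted Pearson correlation between the columns of X.

    X: (n_samples, n_variables) array.
    w: non-negative per-sample weights (an assumed weighting scheme,
       not the coefficient defined in the chapter).
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalise weights to sum to 1
    mean = w @ X                         # weighted column means
    Xc = X - mean                        # centre each variable
    cov = (Xc * w[:, None]).T @ Xc       # weighted covariance matrix
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

def pca_from_corr(R):
    """Eigen-decompose a symmetric correlation matrix, largest eigenvalue first."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))             # toy data: 20 samples, 5 variables
w = np.linspace(1.0, 2.0, 20)            # hypothetical sample weights

R = weighted_corr_matrix(X, w)
eigvals, eigvecs = pca_from_corr(R)
explained = eigvals / eigvals.sum()      # share of variance per component
```

With equal weights, `weighted_corr_matrix` reduces to the ordinary Pearson correlation matrix, so the usual PCA is recovered as a special case of this sketch.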

Notes

  1. http://www.lsi.us.es/~aguilar/datasets.html, http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html.

Author information

Corresponding author

Correspondence to Joaquim Pinto da Costa.

Copyright information

© 2015 The Author(s)

About this chapter

Cite this chapter

Pinto da Costa, J. (2015). A Weighted Principal Component Analysis, WPCA1; Application to Gene Expression Data. In: Rankings and Preferences. SpringerBriefs in Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48344-2_4
