Protocol

Computational Toxicology

Volume 930 of the series Methods in Molecular Biology, pp 527-547

Principal Components Analysis

  • Detlef Groth (corresponding author), AG Bioinformatics, University of Potsdam
  • Stefanie Hartmann, AG Bioinformatics, University of Potsdam
  • Sebastian Klie, AG Bioinformatics, University of Potsdam
  • Joachim Selbig, AG Bioinformatics, University of Potsdam

Abstract

Principal components analysis (PCA) is a standard tool in multivariate data analysis for reducing the number of dimensions while retaining as much of the data’s variation as possible. Instead of investigating thousands of original variables, the analyst explores the first few components, which contain the majority of the data’s variation. The visualization and statistical analysis of these new variables, the principal components, can help to find similarities and differences between samples. The original variables that contribute most to the first few components can be identified as well.
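
As a rough illustration of the idea described above, the following Python sketch computes principal components with NumPy by centering the data and applying a singular value decomposition. It is not taken from the chapter: the function name `pca`, the synthetic dataset, and all variable names are assumptions made for this example, and the data are random numbers rather than real measurements.

```python
import numpy as np


def pca(X, n_components=2):
    """Hypothetical helper: PCA of X (rows = samples, columns = variables)."""
    # Center each variable so the components describe variation around the mean.
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the component directions.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Scores: coordinates of each sample on the new axes.
    scores = U[:, :n_components] * s[:n_components]
    # Loadings: contribution of each original variable to each component.
    loadings = Vt[:n_components].T
    # Fraction of the total variance captured by each component.
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


# Synthetic data (an assumption for illustration): 50 samples, 200 variables,
# driven mostly by a single hidden factor plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
weights = rng.normal(size=(1, 200))
X = latent @ weights + 0.1 * rng.normal(size=(50, 200))

scores, loadings, explained = pca(X, n_components=2)

# Original variables that contribute most to the first component.
top_variables = np.argsort(np.abs(loadings[:, 0]))[::-1][:5]

print("explained variance ratio:", explained)
print("indices of top contributing variables:", top_variables)
```

Plotting the two columns of `scores` against each other gives the usual sample plot for spotting similarities and differences between samples, while `top_variables` points to the original variables with the largest absolute loadings on the first component.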

This chapter seeks to deliver a conceptual understanding of PCA as well as a mathematical description. We describe how PCA can be used to analyze different datasets, and we include practical code examples. Possible shortcomings of the methodology and ways to overcome these problems are also discussed.

Key words

  • Principal components analysis
  • Multivariate data analysis
  • Metabolite profiling
  • Codon usage
  • Dimensionality reduction