Using chemometrics for navigating in the large data sets of genomics, proteomics, and metabonomics (gpm)
Abstract
This article describes how multivariate projection techniques, such as principal-component analysis (PCA) and partial least-squares projections to latent structures (PLS), can be applied to the large, high-density data sets produced in genomics, proteomics, and metabonomics. PCA and PLS, together with their extensions, are useful because they can analyze data with many noisy, collinear, and even incomplete variables, in both the X (predictor) and Y (response) blocks. Three examples illustrate the approach. The first is a genomics data set: microarray data on cell-cycle-regulated genes in the yeast Saccharomyces cerevisiae. The second contains NMR metabonomics data measured on urine samples from male rats treated with either chloroquine or amiodarone. The third describes sequence-function classification of a set of G-protein-coupled receptors using hierarchical PCA.
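The claim that PCA copes with noisy, collinear, and incomplete data can be made concrete with NIPALS, the iterative algorithm commonly used in chemometrics to fit PCA (and PLS) models one component at a time. The NumPy sketch below is illustrative only and is not the authors' implementation. It assumes that X has already been mean-centred (and scaled, if desired) column by column and that missing values are coded as NaN. Those cells are simply left out of each least-squares step rather than imputed. The function name `nipals_pca` and its parameters are hypothetical.

```python
import numpy as np


def nipals_pca(X, n_components, max_iter=500, tol=1e-10):
    """Sketch of NIPALS PCA that tolerates missing values (NaN).

    Assumes X is already column-centred (and scaled, if desired).
    Missing cells are left out of every least-squares step rather
    than imputed. Returns scores T (n x A) and loadings P (k x A).
    """
    X = np.array(X, dtype=float)            # work on a copy
    mask = ~np.isnan(X)                      # True where a value was observed
    W = mask.astype(float)                   # 1.0 = observed, 0.0 = missing
    E = np.where(mask, X, 0.0)               # residual matrix; missing cells held at 0
    n, k = X.shape
    T = np.zeros((n, n_components))
    P = np.zeros((k, n_components))

    for a in range(n_components):
        # Start t from the residual column with the largest sum of squares.
        t = E[:, np.argmax((E ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            # Loadings: regress each column on t, using observed rows only.
            p = (E.T @ t) / (W.T @ t ** 2)
            p /= np.linalg.norm(p)
            # Scores: regress each row on p, using observed columns only.
            t_new = (E @ p) / (W @ p ** 2)
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p
        # Deflate: subtract the fitted component from the observed cells.
        E = np.where(mask, E - np.outer(t, p), 0.0)
    return T, P
```

With complete data the loadings agree, up to sign, with the right singular vectors from `np.linalg.svd`, which makes a convenient sanity check. With a few NaN cells the model can still be fitted, because each score and loading is estimated only from the values that were actually observed.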
Keywords
PCA, PLS, Hierarchical modeling, Multivariate analysis, Omics data analysis