
Using chemometrics for navigating in the large data sets of genomics, proteomics, and metabonomics (gpm)

  • Review
  • Analytical and Bioanalytical Chemistry

Abstract

This article describes the applicability of multivariate projection techniques, such as principal-component analysis (PCA) and partial least-squares projections to latent structures (PLS), to the large-volume, high-density data structures obtained within genomics, proteomics, and metabonomics. PCA and PLS, and their extensions, derive their usefulness from their ability to analyze data with many, noisy, collinear, and even incomplete variables in both X and Y. Three examples serve as illustrations. The first is a genomics data set and involves modeling of microarray data of cell cycle-regulated genes in the microorganism Saccharomyces cerevisiae. The second contains NMR-metabonomics data measured on urine samples of male rats treated with either chloroquine or amiodarone. The third describes sequence-function classification studies of a set of G-protein-coupled receptors using hierarchical PCA.
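The claim that PCA can cope with many, collinear, and incomplete variables can be made concrete with a small sketch. The code below is an illustration only, not the authors' implementation: it assumes the NIPALS algorithm, a common way of fitting PCA when cells are missing, and the function name `nipals_pca`, its parameters, and the simulated data are all hypothetical. Each score and loading is obtained by a regression that skips the missing (NaN) cells.

```python
import numpy as np

def nipals_pca(X, n_components=2, max_iter=500, tol=1e-9):
    """Sketch of PCA by NIPALS; NaN cells are skipped in every regression step.

    Returns scores T (n x A), unit-length loadings P (k x A), and column means.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    M = observed.astype(float)             # 1.0 where a value is present
    means = np.nanmean(X, axis=0)          # centre on observed values only
    R = np.where(observed, X - means, 0.0) # residuals; missing cells count as 0
    n, k = X.shape
    T = np.zeros((n, n_components))
    P = np.zeros((k, n_components))
    for a in range(n_components):
        # Start from the column with the largest observed sum of squares.
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            # Loadings: regress every column on t over its observed rows.
            p = (R.T @ t) / np.maximum(M.T @ (t ** 2), 1e-12)
            p /= np.linalg.norm(p)
            # Scores: regress every row on p over its observed columns.
            t_new = (R @ p) / np.maximum(M @ (p ** 2), 1e-12)
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a] = t
        P[:, a] = p
        # Deflate: remove the fitted component from the observed cells.
        R = np.where(observed, R - np.outer(t, p), 0.0)
    return T, P, means

# Simulated example: 50 samples, 200 collinear variables driven by 2 latent
# factors, with about 5% of the cells removed at random (all values made up).
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2)) * np.array([3.0, 1.0])
X_demo = latent @ rng.normal(size=(200, 2)).T + 0.05 * rng.normal(size=(50, 200))
X_demo[rng.random(size=X_demo.shape) < 0.05] = np.nan
T_demo, P_demo, means_demo = nipals_pca(X_demo, n_components=2)
X_filled = means_demo + T_demo @ P_demo.T   # model-based estimate of every cell
```

With complete data this procedure gives the same loadings as a singular value decomposition of the centred matrix, up to sign. With missing cells it is an approximation whose quality depends on how much data is missing and on how the gaps are distributed.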




Cite this article

Eriksson, L., Antti, H., Gottfries, J. et al. Using chemometrics for navigating in the large data sets of genomics, proteomics, and metabonomics (gpm). Anal Bioanal Chem 380, 419–429 (2004). https://doi.org/10.1007/s00216-004-2783-y


  • DOI: https://doi.org/10.1007/s00216-004-2783-y
