Analytical and Bioanalytical Chemistry, Volume 380, Issue 3, pp 419–429

Using chemometrics for navigating in the large data sets of genomics, proteomics, and metabonomics (gpm)

  • Lennart Eriksson
  • Henrik Antti
  • Johan Gottfries
  • Elaine Holmes
  • Erik Johansson
  • Fredrik Lindgren
  • Ingrid Long
  • Torbjörn Lundstedt
  • Johan Trygg
  • Svante Wold


This article describes how multivariate projection techniques, such as principal-component analysis (PCA) and partial least-squares (PLS) projections to latent structures, can be applied to the large-volume, high-density data structures obtained in genomics, proteomics, and metabonomics. PCA and PLS, and their extensions, derive their usefulness from their ability to analyze data with many noisy, collinear, and even incomplete variables in both X and Y. Three examples are used as illustrations. The first is a genomics data set involving the modeling of microarray data of cell cycle-regulated genes in the microorganism Saccharomyces cerevisiae. The second contains NMR metabonomics data measured on urine samples from male rats treated with either chloroquine or amiodarone. The third describes sequence-function classification studies of a set of G-protein-coupled receptors using hierarchical PCA.
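The claim that PCA and PLS cope with many collinear, partly missing variables can be illustrated with a small NumPy sketch. It assumes the NIPALS algorithm, a common way of computing PCA and PLS components one at a time in which missing entries can simply be skipped in each regression step; it is not a reproduction of the authors' software. The function names `autoscale`, `nipals_pca` and `pls1_nipals` and the synthetic data are hypothetical, and the PLS part handles only a single, complete response.

```python
import numpy as np


def autoscale(X):
    """Center each column and scale it to unit variance, ignoring NaNs."""
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0, ddof=1)
    std[std == 0] = 1.0                      # leave constant columns unscaled
    return (X - mean) / std


def nipals_pca(X, n_comp, tol=1e-10, max_iter=1000):
    """PCA by NIPALS on an autoscaled matrix; NaN entries are treated as missing.

    Assumes no row or column is entirely missing. Returns scores T (n x n_comp)
    and unit-length loadings P (k x n_comp).
    """
    present = ~np.isnan(X)
    M = present.astype(float)                # 1 where observed, 0 where missing
    E = np.where(present, X, 0.0)            # residual matrix, missing entries set to 0
    n, k = X.shape
    T, P = np.zeros((n, n_comp)), np.zeros((k, n_comp))
    for a in range(n_comp):
        t = E[:, np.argmax((E ** 2).sum(axis=0))].copy()  # start from the largest column
        for _ in range(max_iter):
            # Loadings: regress each column of E on t, using only its observed rows
            p = (E.T @ t) / (M.T @ (t ** 2))
            p /= np.linalg.norm(p)
            # Scores: regress each row of E on p, using only its observed columns
            t_new = (E @ p) / (M @ (p ** 2))
            done = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if done:
                break
        E = E - np.outer(t, p) * M           # deflate; missing entries stay at 0
        T[:, a], P[:, a] = t, p
    return T, P


def pls1_nipals(X, y, n_comp):
    """PLS regression for one response (PLS1) on complete, centered X and y.

    Returns the coefficient vector b such that y_hat = X @ b.
    """
    E, f = X.copy(), np.array(y, dtype=float)
    k = X.shape[1]
    W, P, c = np.zeros((k, n_comp)), np.zeros((k, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)               # X-weights: direction of max covariance with y
        t = E @ w                            # X-scores
        tt = t @ t
        W[:, a], P[:, a] = w, E.T @ t / tt   # store weights and X-loadings
        c[a] = f @ t / tt                    # y-weight (inner regression of y on t)
        E = E - np.outer(t, P[:, a])         # deflate X
        f = f - c[a] * t                     # deflate y
    return W @ np.linalg.solve(P.T @ W, c)   # b = W (P'W)^-1 c


# Hypothetical demo: 40 samples, 50 collinear variables driven by two latent factors
rng = np.random.default_rng(0)
Z = rng.normal(size=(40, 2)) * np.array([3.0, 1.0])   # factor strengths 3 and 1
X = Z @ rng.normal(size=(2, 50)) + 0.1 * rng.normal(size=(40, 50))
y = Z[:, 0] + 0.05 * rng.normal(size=40)

X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.05] = np.nan     # remove about 5% of the entries
T, P = nipals_pca(autoscale(X_missing), n_comp=2)

Xs = autoscale(X)
b = pls1_nipals(Xs, y - y.mean(), n_comp=2)
y_hat = Xs @ b + y.mean()
```

With roughly 5% of the entries removed, `nipals_pca` still returns scores for all 40 samples, and the two-component PLS model predicts y from 50 collinear variables measured on only 40 samples, a setting in which ordinary least squares has no unique solution. Extensions such as hierarchical PCA are not covered by this sketch.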


Keywords: PCA, PLS, Hierarchical modeling, Multivariate analysis, Omics data analysis



Copyright information

© Springer-Verlag 2004

Authors and Affiliations

  • Lennart Eriksson (1)
  • Henrik Antti (2, 4)
  • Johan Gottfries (3, 4)
  • Elaine Holmes (2)
  • Erik Johansson (1)
  • Fredrik Lindgren (5)
  • Ingrid Long (6)
  • Torbjörn Lundstedt (6)
  • Johan Trygg (4)
  • Svante Wold (4)

  1. Umetrics AB, Umeå, Sweden
  2. Biological Chemistry, Biomedical Sciences Division, Faculty of Medicine, Imperial College of Science, Technology and Medicine, London, UK
  3. AstraZeneca R&D Mölndal, Mölndal, Sweden
  4. Institute of Chemistry, Umeå University, Umeå, Sweden
  5. Umetrics AB, Malmö Office, Malmö, Sweden
  6. Department of Pharmaceutical Chemistry, Uppsala University, Uppsala, Sweden
