Chemometrics in analytical chemistry—part I: history, experimental design and data analysis tools


Chemometrics has achieved major recognition and progress in the field of analytical chemistry. This first part of the tutorial summarises the major achievements and contributions of chemometrics to some of the more important stages of the analytical process, such as experimental design, sampling, and data analysis (including data pretreatment and fusion). The tutorial is intended to give a general, up-to-date overview of the chemometrics field and to contribute further to its dissemination and promotion in analytical chemistry.





Author information



Corresponding author

Correspondence to Romà Tauler.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

All participants belong to the chemometrics study group of the Division of Analytical Chemistry of EuCheMS.


About this article


Cite this article

Brereton, R.G., Jansen, J., Lopes, J. et al. Chemometrics in analytical chemistry—part I: history, experimental design and data analysis tools. Anal Bioanal Chem 409, 5891–5899 (2017).



  • Chemometrics
  • Experimental design
  • Sampling
  • Data preprocessing
  • Projection methods
  • Data fusion