Megavariate Analysis of Environmental QSAR Data. Part II – Investigating Very Complex Problem Formulations Using Hierarchical, Non-Linear and Batch-Wise Extensions of PCA and PLS

  • Full-length paper
Molecular Diversity

Summary

Three extensions of the basic PCA and PLS methodologies are described. These extensions are hierarchical, non-linear and batch-based in nature. The objectives of these methods are to assist in problem understanding and problem solving in very complex (QSAR) problem formulations. The method extensions are illustrated using two example QSAR data sets containing many X- and Y-variables.
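To make the hierarchical extension concrete, the following is a minimal sketch of one common way a hierarchical PCA can be arranged: each block of variables gets its own base-level PCA, and the resulting block scores are then used as the variables of a top-level PCA. It is an illustration under stated assumptions, not the authors' implementation. The function names `pca_scores` and `hierarchical_pca`, the unit-variance scaling, the number of components, and the random data are all hypothetical choices made for the example.

```python
import numpy as np

def pca_scores(X, n_components):
    """Return PCA scores of X after centering and unit-variance scaling (via SVD)."""
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                      # guard against constant columns
    Xs = Xc / sd
    U, s, _ = np.linalg.svd(Xs, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def hierarchical_pca(blocks, n_base=2, n_top=2):
    """Base-level PCA per block, then a top-level PCA on the stacked block scores."""
    base_scores = [pca_scores(B, n_base) for B in blocks]
    super_X = np.hstack(base_scores)       # block scores act as "super variables"
    top_scores = pca_scores(super_X, n_top)
    return base_scores, top_scores

# Synthetic data (assumption): 20 compounds described by three descriptor blocks.
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(20, 8)),
          rng.normal(size=(20, 5)),
          rng.normal(size=(20, 12))]
base_scores, top_scores = hierarchical_pca(blocks)
```

In this arrangement the top-level model has only a handful of variables per block, which is what makes the overview easier to interpret than a single model fitted to all original variables at once.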

Abbreviations

ACE: alternating conditional expectations
BIF-PLS: bifocal PLS
MARS: multivariate adaptive regression splines
MLR: multiple linear regression
NPLS: non-linear PLS
NN: neural networks
PARAFAC: parallel factor analysis
PCA: principal component analysis
PCB: polychlorinated biphenyls
PCR: principal component regression
PLS: partial least squares projections to latent structures
OPLS: orthogonal PLS
PLS-DA: PLS discriminant analysis
QSAR: quantitative structure-activity relationships
SIMCA: soft independent modelling of class analogy
SMD: statistical molecular design
SPLS: spline PLS
SVM: support vector machines

Author information

Corresponding author

Correspondence to Lennart Eriksson.

About this article

Cite this article

Eriksson, L., Andersson, P.L., Johansson, E. et al. Megavariate Analysis of Environmental QSAR Data. Part II – Investigating Very Complex Problem Formulations Using Hierarchical, Non-Linear and Batch-Wise Extensions of PCA and PLS. Mol Divers 10, 187–205 (2006). https://doi.org/10.1007/s11030-006-9026-4

  • DOI: https://doi.org/10.1007/s11030-006-9026-4
