Sample-distance partial least squares: PLS optimized for many variables, with application to CoMFA

  • Research Papers
Journal of Computer-Aided Molecular Design

Summary

Three-dimensional molecular modeling can provide an unlimited number m of structural properties. Comparative Molecular Field Analysis (CoMFA), for example, may calculate thousands of field values for each model structure. When m is large, partial least squares (PLS) is the statistical method of choice for fitting and predicting biological responses. Yet PLS is usually implemented in a property-based fashion that is optimal only for small m. We describe here a sample-based formulation of PLS, SAMPLS, which can be used to fit any single response (bioactivity). SAMPLS reduces all explanatory data to the pairwise 'distances' among the n samples (molecules), or equivalently to an n-by-n covariance matrix C. This matrix, unmodified, can be used to fit all PLS components. Furthermore, SAMPLS can validate the model by modern resampling techniques, at a cost independent of m. We have implemented SAMPLS as a Fortran program and have reproduced conventional and cross-validated PLS analyses of data from two published studies. Full (leave-each-out) cross-validation of a typical CoMFA takes 0.2 CPU s. SAMPLS is thus ideally suited to structure-activity analysis based on CoMFA fields or bonded topology. The sample-distance formulation also relates PLS to methods like cluster analysis and nonlinear mapping, and shows how drastically PLS simplifies the information in CoMFA fields.
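The idea of fitting a single response from the n-by-n matrix C alone can be illustrated with a minimal sketch. It is not the authors' Fortran program: the function names (`centered_gram`, `gram_from_sq_distances`, `sample_pls_fit`) are hypothetical, and the scheme shown (apply C to the current response residual, then orthogonalize the result against earlier scores instead of deflating C) is one standard way to obtain single-response PLS scores from a Gram matrix, assumed here to match the spirit of the summary. The distance-to-covariance step uses classical double centering of squared Euclidean distances.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def centered_gram(X):
    """Return C = Xc Xc^T, where Xc is X with every column mean-centered."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]
    return [[dot(Xc[i], Xc[k]) for k in range(n)] for i in range(n)]


def gram_from_sq_distances(D2):
    """Recover the same C from squared pairwise distances (double centering)."""
    n = len(D2)
    row_mean = [sum(r) / n for r in D2]
    total_mean = sum(row_mean) / n
    return [[-0.5 * (D2[i][k] - row_mean[i] - row_mean[k] + total_mean)
             for k in range(n)] for i in range(n)]


def sample_pls_fit(C, y, n_components):
    """Fit one response y using only C; return (fitted values, score vectors)."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]   # current response residual
    fitted = [y_mean] * n
    scores = []
    for _ in range(n_components):
        # Candidate score: C applied to the current residual.
        t = [dot(row, resid) for row in C]
        # Orthogonalize against earlier scores; C itself is never modified.
        for s in scores:
            coef = dot(s, t) / dot(s, s)
            t = [ti - coef * si for ti, si in zip(t, s)]
        tt = dot(t, t)
        if tt < 1e-12:                  # illustrative cutoff: nothing left to fit
            break
        b = dot(t, resid) / tt          # regression of the residual on the score
        fitted = [f + b * ti for f, ti in zip(fitted, t)]
        resid = [r - b * ti for r, ti in zip(resid, t)]
        scores.append(t)
    return fitted, scores
```

Once C has been formed, each component costs on the order of n² operations regardless of m, which is the property the summary emphasizes for large CoMFA field matrices. Calling `sample_pls_fit(centered_gram(X), y, 2)` on a small list-of-lists `X` returns the two-component fitted values together with the mutually orthogonal score vectors.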

Abbreviations

PLS: partial least squares

SAMPLS: sample-distance partial least squares

CoMFA: comparative molecular field analysis

Cite this article

Bush, B.L., Nachbar, R.B. Sample-distance partial least squares: PLS optimized for many variables, with application to CoMFA. J Computer-Aided Mol Des 7, 587–619 (1993). https://doi.org/10.1007/BF00124364

  • DOI: https://doi.org/10.1007/BF00124364
