Summary
Three-dimensional molecular modeling can provide an essentially unlimited number m of structural properties. Comparative Molecular Field Analysis (CoMFA), for example, may calculate thousands of field values for each model structure. When m is large, partial least squares (PLS) is the statistical method of choice for fitting and predicting biological responses. Yet PLS is usually implemented in a property-based fashion, which is optimal only for small m. We describe here a sample-based formulation of PLS (SAMPLS) that can be used to fit any single response (bioactivity). SAMPLS reduces all explanatory data to the pairwise ‘distances’ among the n samples (molecules), or equivalently to an n-by-n covariance matrix C. This matrix, unmodified, can be used to fit all PLS components. Furthermore, SAMPLS validates the model by modern resampling techniques at a cost independent of m. We have implemented SAMPLS as a Fortran program and have reproduced conventional and cross-validated PLS analyses of data from two published studies. Full (leave-each-out) cross-validation of a typical CoMFA takes 0.2 CPU s. SAMPLS is thus ideally suited to structure-activity analysis based on CoMFA fields or bonded topology. The sample-distance formulation also relates PLS to methods such as cluster analysis and nonlinear mapping, and shows how drastically PLS simplifies the information in CoMFA fields.
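The idea that an n-by-n matrix suffices for fitting and cross-validation can be illustrated with a short sketch. It assumes Euclidean squared distances D2 among the n samples and uses the generic single-response kernel-PLS recursion (score t = C y, deflation C ← P C P with P = I − t tᵀ/(tᵀt)); it is not the authors' Fortran SAMPLS program, and the function names `gram_from_sq_distances`, `center_on_training`, `pls1_kernel` and `loo_q2` are hypothetical.

```python
"""Minimal sketch of sample-based PLS1 (one response) driven only by an
n-by-n matrix of sample cross-products. Assumption: generic kernel-PLS
recursion, not the authors' Fortran SAMPLS code."""
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def gram_from_sq_distances(d2: Matrix) -> Matrix:
    """Double-centre squared pairwise distances: C = -1/2 * J D2 J."""
    n = len(d2)
    r = [sum(row) / n for row in d2]  # row (= column) means; D2 is symmetric
    g = sum(r) / n                     # grand mean
    return [[-0.5 * (d2[i][j] - r[i] - r[j] + g) for j in range(n)]
            for i in range(n)]


def center_on_training(K: Matrix, train: List[int],
                       test: int) -> Tuple[Matrix, Vector]:
    """Re-centre K on the training-set mean and centre the test sample's row."""
    n = len(train)
    m = {i: sum(K[i][j] for j in train) / n for i in train + [test]}
    g = sum(m[i] for i in train) / n
    C = [[K[i][j] - m[i] - m[j] + g for j in train] for i in train]
    k = [K[test][j] - m[test] - m[j] + g for j in train]
    return C, k


def pls1_kernel(C: Matrix, y: Vector, k: Optional[Vector],
                ncomp: int) -> Tuple[Vector, Optional[float]]:
    """Fit PLS1 from centred covariance C; optionally predict one new sample
    whose centred cross-products with the training samples are k."""
    n = len(y)
    ybar = sum(y) / n
    yr = [v - ybar for v in y]              # residual (deflated) response
    fit = [ybar] * n
    pred = ybar if k is not None else None
    C = [row[:] for row in C]               # work on a copy
    for _ in range(ncomp):
        t = [sum(C[i][j] * yr[j] for j in range(n)) for i in range(n)]  # t = C y
        tt = sum(v * v for v in t)
        if tt < 1e-12:                       # response already fully explained
            break
        q = sum(a * b for a, b in zip(yr, t)) / tt                     # y loading
        u = [sum(C[i][j] * t[j] for j in range(n)) for i in range(n)]  # C t
        if k is not None:
            ts = sum(a * b for a, b in zip(k, yr))  # new sample's score
            pred += q * ts
            v = [ki - ts / tt * ui for ki, ui in zip(k, u)]
            tv = sum(a * b for a, b in zip(t, v)) / tt
            k = [vi - tv * ti for vi, ti in zip(v, t)]  # deflate k: P(k - ts*u/tt)
        fit = [f + q * ti for f, ti in zip(fit, t)]
        yr = [a - q * ti for a, ti in zip(yr, t)]
        tu = sum(a * b for a, b in zip(t, u)) / (tt * tt)
        C = [[C[i][j] - (t[i] * u[j] + u[i] * t[j]) / tt + t[i] * t[j] * tu
              for j in range(n)] for i in range(n)]  # C <- P C P
    return fit, pred


def loo_q2(K: Matrix, y: Vector, ncomp: int) -> float:
    """Leave-one-out q^2 = 1 - PRESS / SS, using only the n-by-n matrix K."""
    n = len(y)
    press = 0.0
    for out in range(n):
        train = [i for i in range(n) if i != out]
        C, k = center_on_training(K, train, out)
        _, pred = pls1_kernel(C, [y[i] for i in train], k, ncomp)
        press += (y[out] - pred) ** 2
    ybar = sum(y) / n
    return 1.0 - press / sum((v - ybar) ** 2 for v in y)
```

For example, given squared distances D2 among n molecules and activities y, `gram_from_sq_distances(D2)` yields C, `pls1_kernel(C, y, None, 2)` the fitted values, and `loo_q2(C, y, 2)` a leave-one-out q². Once D2 is formed, the m field values never appear again, so each cross-validation fold costs O(n²) per component whatever m is.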
Abbreviations
- PLS: partial least squares
- SAMPLS: sample-distance partial least squares
- CoMFA: comparative molecular field analysis
Cite this article
Bush, B.L., Nachbar, R.B. Sample-distance partial least squares: PLS optimized for many variables, with application to CoMFA. J Computer-Aided Mol Des 7, 587–619 (1993). https://doi.org/10.1007/BF00124364