Abstract
Modeling of quantitative structure–activity relationships (QSAR) is considered with an emphasis on prediction. Many methods are available to develop such models. Using a harmonious approach that balances the bias and variance of predictions, the best calibration models are identified relative to the bias and variance criteria used. The criteria used to judge model adequacy are the root mean square error of calibration (RMSEC) and of validation (RMSEV), the respective R² values, and the norm of the regression vector. QSAR data from the literature are used to demonstrate the concepts. For these data sets and the criteria used, it is suggested that models obtained by ridge regression (RR) are more harmonious and parsimonious than models obtained by partial least squares (PLS) and principal component regression (PCR) when the data are mean-centered. The most harmonious RR models have the best bias/variance tradeoff, reflected by the smallest RMSEC, RMSEV, and regression vector norms and the largest calibration and validation R² values. The most parsimonious RR models have the smallest effective rank.
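The criteria named in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes ridge regression is computed from the singular value decomposition of the mean-centered calibration data, that RMSEC and RMSEV are plain root mean squared residuals with no degrees-of-freedom correction, and that the effective rank is the sum of the ridge filter factors s_i²/(s_i² + λ). The function name `ridge_harmony_metrics`, the parameter `lam`, and the dictionary keys are hypothetical.

```python
import numpy as np


def ridge_harmony_metrics(X_cal, y_cal, X_val, y_val, lam):
    """Fit ridge regression on mean-centered data and return bias/variance indicators.

    Assumptions (not taken from the article): `lam` is the ridge parameter added to
    the squared singular values, and effective rank is the sum of filter factors.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")

    # Mean-center with calibration statistics only.
    x_mean = X_cal.mean(axis=0)
    y_mean = y_cal.mean()
    Xc = X_cal - x_mean
    yc = y_cal - y_mean

    # Ridge solution via the SVD: b = V diag(s / (s^2 + lam)) U^T y.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    b = Vt.T @ ((s / (s**2 + lam)) * (U.T @ yc))

    def rmse_and_r2(X, y):
        resid = y - ((X - x_mean) @ b + y_mean)
        rmse = np.sqrt(np.mean(resid**2))
        r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)
        return rmse, r2

    rmsec, r2_cal = rmse_and_r2(X_cal, y_cal)
    rmsev, r2_val = rmse_and_r2(X_val, y_val)

    return {
        "rmsec": rmsec,                                   # bias indicator (calibration)
        "rmsev": rmsev,                                   # prediction error (validation)
        "r2_cal": r2_cal,
        "r2_val": r2_val,
        "b_norm": np.linalg.norm(b),                      # variance indicator
        "effective_rank": np.sum(s**2 / (s**2 + lam)),    # parsimony indicator
    }


if __name__ == "__main__":
    # Synthetic data for illustration only; not QSAR data from the article.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 8))
    y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=30)
    for lam in (0.01, 1.0, 100.0):
        print(lam, ridge_harmony_metrics(X[:20], y[:20], X[20:], y[20:], lam))
```

Scanning `lam` over a grid and plotting `rmsec` or `rmsev` against `b_norm` gives one way to look for the compromise between bias and variance that the abstract calls harmonious; among comparable models, a smaller `effective_rank` would indicate the more parsimonious one.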
Cite this article
Kalivas, J.H., Forrester, J.B. & Seipel, H.A. QSAR modeling based on the bias/variance compromise: a harmonious. J Comput Aided Mol Des 18, 537–547 (2004). https://doi.org/10.1007/s10822-004-4063-5