Abstract
Two modifications to the standard use of receiver operating characteristic (ROC) curves for evaluating virtual screening methods are proposed. The first is to replace the linear plots usually used with semi-logarithmic ones (pROC plots), including when doing “area under the curve” (AUC) calculations. Doing so is a simple way to bias the statistic to favor identification of “hits” early in the recovery curve rather than late. A second suggested modification entails weighting each active based on the size of the lead series to which it belongs. Two weighting schemes are described: arithmetic, in which the weight for each active is inversely proportional to the size of the cluster from which it comes; and harmonic, in which weights are inversely proportional to the rank of each active within its class. Either scheme is able to distinguish biased from unbiased screening statistics, but the harmonically weighted AUC in particular emphasizes the ability to place representatives of each class of active early in the recovery curve.
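The two weighting schemes can be illustrated with a minimal sketch. The function names, the use of series labels as input, and the normalization of the weights to sum to 1 are assumptions made for illustration; the abstract states only the proportionality of the weights.

```python
from collections import Counter


def arithmetic_weights(series):
    """Weight each active by 1 / (size of its lead series).

    `series` holds one series label per active. Normalizing the
    weights to sum to 1 is an assumption made for this sketch.
    """
    sizes = Counter(series)
    raw = [1.0 / sizes[s] for s in series]
    total = sum(raw)
    return [w / total for w in raw]


def harmonic_weights(series):
    """Weight each active by 1 / (its rank within its own series).

    `series` must be ordered from best-ranked active to worst, so the
    first member of a series to be retrieved gets raw weight 1, the
    second 1/2, and so on. Normalization to 1 is again an assumption.
    """
    seen = Counter()
    raw = []
    for s in series:
        seen[s] += 1
        raw.append(1.0 / seen[s])
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical retrieval order: three actives from series A, one from B.
labels = ["A", "A", "B", "A"]
arith = arithmetic_weights(labels)
harm = harmonic_weights(labels)
```

In this toy case the arithmetic scheme gives the lone member of series B half of the total weight, whereas the harmonic scheme gives equal weight to the first retrieved member of each series and progressively less to later members of series A.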
Notes
Note that β_i can be calculated from the active and inactive ranks as r_i − i, where r_i is the rank of the ith-best active across all compounds. If ties occur, they should be resolved by assigning the average of the contested ranks: observations tied for ranks 5 and 6 each receive a rank of 5.5, for example [10].
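The rank calculation in this note can be sketched as follows. The function names are hypothetical, and the convention that a higher score means a better rank is an assumption; the sketch returns the quantity r − i exactly as described, without any further normalization.

```python
def average_ranks(scores):
    """Rank scores from best (highest) to worst, starting at 1.

    Tied scores each receive the average of the ranks they contest.
    Treating higher scores as better is an assumption of this sketch.
    """
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        # Extend `end` over the run of compounds tied with order[start].
        end = start
        while (end + 1 < len(order)
               and scores[order[end + 1]] == scores[order[start]]):
            end += 1
        shared = ((start + 1) + (end + 1)) / 2.0  # mean of contested ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rank_minus_index(scores, is_active):
    """Return r - i for each active, where r is the overall rank of the
    ith-best active and i counts actives from 1."""
    ranks = average_ranks(scores)
    active_ranks = sorted(r for r, a in zip(ranks, is_active) if a)
    return [r - i for i, r in enumerate(active_ranks, start=1)]


# Hypothetical data: the second active ties with one inactive.
scores = [9.0, 8.0, 7.0, 7.0, 5.0]
is_active = [True, False, True, False, True]
betas = rank_minus_index(scores, is_active)
```

Here the tied pair shares rank 3.5, so the second active is credited with 1.5 inactives ahead of it: one that clearly outranks it and half of the one it ties with.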
It would be convenient to have a term to denote log(1/β) that parallels the related concept of “potency” in medicinal chemistry. “Stringency” is suggested here because of its extensive use in biochemistry in connection with semi-log plots of interactions with and between nucleic acids. It has no conflicting meaning in the standard statistical literature, though it has been used in connection with the logarithm of risk (see [11]).
Such semilog plots will be referred to as pROC plots here because, although the abscissa is still labeled in terms of β and is not reversed in orientation, it is scaled logarithmically.
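A minimal sketch of how the coordinates of such a plot might be computed is given below. It assumes that β is supplied as a fraction of inactives, that the logarithm is base 10, and that a β of zero is replaced by a small positive `floor` value so that its logarithm is defined; the function names and the `floor` parameter are hypothetical and are not taken from the article.

```python
import math


def proc_points(beta, floor):
    """Return (log10(beta_i), fraction of actives recovered) pairs.

    `beta` lists the false positive fraction at each successive active,
    best-ranked first. Values below `floor` are raised to `floor`
    (an assumption) because log10(0) is undefined.
    """
    n = len(beta)
    return [(math.log10(max(b, floor)), i / n)
            for i, b in enumerate(beta, start=1)]


def stringency(b):
    """log(1/beta), taken here as base 10 (an assumption)."""
    return math.log10(1.0 / b)


# Hypothetical beta values for three actives.
points = proc_points([0.0, 0.01, 0.1], floor=0.001)
```

Because the abscissa keeps its usual orientation, early recovery (small β) appears at the left of the plot and is stretched out by the logarithmic scale; the stringency is simply the negative of the plotted abscissa value.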
References
Jain AN (2000) J Comput Aided Mol Des 14:199–213
Cuissart B, Touffet F, Cremilleux B, Bureau R, Rault S (2002) J Chem Inf Comput Sci 42:1043–1052
Jain AN (2004) J Med Chem 47:947–961
Triballeau N, Acher F, Brabet I, Pin J-P, Bertrand H-O (2005) J Med Chem 48:2534–2547
Egan JP (1975) Signal detection theory and ROC analysis. Academic Press, New York
Truchon J-F, Bayly CI (2007) J Chem Inf Model 47:488–508
Sheridan RP, Singh SB, Fluder EM, Kearsley SK (2001) J Chem Inf Comput Sci 41:1395–1406
Good AC, Hermsmeier MA, Hindle SA (2004) J Comput Aided Mol Des 18:529–536
Good AC, Oprea TI (2008) J Comput Aided Mol Des 22. doi:10.1007/s10822-007-9167-2
Daniel WW (1978) Applied nonparametric statistics. Houghton-Mifflin Co., Boston
Hamilton JT, Viscusi WK (1999) Calculating risks? The spatial and political dimensions of hazardous waste policy. MIT Press, Cambridge, MA
Furet P, Bold G, Meyer T, Roesel J, Guagnano V (2006) J Med Chem 49:4451–4454
McGaughey GB, Sheridan RP, Bayly CI, Culberson JC, Kreatsoulas C, Lindsley S, Maiorov V, Truchon J-F, Cornell WD (2007) J Chem Inf Model 47:1504–1519
Halgren TA, Murphy RB, Friesner RA, Beard HS, Frye LL, Pollard WT, Banks JL (2004) J Med Chem 47:1750–1759
Schellhammer I, Rarey M (2007) J Comput Aided Mol Des 21:223–238
Shepphird JK, Clark RD (2006) J Comput Aided Mol Des 20:763–771
Snedecor GW, Cochran WG (1989) Statistical methods, 8th edn. Iowa State Press, Ames, IA
Cole JC, Murray CW, Nissink JWM, Taylor RD, Taylor R (2005) Proteins 60:325–332
Acknowledgements
The authors appreciate the helpful suggestions provided by Peter Willett of the University of Sheffield and by our anonymous reviewers, and wish to thank Ajay Jain (UCSF) and Anthony Nicholls (OpenEye) for organizing the American Chemical Society Symposium that led to this special issue of the Journal.
Cite this article
Clark, R.D., Webster-Clark, D.J. Managing bias in ROC curves. J Comput Aided Mol Des 22, 141–146 (2008). https://doi.org/10.1007/s10822-008-9181-z
DOI: https://doi.org/10.1007/s10822-008-9181-z