Managing bias in ROC curves
Two modifications to the standard use of receiver operating characteristic (ROC) curves for evaluating virtual screening methods are proposed. The first is to replace the linear plots usually used with semi-logarithmic ones (pROC plots), including when doing “area under the curve” (AUC) calculations. Doing so is a simple way to bias the statistic to favor identification of “hits” early in the recovery curve rather than late. A second suggested modification entails weighting each active based on the size of the lead series to which it belongs. Two weighting schemes are described: arithmetic, in which the weight for each active is inversely proportional to the size of the cluster from which it comes; and harmonic, in which weights are inversely proportional to the rank of each active within its class. Either scheme is able to distinguish biased from unbiased screening statistics, but the harmonically weighted AUC in particular emphasizes the ability to place representatives of each class of active early in the recovery curve.
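The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a ranked hit list (best score first) in which each active carries a cluster label and each decoy is `None`. It takes the pROC AUC to be the weighted mean of -log10 of the false positive fraction at which each active is recovered. It floors that fraction at 1/(number of decoys) to avoid log(0), and it normalizes by the sum of the weights. The function names, the floor, and the normalization are all assumptions made for illustration; the paper's exact formulas may differ.

```python
import math
from collections import Counter


def proc_auc(ranked, weights=None):
    """Weighted mean of -log10(false positive fraction) over the actives.

    ranked: list ordered best-first; an active is its cluster label,
            a decoy is None (hypothetical input convention).
    weights: one weight per active, in order of recovery (default: all 1).
    """
    n_decoys = sum(1 for c in ranked if c is None)
    floor = 1.0 / n_decoys  # assumption: avoids log10(0) for top-ranked actives
    decoys_seen = 0
    terms = []
    for c in ranked:
        if c is None:
            decoys_seen += 1
        else:
            fpf = max(decoys_seen / n_decoys, floor)
            terms.append(-math.log10(fpf))
    if weights is None:
        weights = [1.0] * len(terms)
    # assumption: normalize by total weight so schemes are comparable
    return sum(w * t for w, t in zip(weights, terms)) / sum(weights)


def arithmetic_weights(ranked):
    """Weight each active by 1 / (size of the cluster it belongs to)."""
    sizes = Counter(c for c in ranked if c is not None)
    return [1.0 / sizes[c] for c in ranked if c is not None]


def harmonic_weights(ranked):
    """Weight each active by 1 / (its rank within its own cluster)."""
    seen = Counter()
    weights = []
    for c in ranked:
        if c is not None:
            seen[c] += 1
            weights.append(1.0 / seen[c])
    return weights


# Toy list: three actives from series "A" ranked first, one from series "B"
# recovered after 10 of the 100 decoys.
ranked = ["A", "A", "A"] + [None] * 10 + ["B"] + [None] * 90

unweighted = proc_auc(ranked)
arithmetic = proc_auc(ranked, arithmetic_weights(ranked))
harmonic = proc_auc(ranked, harmonic_weights(ranked))
```

In this toy list the large series "A" dominates the unweighted score. Both weighting schemes reduce its influence, and the harmonic scheme gives the most credit to the first representative recovered from each series.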
Keywords: Early recognition, ROC, AUC, Virtual screening
The authors appreciate the helpful suggestions provided by Peter Willett of the University of Sheffield and by our anonymous reviewers, and wish to thank Ajay Jain (UCSF) and Anthony Nicholls (OpenEye) for organizing the American Chemical Society Symposium that led to this special issue of the Journal.