Managing bias in ROC curves

  • Robert D. Clark
  • Daniel J. Webster-Clark


Two modifications to the standard use of receiver operating characteristic (ROC) curves for evaluating virtual screening methods are proposed. The first is to replace the linear plots usually used with semi-logarithmic ones (pROC plots), including when doing “area under the curve” (AUC) calculations. Doing so is a simple way to bias the statistic to favor identification of “hits” early in the recovery curve rather than late. A second suggested modification entails weighting each active based on the size of the lead series to which it belongs. Two weighting schemes are described: arithmetic, in which the weight for each active is inversely proportional to the size of the cluster from which it comes; and harmonic, in which weights are inversely proportional to the rank of each active within its class. Either scheme is able to distinguish biased from unbiased screening statistics, but the harmonically weighted AUC in particular emphasizes the ability to place representatives of each class of active early in the recovery curve.
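The two weighting schemes and the semi-logarithmic variant can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the normalization by the sum of weights, and the floor of 1/(number of decoys) applied to a zero false positive fraction before taking the logarithm are all assumptions made for illustration. The input is a ranked list of labels (1 for an active, 0 for a decoy, best score first, no ties) and the cluster label of each active in the same rank order.

```python
import math
from collections import Counter


def false_positive_fractions(ranked_labels):
    """Return (fractions, n_decoys): for each active, in rank order,
    the fraction of all decoys ranked ahead of it."""
    n_decoys = sum(1 for label in ranked_labels if label == 0)
    fractions, decoys_seen = [], 0
    for label in ranked_labels:
        if label == 0:
            decoys_seen += 1
        else:
            fractions.append(decoys_seen / n_decoys)
    return fractions, n_decoys


def arithmetic_weights(clusters):
    """Weight each active by 1 / (size of the cluster it belongs to)."""
    sizes = Counter(clusters)
    return [1.0 / sizes[c] for c in clusters]


def harmonic_weights(clusters):
    """Weight each active by 1 / (its rank within its own cluster),
    so the first member retrieved from each cluster counts most."""
    seen = Counter()
    weights = []
    for c in clusters:
        seen[c] += 1
        weights.append(1.0 / seen[c])
    return weights


def weighted_auc(fractions, weights, n_decoys, semilog=False):
    """Weighted mean of per-active scores.

    Linear case: score = 1 - false positive fraction (ordinary ROC AUC).
    Semi-log case (assumed pROC form): score = -log10(fraction), with
    zero fractions floored at 1 / n_decoys to keep the log finite.
    """
    if semilog:
        floor = 1.0 / n_decoys
        scores = [-math.log10(max(f, floor)) for f in fractions]
    else:
        scores = [1.0 - f for f in fractions]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Hypothetical ranking: three actives from cluster "A", one from "B".
ranked_labels = [1, 1, 0, 1, 0, 0, 1, 0]
clusters = ["A", "A", "A", "B"]

fractions, n_decoys = false_positive_fractions(ranked_labels)
uniform = [1.0] * len(clusters)

print("unweighted AUC:", weighted_auc(fractions, uniform, n_decoys))
print("arithmetic AUC:",
      weighted_auc(fractions, arithmetic_weights(clusters), n_decoys))
print("harmonic AUC:",
      weighted_auc(fractions, harmonic_weights(clusters), n_decoys))
print("unweighted pROC AUC:",
      weighted_auc(fractions, uniform, n_decoys, semilog=True))
```

In this toy ranking the large cluster "A" is retrieved early and the singleton "B" late, so both weighted values fall below the unweighted one: the weighting reduces the credit earned by repeatedly retrieving members of the same series.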


Keywords: Early recognition, ROC, AUC, Virtual screening



The authors appreciate the helpful suggestions provided by Peter Willett of the University of Sheffield and by our anonymous reviewers, and wish to thank Ajay Jain (UCSF) and Anthony Nicholls (OpenEye) for organizing the American Chemical Society Symposium that led to this special issue of the Journal.



Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  1. Tripos Informatics Research Center, Saint Louis, USA
  2. Washington University in St. Louis, St. Louis, USA
