Managing bias in ROC curves

Journal of Computer-Aided Molecular Design

Abstract

Two modifications to the standard use of receiver operating characteristic (ROC) curves for evaluating virtual screening methods are proposed. The first is to replace the linear plots usually used with semi-logarithmic ones (pROC plots), including when doing “area under the curve” (AUC) calculations. Doing so is a simple way to bias the statistic to favor identification of “hits” early in the recovery curve rather than late. A second suggested modification entails weighting each active based on the size of the lead series to which it belongs. Two weighting schemes are described: arithmetic, in which the weight for each active is inversely proportional to the size of the cluster from which it comes; and harmonic, in which weights are inversely proportional to the rank of each active within its class. Either scheme is able to distinguish biased from unbiased screening statistics, but the harmonically weighted AUC in particular emphasizes the ability to place representatives of each class of active early in the recovery curve.
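
As a rough illustration of the two weighting schemes described above, the following Python sketch assigns a weight to each active given the class (lead series) it belongs to. The function names, the example class labels, and the normalization of the weights to a total of 1 are assumptions made for illustration only; the abstract specifies just the proportionality, not the scaling.

```python
from collections import Counter


def arithmetic_weights(classes):
    """Weight each active by 1 / (size of its class), then normalize.

    `classes` holds one class label per active, listed in rank order
    (best-scoring active first). Normalizing to a sum of 1 is an
    assumption of this sketch.
    """
    sizes = Counter(classes)
    raw = [1.0 / sizes[c] for c in classes]
    total = sum(raw)
    return [w / total for w in raw]


def harmonic_weights(classes):
    """Weight each active by 1 / (its rank within its own class), then normalize.

    The first member of a class to be recovered gets raw weight 1,
    the second 1/2, the third 1/3, and so on.
    """
    seen = Counter()
    raw = []
    for c in classes:
        seen[c] += 1          # within-class rank of this active
        raw.append(1.0 / seen[c])
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical example: three actives from series "A", one from series "B".
example_classes = ["A", "A", "A", "B"]
arith = arithmetic_weights(example_classes)
harm = harmonic_weights(example_classes)
```

In this toy case the arithmetic scheme gives the lone member of series "B" as much total weight as all of series "A" combined, whereas the harmonic scheme gives the first-recovered member of each series the same weight and discounts later members of a series already represented.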

Notes

  1. Note that β_i can be calculated from the active and inactive ranks as r_i − i, where r_i is the rank of the ith-best active across all compounds. If ties occur, they should be resolved by assigning the average of the contested ranks: observations tied for ranks 5 and 6 each receive a rank of 5.5, for example [10].

  2. It would be convenient to have a term to denote log(1/β) that parallels the related concept of “potency” in medicinal chemistry. “Stringency” is suggested here because of its extensive use in biochemistry in connection with semi-log plots of interactions with and between nucleic acids. It has no conflicting meaning in the standard statistical literature, though it has been used in connection with the logarithm of risk (see [11]).

  3. Such semilog plots are referred to here as pROC plots: the abscissa is labeled in terms of β and is not reversed in orientation, but it is scaled logarithmically.
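
The rank arithmetic in Notes 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names and example data are hypothetical, higher scores are assumed to be better, and a base-10 logarithm is assumed for log(1/β) by analogy with potency scales. Whether β is then expressed as a count or as a fraction of the inactives is not settled by the notes, so the sketch leaves that conversion to the caller.

```python
import math


def average_ranks(scores):
    """Rank scores from best (highest) to worst, starting at 1.

    Tied scores each receive the mean of the ranks they contest,
    e.g. two compounds tied for ranks 5 and 6 both get 5.5.
    """
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next compound has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2.0
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def rank_minus_index(scores, is_active):
    """Return r_i - i for each active, taken in order of increasing rank."""
    ranks = average_ranks(scores)
    active_ranks = sorted(r for r, a in zip(ranks, is_active) if a)
    return [r - i for i, r in enumerate(active_ranks, start=1)]


def stringency(beta):
    """log10(1/beta); undefined for beta == 0, which the caller must handle."""
    return math.log10(1.0 / beta)


# Hypothetical example: seven compounds, three of them active.
example_scores = [9.0, 8.0, 7.0, 6.0, 5.0, 5.0, 4.0]
example_active = [True, False, True, False, False, True, False]
example_ranks = average_ranks(example_scores)
example_beta = rank_minus_index(example_scores, example_active)
```

Here the third active is tied with an inactive for ranks 5 and 6, so it receives rank 5.5 and its r_i − i value is 2.5 rather than a whole number.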

References

  1. Jain AN (2000) J Comput Aided Mol Des 14:199–213

  2. Cuissart B, Touffet F, Cremilleux B, Bureau R, Raul S (2002) J Chem Inf Comput Sci 42:1043–1052

  3. Jain AN (2004) J Med Chem 47:947–961

  4. Triballeau N, Acher F, Brabet I, Pin J-P, Bertrand H-O (2005) J Med Chem 48:2534–2547

  5. Egan JP (1975) Signal detection theory and ROC analysis. Academic Press, New York

  6. Truchon J-F, Bayly CI (2007) J Chem Inf Model 47:488–508

  7. Sheridan RP, Singh SB, Fluder EM, Kearsley SK (2001) J Chem Inf Comput Sci 41:1395–1406

  8. Good AC, Hermsmeier MA, Hindle SA (2004) J Comput Aided Mol Des 18:529–536

  9. Good AC, Oprea TI (2008) J Comput Aided Mol Des 22. doi:10.1007/s10822-007-9167-2

  10. Daniel WW (1978) Applied nonparametric statistics. Houghton-Mifflin Co., Boston

  11. Hamilton JT, Viscusi WK (1999) Calculating Risks? The Spatial and Political Dimensions of Hazardous Waste Policy. MIT Press, Boston

  12. Furet P, Bold G, Meyer T, Roesel J, Guagnano V (2006) J Med Chem 49:4451–4454

  13. McGaughey GB, Sheridan RP, Bayly CI, Culberson JC, Kreatsoulas C, Lindsley S, Maiorov V, Truchon J-F, Cornell WD (2007) J Chem Inf Model 47:1504–1519

  14. Halgren TA, Murphy RB, Friesner RB, Beard HS, Frye LL, Pollard WT, Banks JL (2004) J Med Chem 47:1750–1759

  15. Schellhammer I, Rarey M (2007) J Comput Aided Mol Des 21:223–238

  16. Shepphird JK, Clark RD (2006) J Comput Aided Mol Des 20:763–771

  17. Snedecor GW, Cochran WG (1989) Statistical Methods, 8th edn. Iowa State Press, Ames IA

  18. Cole JC, Murray CW, Nissink JWM, Taylor RD, Taylor R (2005) PROTEINS 60:325–332

Acknowledgements

The authors appreciate the helpful suggestions provided by Peter Willett of the University of Sheffield and by our anonymous reviewers, and wish to thank Ajay Jain (UCSF) and Anthony Nicholls (OpenEye) for organizing the American Chemical Society Symposium that led to this special issue of the Journal.

Author information

Corresponding author

Correspondence to Robert D. Clark.

About this article

Cite this article

Clark, R.D., Webster-Clark, D.J. Managing bias in ROC curves. J Comput Aided Mol Des 22, 141–146 (2008). https://doi.org/10.1007/s10822-008-9181-z

  • DOI: https://doi.org/10.1007/s10822-008-9181-z
