Classifying the World Anti-Doping Agency’s 2005 Prohibited List Using the Chemistry Development Kit Fingerprint

  • Edward O. Cannon
  • John B. O. Mitchell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4216)


We used the freely available Chemistry Development Kit (CDK) fingerprint to classify 5235 representative molecules taken from ten banned classes in the World Anti-Doping Agency’s (WADA) 2005 prohibited list, including molecules taken from the corresponding activity classes in the MDL Drug Data Report (MDDR). We used both Random Forest and k-Nearest Neighbours (kNN) algorithms to generate classifiers. The kNN classifiers with k = 1 gave a very slightly better Matthews Correlation Coefficient than the Random Forest classifiers; the latter, however, predicted fewer false positives. The performance of kNN classifiers tended to decline with increasing k. The performance of the CDK fingerprint is essentially equivalent to that of Unity 2D. Our results suggest that it will be possible to use freely available chemoinformatics tools to aid the fight against drugs in sport, while minimising the risk of wrongfully penalising innocent athletes.
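As a rough illustration of the kind of workflow the abstract describes, the following is a minimal sketch of a kNN classifier over binary fingerprints, together with the Matthews Correlation Coefficient used to score it. It is not the authors' implementation. The abstract does not state which similarity measure was used, so Tanimoto similarity is an assumption here, and the fingerprints, the class labels (`class_A`, `class_B`) and the function names are hypothetical toy values rather than CDK output.

```python
from collections import Counter
from math import sqrt


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints share no information; treat them as dissimilar.
    return len(a & b) / union if union else 0.0


def knn_predict(query, training, k=1):
    """Majority vote among the k training fingerprints most similar to the query.

    `training` is a list of (fingerprint, label) pairs. Ties in similarity keep
    the original list order because Python's sort is stable.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def matthews_cc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from the four confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: sets of on-bit positions with placeholder class labels.
training = [
    (frozenset({1, 2, 3, 4}), "class_A"),
    (frozenset({1, 2, 3, 5}), "class_A"),
    (frozenset({10, 11, 12}), "class_B"),
    (frozenset({10, 11, 13}), "class_B"),
]
query = frozenset({1, 2, 4, 5})

prediction_k1 = knn_predict(query, training, k=1)
prediction_k3 = knn_predict(query, training, k=3)
```

With k = 1 the prediction depends only on the single nearest neighbour, whereas larger k lets less similar molecules vote. A Random Forest would replace `knn_predict` while leaving the MCC scoring unchanged.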


Chemical Space, Random Forest Classifier, Unity Fingerprint, Query Molecule, Similar Biological Effect
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Edward O. Cannon (1)
  • John B. O. Mitchell (1)
  1. Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
