Recent Advances in the Open Access Cheminformatics Toolkits, Software Tools, Workflow Environments, and Databases

  • Pravin Ambure
  • Rahul Balasaheb Aher
  • Kunal Roy
Part of the Methods in Pharmacology and Toxicology book series (MIPT)


Cheminformatics applies computational techniques to a wide variety of drug discovery problems, including drug design and predictive toxicology. These computational exercises rely on toolkits/libraries, workflows, and databases for applications such as lead optimization, virtual screening, chemical database mining, and structure-activity/toxicity studies. It is therefore important that such tools be freely available. Open-access resources permit free use and redistribution of a product under a free license, while open-source resources additionally provide the source code, which can be used to modify the product. To extract knowledge from the enormous amount of data that accumulates at a staggering rate, open-access and open-source cheminformatics packages also need to be efficient and user-friendly. In this chapter, we review recent advances in freely available (both open-access and open-source) cheminformatics toolkits, software (stand-alone and online applications), workflow environments, and databases. The objective is to acquaint readers with these freely available resources so that they can use them to address different drug discovery challenges. We start with toolkits/libraries such as the Chemistry Development Kit (CDK), Open Babel, RDKit, ChemmineR, Indigo, and chemf, which provide functionality that helps researchers develop their own cheminformatics software and applications. Next, we discuss recently developed cheminformatics software tools with a wide variety of applications, including iDrug, PharmDock, DecoyFinder, DemQSAR, and Chembench. We then discuss workflow environments, including the Konstanz Information Miner (KNIME) and Taverna, their recent combinations with CDK (CDK-KNIME and CDK-Taverna), and their contributions to the cheminformatics field. Finally, we briefly touch on recent databases, such as QSAR DataBank, VAMMPIRE, CREDO, PubChem3D, and MMsINC, and their applications. The open-access resources covered in this chapter should enable medicinal chemists and cheminformaticians to solve various problems encountered during their research.

Key words

Open source; Open access; Cheminformatics; Tool kits; Software; Databases; Stand-alone tools; Online tools; Workflows



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Pravin Ambure (1)
  • Rahul Balasaheb Aher (1)
  • Kunal Roy (1)
  1. Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
