Abstract
Cheminformatics applies computational techniques to a wide range of drug discovery problems, including drug design and predictive toxicology. These techniques rely on toolkits/libraries, workflows, and databases for applications such as lead optimization, virtual screening, chemical database mining, and structure-activity/toxicity studies. It is therefore important that such resources be freely available. Open-access resources permit free use and redistribution of a product under a free license, while open-source resources additionally provide source code that can be used to modify the product. To extract knowledge from the enormous amount of data that accumulates at a staggering rate, open-access and open-source cheminformatics packages also need to be efficient and user-friendly. In this chapter, we review recent advances in freely available (both open-access and open-source) cheminformatics toolkits, software (stand-alone and online applications), workflow environments, and databases. The objective is to acquaint readers with these resources so that they can use them to address different drug discovery challenges. We start with toolkits/libraries such as the Chemistry Development Kit (CDK), Open Babel, RDKit, ChemmineR, Indigo, and chemf, which provide functionality that helps researchers develop their own cheminformatics software and applications. Next, we discuss recently developed cheminformatics software tools with a wide variety of applications, including iDrug, PharmDock, DecoyFinder, DemQSAR, and Chembench. We then discuss workflow environments, including the Konstanz Information Miner (KNIME) and Taverna, their recent combinations with CDK (CDK-KNIME and CDK-Taverna), and their contributions to the cheminformatics field. Finally, we briefly touch on recent databases such as QSAR DataBank, VAMMPIRE, CREDO, PubChem3D, and MMsINC, and their applications. The open-access resources covered in this chapter should enable medicinal chemists and cheminformaticians to solve various problems encountered during their research.
© 2014 Springer Science+Business Media New York
About this protocol
Cite this protocol
Ambure, P., Aher, R.B., Roy, K. (2014). Recent Advances in the Open Access Cheminformatics Toolkits, Software Tools, Workflow Environments, and Databases. In: Zhang, W. (eds) Computer-Aided Drug Discovery. Methods in Pharmacology and Toxicology. Humana Press, New York, NY. https://doi.org/10.1007/7653_2014_35
DOI: https://doi.org/10.1007/7653_2014_35
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3519-2
Online ISBN: 978-1-4939-3521-5
eBook Packages: Springer Protocols