A practical Java tool for small-molecule compound appraisal

Amani, Parisa; Sneyd, Todd; Preston, Sarah; Young, Neil D; Mason, Lyndel; Bailey, Ulla-Maja; Baell, Jonathan; Camp, David; Gasser, Robin B; Gorse, Alain-Dominique; Taylor, Paul; Hofmann, Andreas

doi:10.1186/s13321-015-0079-1

Software
Open access
Published: 16 June 2015

Volume 7, article number 28, (2015)

Journal of Cheminformatics

Parisa Amani¹,
Todd Sneyd¹,
Sarah Preston²,
Neil D Young²,
Lyndel Mason¹,
Ulla-Maja Bailey¹,
Jonathan Baell³,
David Camp⁴,
Robin B Gasser²,
Alain-Dominique Gorse⁵,
Paul Taylor⁶ &
Andreas Hofmann¹,² (ORCID: orcid.org/0000-0003-4408-5467)

Abstract

Background

The increased use of small-molecule compound screening by new users from a variety of different academic backgrounds calls for adequate software to administer, appraise, analyse and exchange information obtained from screening experiments. While software and spreadsheet solutions exist, there is a need for software that can be easily deployed and is convenient to use.

Results

The Java application cApp addresses this need and aids in the handling and storage of information on small-molecule compounds. The software is intended for the appraisal of compounds with respect to their physico-chemical properties, analysis of their adherence to likeness rules, recognition of pan-assay interference compounds and cross-linking with identical entries in the PubChem Compound Database. Results are displayed in tabular form in a graphical interface, but can also be exported in HTML or PDF format. Output in ASCII format allows further processing of the data with other suitable programs. Other features include similarity searches against user-provided compound libraries and the PubChem Compound Database, as well as compound clustering based on a MaxMin algorithm.

Conclusions

cApp is a personal database solution for small-molecule compounds which can handle all major chemical formats. As standalone software, it has no dependency other than the Java virtual machine and is thus conveniently deployed. It streamlines the analysis of molecules with respect to physico-chemical properties and drug discovery criteria. cApp is distributed under the GNU Affero General Public License version 3 and is available from http://www.structuralchemistry.org/pcsb/. To download cApp, users will be asked for their name, institution and email address. A detailed manual can also be downloaded from this site, and online tutorials are available at http://www.structuralchemistry.org/pcsb/capp.php.

Background

Screening of organic small-molecule compounds has been a pivotal activity in the pharmaceutical industry as part of the drug discovery process. In the last decade, compound screening has increasingly been established and employed by academic laboratories, because many disease areas are not being tackled by the commercially oriented pharmaceutical industry and because advanced technologies for the probing of biological systems have become available [1].

The use of chemical tools and compound screening has therefore found new user clienteles, not all of whom are expert medicinal chemists and thus familiar with the properties of organic molecules. Recently, Baell and colleagues [2] highlighted a significant problem arising from the massively increased, non-expert compound screening in that molecules with promiscuous activities (pan-assay interference compounds, PAINs) are frequently being reported in the literature as (potential) hits in an undiscriminating fashion.

The concept of chemical spreadsheets is well established, and several different products have been developed in the past [3] that store chemical data and present it in tabular form. Most such software is available from commercial providers, but there have also been freeware products and, increasingly, web services provided by databases such as ChemSpider [4] and the CDD Vault [5].

In the recent past, the concept of workflows has been implemented in many bio- and chemo-informatics approaches [6, 7]. Here, activities are classified into generic tasks that can be addressed by modular algorithms and thus combined by the end-user in a flexible fashion. Products in this category include the commercially available Pipeline Pilot (Accelrys, US) and InforSense (InforSense, UK). Freeware alternatives are KNIME (Knime.com, Switzerland), based on the open source Eclipse platform, and CDK-Taverna [8], which builds on the Java libraries of the Chemistry Development Kit (CDK) [9].

Our own experience in collaborative work among medicinal chemistry, structural biology and biochemistry laboratories shows that data exchange, collection, archiving and publishing are very much done on a case-by-case basis, whereby simple tasks are often done repetitively and in many cases redundantly. Although the above spreadsheet or workflow software is able to deal with the requirements arising from drug screening projects in the academic setting, the actual deployment of such software by end-users is often hampered by limited access or availability, difficulty of installation and/or the perceived or real difficulty of learning how to use the software.

We set out to design a platform-independent Java application, based on our in-house developed collection PCSB [10], that should appeal to non-expert laboratories engaged in the handling of medium-sized compound libraries. Particular attention has been paid to making the learning and use of this software as convenient as possible. The portable Java application cApp enables the appraisal of compounds, with respect to adherence to likeness rules, sourced from files in the commonly used formats SMILES (simplified molecular-input line entry system; see specification at [11]), InChI (International Chemical Identifier; see specification at [12]) and SDF (Structure Data Format; see Chemical Table File specification from December 2011 at [13]). Compounds can also be input or manipulated via the embedded JChemPaint [14] chemical editor. Particularly innovative features built into cApp are the identification of PAINs components in the appraised compounds, direct queries of the PubChem Compound Database [15] and similarity searches initiated with one mouse click.

Implementation

cApp has been implemented in Java for maximum portability, capitalising on existing chemo- and bio-informatic Java libraries, namely the CDK [9], JChemPaint [14] and PCSB [10]. The data structure within cApp rests on the custom-programmed Compound object that handles all data relating to individual small-molecule compounds for this software. Access to the PubChem Compound Database is through the PubChem Power User Gate (PUG), which is an XML-based communication gateway to interrogate the database.
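To make the idea of a per-compound data object more concrete, the following is a minimal sketch and not the actual cApp Compound class, whose fields and methods are not described in this paper. The class name CompoundSketch, its fields, and the use of Lipinski's rule of five (at most one violation) as a drug-likeness check are all assumptions chosen for illustration; the descriptor values in main are approximate values entered by hand, whereas a real application would compute them, for example with the CDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-compound record; NOT the actual cApp Compound class. */
final class CompoundSketch {
    final String smiles;          // structure as supplied by the user
    final double molecularWeight; // in g/mol
    final double logP;            // calculated octanol/water partition coefficient
    final int hBondDonors;
    final int hBondAcceptors;
    // User annotations (free text, file link or URL), keyed by column name.
    final Map<String, String> annotations = new LinkedHashMap<>();

    CompoundSketch(String smiles, double molecularWeight, double logP,
                   int hBondDonors, int hBondAcceptors) {
        this.smiles = smiles;
        this.molecularWeight = molecularWeight;
        this.logP = logP;
        this.hBondDonors = hBondDonors;
        this.hBondAcceptors = hBondAcceptors;
    }

    /** Counts violations of Lipinski's rule of five (assumed thresholds). */
    int ruleOfFiveViolations() {
        int violations = 0;
        if (molecularWeight > 500.0) violations++;
        if (logP > 5.0) violations++;
        if (hBondDonors > 5) violations++;
        if (hBondAcceptors > 10) violations++;
        return violations;
    }

    /** Drug-like in this sketch means at most one rule-of-five violation. */
    boolean isDrugLike() {
        return ruleOfFiveViolations() <= 1;
    }

    public static void main(String[] args) {
        // Approximate, hand-entered descriptor values for aspirin.
        CompoundSketch aspirin =
            new CompoundSketch("CC(=O)OC1=CC=CC=C1C(=O)O", 180.16, 1.2, 1, 4);
        aspirin.annotations.put("Comment", "reference compound");
        System.out.println("Violations: " + aspirin.ruleOfFiveViolations());
        System.out.println("Drug-like:  " + aspirin.isDrugLike());
    }
}
```

Keeping the computed descriptors and the user annotations in one object mirrors the description above of a single object handling all data for a compound; lead- or fragment-likeness checks could be added as further methods with their own thresholds.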

Results

Software features

cApp is personal compound database software that allows the user to compare chemical descriptors and similarities of compounds, and also to annotate compound lists with their own data and information. A cApp project comprises all data and compound sets of a software session; a compound set is a particular list of compounds. In the GUI, a compound set is displayed as a table on a particular tab (see Figure 1). Automatically generated HTML, PDF and ASCII presentations of compound sets are identified by their set number.

Conceptually, the functionality of cApp is divided into tasks, presentation of results and convenience features. In the present version, the tasks of compound appraisal, similarity search and clustering can be performed. The compound appraisal task calculates physico-chemical properties and structural features, analyses compliance with various likeness criteria (drug-, lead- or fragment-like) [16] and identifies PAINs components [17] using the SMSD maximum common subgraph (MCS) Tanimoto coefficient as criterion [18]. Similarity searches can be conducted against user-provided libraries, using an MCS approach which builds on the CDK Fingerprint Tanimoto coefficient [18], or against the PubChem Compound Database. For compound clustering, a MaxMin algorithm with subsequent k-Means clustering [19] has been implemented, based on the CDK Fingerprint Tanimoto coefficient as property.

The user can annotate compounds with extra information by adding columns of three types, containing either free text, a file link or a URL. Linked files and web content are available with a mouse click from the cApp GUI via the user’s preferred web browser.
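The fingerprint-based Tanimoto coefficient and the MaxMin step mentioned above can be illustrated with a short sketch. It is not the cApp implementation: fingerprints are represented here as plain java.util.BitSet objects with hand-set bits rather than CDK fingerprints, the class and method names (MaxMinSketch, tanimoto, maxMinPick, bits) are hypothetical, and only the MaxMin selection is shown, not the subsequent k-Means clustering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Illustrative sketch only; names and data are assumptions, not cApp code. */
final class MaxMinSketch {

    /** Tanimoto coefficient of two bit fingerprints: |A and B| / |A or B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        // Two empty fingerprints are treated as identical in this sketch.
        return union == 0 ? 1.0 : (double) and.cardinality() / union;
    }

    /** MaxMin picking: indices of up to k mutually dissimilar fingerprints. */
    static List<Integer> maxMinPick(List<BitSet> fps, int k, int seedIndex) {
        List<Integer> picked = new ArrayList<>();
        if (fps.isEmpty() || k <= 0) return picked;
        picked.add(seedIndex);
        // minDist[i]: distance (1 - Tanimoto) from i to its nearest picked item
        double[] minDist = new double[fps.size()];
        Arrays.fill(minDist, Double.MAX_VALUE);
        int last = seedIndex;
        while (picked.size() < Math.min(k, fps.size())) {
            int best = -1;
            for (int i = 0; i < fps.size(); i++) {
                double d = 1.0 - tanimoto(fps.get(i), fps.get(last));
                minDist[i] = Math.min(minDist[i], d);
                if (!picked.contains(i) && (best < 0 || minDist[i] > minDist[best])) {
                    best = i;
                }
            }
            picked.add(best); // the item farthest from everything picked so far
            last = best;
        }
        return picked;
    }

    /** Helper that builds a fingerprint from the positions of its set bits. */
    static BitSet bits(int... on) {
        BitSet b = new BitSet();
        for (int i : on) b.set(i);
        return b;
    }

    public static void main(String[] args) {
        // Toy fingerprints, invented for illustration.
        List<BitSet> fps = Arrays.asList(
            bits(0, 1, 2, 3), bits(0, 1, 2, 4), bits(10, 11), bits(0, 1, 10));
        System.out.println("T(0,1) = " + tanimoto(fps.get(0), fps.get(1)));
        System.out.println("MaxMin picks: " + maxMinPick(fps, 3, 0));
    }
}
```

In a clustering workflow, the picked compounds would typically serve as diverse starting points; how cApp couples the MaxMin step to k-Means is not specified in this paper.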

The individual features of cApp are described in a detailed manual that is available together with the application (see also Additional file 1). Online tutorials for typical scenarios have been prepared and can be accessed at the project web site.

Assessment of similarity with pan-assay interference compounds (PAINs)

Baell and Holloway [17] have identified a set of chemical substructures that are frequently observed as effectors in compound screening and are thus deemed to be promiscuous. In the compound appraisal task, cApp conducts SMARTS queries using 480 PAINs substructure filters that have been translated from the original rules in Sybyl Line Notation (SLN) by Dr Rajarshi Guha (http://blog.rguha.net/?p=850). This conversion of the PAINs substructure filters from SLN to SMARTS does not reproduce the original rules perfectly. For the present version of cApp, we have combined the three filter sets obtained from [20] into one set (pains.smt).

We have subjected a library of 50,000 compounds from the ChemBridge catalogue to PAINs filtering using the same SMARTS filters in cApp and Pipeline Pilot [21]. We also compared the results of PAINs filtering in cApp with those obtained by the original SLN rules. The results from this benchmarking indicate that there are small variations in the queries conducted by different software (see Table 1).
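A comparison of this kind reduces to set operations on the compounds flagged by each method. The sketch below shows one way to tabulate agreement and disagreement between two methods; the class name, method names and compound identifiers are invented for illustration and do not reproduce the data behind Table 1.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch only; identifiers and names are hypothetical. */
final class FlagComparisonSketch {

    /** Compounds flagged by both methods. */
    static Set<String> flaggedByBoth(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    /** Compounds flagged by the first method but not by the second. */
    static Set<String> flaggedOnlyByFirst(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Invented identifiers standing in for flagged library compounds.
        Set<String> methodA =
            new TreeSet<>(Arrays.asList("cmpd-001", "cmpd-002", "cmpd-003"));
        Set<String> methodB =
            new TreeSet<>(Arrays.asList("cmpd-002", "cmpd-003", "cmpd-004"));
        System.out.println("both:   " + flaggedByBoth(methodA, methodB));
        System.out.println("only A: " + flaggedOnlyByFirst(methodA, methodB));
        System.out.println("only B: " + flaggedOnlyByFirst(methodB, methodA));
    }
}
```

Listing the compounds flagged by only one method is usually the more informative output, since those are the cases where the translated filters and the original rules, or two SMARTS engines, disagree.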

Table 1 Comparison of PAINs identification by different software/methodologies using a library of 50,000 compounds from the ChemBridge catalogue

Conclusions

With cApp, we have developed a personal, small-molecule database management software that should appeal to the non-expert user due to its ease of installation, intuitive handling and convenient execution of tasks. In future versions, we plan to include additional functionality, such as identification of duplicate entries, and direct query capability of further public compound repositories, such as ChEMBL and others.

Availability and requirements

Project name: cApp.

Project home page: http://www.structuralchemistry.org/pcsb/capp.php.

Operating system(s): Platform independent.

Programming language: Java.

Other requirements: Java 1.7 or higher.

License: GNU AGPL v3.

Any restrictions to use by non-academics: None.

References

1. Hofmann A, Wang CK, Osman A, Camp D (2010) Merging structural biology with chemical biology: structural chemistry at Eskitis. Struct Chem 21:1117–1129
2. Baell J, Walters MA (2014) Chemical con artists foil drug discovery. Nature 513:481–483
3. Apodaca R (2008) Your favorite chemical spreadsheet. Depth First. http://www.depth-first.com/articles/2008/09/12/your-favorite-chemical-spreadsheet
4. ChemSpider. http://www.chemspider.com
5. CDD Vault. http://www.collaborativedrug.com
6. Tiwari A, Sekhar AKT (2007) Workflow based framework for life science informatics. Comput Biol Chem 31:305–319
7. Warr WA (2012) Scientific workflow systems: Pipeline Pilot and KNIME. J Comput Aided Mol Des 26:801–804
8. Kuhn T, Willighagen EL, Zielesny A, Steinbeck C (2010) CDK-Taverna: an open workflow environment for cheminformatics. BMC Bioinform 11:159
9. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL (2006) Recent developments of the chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics. Curr Pharm Des 12:2111–2120
10. Hofmann A, Wlodawer A (2002) PCSB - a program collection for structural biology and biophysical chemistry. Bioinformatics 18:209–210
11. OpenSMILES. http://www.opensmiles.org
12. The IUPAC International Chemical Identifier (InChI). http://www.iupac.org/inchi
13. Chemical Table File specification. http://www.download.accelrys.com/freeware/ctfile-formats/ctfile-formats.zip
14. Krause S, Willighagen E, Steinbeck C (2000) JChemPaint - using the collaborative forces of the internet to develop a free editor for 2D chemical structures. Molecules 5:93–98
15. Bolton E, Wang Y, Thiessen PA, Bryant SH (2008) PubChem: integrated platform of small molecules and biological activities. In: Annual reports in computational chemistry, vol 4. Elsevier, Oxford, pp 217–240
16. Barker J, Hesterkamp T, Whittaker M (2008) Integrating HTS and fragment-based drug discovery. Drug Discov World 9:69–75
17. Baell JB, Holloway GA (2010) New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem 53:2719–2740
18. Asad Rahman S, Bashton M, Holliday GL, Schrader R, Thornton JM (2009) Small Molecule Subgraph Detector (SMSD) Toolkit. J Cheminform 1:12
19. Gorse D, Rees A, Kaczorek M, Lahana R (1999) Molecular diversity and its analysis. Drug Discov Today 4:257–264
20. Guha R (2010) PAINs SMARTS filters. http://blog.rguha.net/?p=850. Accessed 9 Feb 2015
21. BIOVIA (2013) Pipeline Pilot V9.1. Dassault Systèmes, San Diego

Authors’ contributions

AH, RBG, SP and PA designed the project with critical input from all authors; PA, TS, AH and PT wrote and compiled the code; all authors tested the software. PA and AH wrote the paper with contributions from all authors. All authors read and approved the final manuscript.

Acknowledgements

AH’s research is funded by the National Health and Medical Research Council (NHMRC), the Australian Research Council (ARC) and the Rebecca L Cooper Medical Research Foundation. RBG’s research is funded mainly through the ARC, NHMRC, Melbourne Water Corporation and Yourgene Bioscience, and supported by a Victoria Life Sciences Computation Initiative (VLSCI; grant number VR0007) on its Peak Computing Facility at the University of Melbourne, an initiative of the Victorian Government. We gratefully acknowledge advice from Duncan Bucknell (http://www.duncanbucknell.com/).

Compliance with ethical guidelines

Competing interests

The authors declare that they have no competing interests.

Author information

Authors and Affiliations

Structural Chemistry Program, Eskitis Institute, Griffith University, Nathan, QLD, Australia
Parisa Amani, Todd Sneyd, Lyndel Mason, Ulla-Maja Bailey & Andreas Hofmann
Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, Australia
Sarah Preston, Neil D Young, Robin B Gasser & Andreas Hofmann
Medicinal Chemistry, Monash Institute of Pharmaceutical Sciences (MIPS), Monash University, Parkville, VIC, Australia
Jonathan Baell
Griffith School of Environment, Griffith University, Nathan, QLD, Australia
David Camp
Queensland Facility for Advanced Bioinformatics, Institute for Molecular Bioscience, University of Queensland, St Lucia, QLD, Australia
Alain-Dominique Gorse
School of Biological Sciences, The University of Edinburgh, Edinburgh, Scotland, UK
Paul Taylor

Corresponding author

Correspondence to Andreas Hofmann.

Additional file

Additional file 1: The software manual accompanies this paper as supplementary information.

Rights and permissions

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Amani, P., Sneyd, T., Preston, S. et al. A practical Java tool for small-molecule compound appraisal. J Cheminform 7, 28 (2015). https://doi.org/10.1186/s13321-015-0079-1

Received: 31 March 2015
Accepted: 27 May 2015
Published: 16 June 2015
DOI: https://doi.org/10.1186/s13321-015-0079-1