Background

Information about chemical compounds and their activity against whole organisms or specific molecular targets is available from the literature and from specialized databases. However, few resources effectively integrate such large chemical datasets with genome data and provide a mechanism to link active compounds to potential target genes. Here, we showcase the integration of chemoinformatic tools for querying chemical datasets and linking chemicals to genes in the TDR Targets database (tdrtargets.org), a web-accessible resource that integrates a wide range of functional genomic datasets from tropical disease pathogens and provides a ranking mechanism for identifying and prioritising novel therapeutic targets [1].

Materials and methods

Chemical datasets were obtained from three different resources: DrugBank, PubChem and StARlite (ChEMBL). A pipeline was developed to calculate a number of properties (molecular weight; number of flexible bonds; polar surface area; hydrogen-bond donors/acceptors; and predicted octanol/water partition coefficient) and descriptors (InChI, IUPAC's standard, open chemical identifier; SMILES; and molecular formula) for each molecule, to facilitate querying and linking to other databases. We also calculated a number of binary fingerprints and molecular statistics to accelerate searches.
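
How binary fingerprints can accelerate a search is illustrated by the minimal sketch below. It assumes that fingerprints are stored as Python integers used as bit vectors and that each bit flags the presence of a structural feature; the function names (`passes_screen`, `prescreen`) and the toy bit patterns are hypothetical, since the abstract does not specify the fingerprint type or screening logic used in TDR Targets. The sketch shows a common prescreen for substructure searches: a molecule can contain the query only if every bit set in the query fingerprint is also set in the molecule's fingerprint, so most candidates can be discarded before the costly atom-by-atom match.

```python
from typing import Dict, List


def passes_screen(query_fp: int, mol_fp: int) -> bool:
    """Return True if every bit set in query_fp is also set in mol_fp."""
    return (query_fp & mol_fp) == query_fp


def prescreen(query_fp: int, library: Dict[str, int]) -> List[str]:
    """Keep only the molecules that could contain the query substructure.

    Survivors still need a full substructure match; the screen only
    rules molecules out, it never confirms a hit.
    """
    return [mol_id for mol_id, fp in library.items()
            if passes_screen(query_fp, fp)]


# Toy 6-bit fingerprints (hypothetical values, for illustration only).
library = {
    "mol_a": 0b101101,
    "mol_b": 0b001001,
    "mol_c": 0b111111,
}
query = 0b001101

candidates = prescreen(query, library)
print(candidates)  # ['mol_a', 'mol_c']
```

In this toy library, `mol_b` lacks one of the bits set in the query and is rejected without any structure comparison.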

Results

A dataset of 504,020 chemicals, enriched in drugs and drug-like compounds, has been integrated into TDRTargets.org and can be queried using: a textual search on molecular descriptors or chemical properties; a substructure search to find molecules containing the query structure; and a similarity search to find similar molecules (using Tanimoto distance) (see Figure 1). In the StARlite database, 438,791 compounds are associated with 3,512 known druggable targets, and 2,224 of these could be linked to 3,043 pathogen targets based on sequence similarity. These relationships are available at TDRTargets.org.
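
The similarity search can be sketched as follows, assuming fingerprints are held as integer bit vectors. The Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either, and the corresponding distance is one minus that value. The function names, the 0.7 threshold and the toy fingerprints are hypothetical and are not taken from the TDR Targets implementation.

```python
from typing import Dict, List, Tuple


def tanimoto_similarity(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient: shared on-bits / on-bits in either fingerprint."""
    common = bin(fp_a & fp_b).count("1")
    total = bin(fp_a | fp_b).count("1")
    if total == 0:
        # Assumed convention: two empty fingerprints score 0.0.
        return 0.0
    return common / total


def tanimoto_distance(fp_a: int, fp_b: int) -> float:
    """Distance form of the coefficient: 1 - similarity."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


def similarity_search(query_fp: int, library: Dict[str, int],
                      threshold: float) -> List[Tuple[str, float]]:
    """Return (id, similarity) pairs at or above threshold, best first."""
    hits = [(mol_id, tanimoto_similarity(query_fp, fp))
            for mol_id, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints (hypothetical values, for illustration only).
library = {
    "mol_a": 0b001111,
    "mol_b": 0b000111,
    "mol_c": 0b110000,
}
query = 0b001111

results = similarity_search(query, library, threshold=0.7)
print(results)  # [('mol_a', 1.0), ('mol_b', 0.75)]
```

Here `mol_b` shares three of the four bits set in the query (similarity 0.75, distance 0.25), whereas `mol_c` shares none and falls below the threshold.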

Figure 1

Conclusions

A comprehensive collection of chemical data in TDRTargets.org can be queried in various ways, including by chemical properties, structure and descriptors. More importantly, compounds of interest can also be linked to novel target genes in tropical disease-causing parasitic organisms, based on sequence similarity to the known targets of these compounds.