Abstract
Databases that provide links between bioactive compounds and their protein targets are increasingly important in drug discovery and chemical biology. They bridge the expanding universes of cheminformatics, via chemical structures, on the one hand and bioinformatics, via sequences, on the other. However, it is difficult to assess the relative utility of such databases without explicitly comparing their content. We exemplify an approach to this by comparing four resources, each with a different focus on bioactive chemistry (ChEMBL, DrugBank, the Human Metabolome Database, and the Therapeutic Target Database), at both the chemical structure and protein levels. We compared the compound sets at different representational stringencies using NCI/CADD Structure Identifiers. The overlap and uniqueness in chemical content can be broadly interpreted in the context of the different data capture strategies. However, we recorded apparent anomalies, such as many compounds in common between the metabolite and drug databases. We also compared the protein sequences mapped to the compounds via their UniProt identifiers. While these were also generally interpretable in the context of the individual databases, we discerned differences in coverage and in the types of supporting data used. For example, the target concept is applied differently in DrugBank and the Therapeutic Target Database. In ChEMBL, it encompasses a broader range of mappings, including those from chemical biology and species orthologue cross-screening as well as drug targets per se. Our analysis should help users not only to exploit the synergies between these four high-value resources but also to assess the utility of other databases at the interface of chemistry and biology.
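The abstract does not describe the authors' tooling. The following is only a minimal sketch of the overlap-and-uniqueness comparison it outlines. It assumes each database has already been reduced to a set of string keys: for compounds, structure identifiers precomputed at a chosen stringency (for example, via the NCI/CADD resolver); for proteins, UniProt accessions. The function names `overlap_report` and `keys_at`, the "strict"/"relaxed" level labels, and the toy records are hypothetical and invented for illustration. They are not data from the study.

```python
from itertools import combinations


def keys_at(records, level):
    """Collapse one database's records to a set of keys at a chosen stringency.

    Each record is a dict of precomputed identifier strings; the level
    names "strict" and "relaxed" are hypothetical placeholders.
    """
    return {record[level] for record in records}


def overlap_report(collections):
    """Return pairwise intersection sizes and per-source unique counts.

    `collections` maps a source name to a set of keys (structure
    identifiers or UniProt accessions).
    """
    names = sorted(collections)
    # Size of the intersection for every unordered pair of sources.
    pairwise = {
        (a, b): len(collections[a] & collections[b])
        for a, b in combinations(names, 2)
    }
    # Count keys present in one source and in none of the others.
    unique = {}
    for name in names:
        others = set().union(*(collections[o] for o in names if o != name))
        unique[name] = len(collections[name] - others)
    return pairwise, unique


if __name__ == "__main__":
    # Invented toy records: "strict" keys keep stereochemistry, "relaxed" keys drop it.
    drug_db = [
        {"strict": "cmpdA/R", "relaxed": "cmpdA"},
        {"strict": "cmpdB", "relaxed": "cmpdB"},
    ]
    metabolite_db = [
        {"strict": "cmpdA/S", "relaxed": "cmpdA"},
        {"strict": "cmpdC", "relaxed": "cmpdC"},
    ]
    for level in ("strict", "relaxed"):
        sources = {
            "drug": keys_at(drug_db, level),
            "metabolite": keys_at(metabolite_db, level),
        }
        pairwise, unique = overlap_report(sources)
        print(level, pairwise, unique)
    # strict {('drug', 'metabolite'): 0} {'drug': 2, 'metabolite': 2}
    # relaxed {('drug', 'metabolite'): 1} {'drug': 1, 'metabolite': 1}
```

Relaxing the stringency merges the two stereoisomers of the toy compound, so the apparent overlap rises. The same `overlap_report` call can be applied unchanged to sets of UniProt accessions to compare target coverage between sources.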
Copyright information
© 2012 Springer Science+Business Media New York
About this protocol
Cite this protocol
Muresan, S., Sitzmann, M., Southan, C. (2012). Mapping Between Databases of Compounds and Protein Targets. In: Larson, R. (eds) Bioinformatics and Drug Discovery. Methods in Molecular Biology, vol 910. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-965-5_8
DOI: https://doi.org/10.1007/978-1-61779-965-5_8
Published:
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-964-8
Online ISBN: 978-1-61779-965-5
eBook Packages: Springer Protocols