Abstract
With the availability of a multitude of databases that contain information on the bioactivity between compounds and proteins, several fundamental tasks arise. These include parsing of the original data in order to filter out unusable data, merging of multiple databases, identification of the sets of unique molecules, and selection of subsets of parsed data.
In this chapter, we provide a solution to each of these problems. Solutions are presented using standardized and freely available data processing tools, as well as computer program code.
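The four tasks named above can be illustrated with a minimal sketch. This is not the chapter's own code. It assumes a tab-separated table with hypothetical column names (`compound`, `protein`, `value_nM`), and it treats two compounds as identical when their text identifiers match exactly. A real workflow would likely need a canonical structure representation before comparing molecules.

```python
import csv
import io

# Hypothetical column names; real database exports use their own headers.
REQUIRED = ("compound", "protein", "value_nM")


def parse_rows(text):
    """Parse a tab-separated table, skipping unusable rows."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        if any(not (rec.get(key) or "").strip() for key in REQUIRED):
            continue  # unusable: a required field is missing or empty
        try:
            value = float(rec["value_nM"])
        except ValueError:
            continue  # unusable: value is not a plain number (e.g. ">10000")
        rows.append((rec["compound"].strip(), rec["protein"].strip(), value))
    return rows


def merge(*tables):
    """Concatenate parsed tables, dropping exact duplicate records."""
    seen, merged = set(), []
    for table in tables:
        for row in table:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged


def unique_compounds(rows):
    """Return the sorted set of compound identifiers (exact string match only)."""
    return sorted({compound for compound, _, _ in rows})


def select(rows, protein, max_value):
    """Select records for one protein at or below an activity threshold."""
    return [r for r in rows if r[1] == protein and r[2] <= max_value]


# Two tiny made-up tables standing in for two source databases.
DB_A = "compound\tprotein\tvalue_nM\nCCO\tP1\t50\nc1ccccc1\tP1\t>10000\nCCN\tP2\t\n"
DB_B = "compound\tprotein\tvalue_nM\nCCO\tP1\t50\nCCC\tP1\t2000\n"

merged = merge(parse_rows(DB_A), parse_rows(DB_B))
compounds = unique_compounds(merged)
potent_p1 = select(merged, "P1", 100.0)
```

In this toy input, the row with a qualified value (`>10000`) and the row with an empty value are discarded during parsing, and the record shared by both tables is kept once after merging. Whether qualified values should be discarded or retained with their qualifier is a design decision that depends on the downstream analysis.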
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Brown, J.B. (2018). Parsing Compound–Protein Bioactivity Tables. In: Brown, J. (eds) Computational Chemogenomics. Methods in Molecular Biology, vol 1825. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8639-2_4
DOI: https://doi.org/10.1007/978-1-4939-8639-2_4
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8638-5
Online ISBN: 978-1-4939-8639-2
eBook Packages: Springer Protocols