Cheminformatics analysis and learning in a data pipelining environment
Summary
Workflow technology is increasingly being applied in discovery informatics to organize and analyze data. SciTegic's Pipeline Pilot is a chemically intelligent implementation of a workflow technology known as data pipelining. It allows scientists to construct and execute workflows from components that encapsulate many cheminformatics-based algorithms. In this paper we review SciTegic's methodology for molecular fingerprints, molecular similarity, molecular clustering, maximal common subgraph search and Bayesian learning. Case studies show how these methods apply to the analysis of discovery data such as chemical series and high-throughput screening results. The paper demonstrates that the methods are well suited to a wide variety of tasks, such as building and applying predictive models of screening data, identifying molecules for lead optimization, and organizing molecules into families with structural commonality.
Key words
Bayesian models, bioactivity prediction, data mining, data pipelining, maximal common substructure search, molecular fingerprints, molecular similarity, virtual screening
Abbreviations
- MCSS: maximal common substructure search
- ECFP: extended connectivity fingerprints
- FCFP: functional class fingerprints
- MDDR: MDL Drug Data Report
- WDI: World Drug Index
- CATS: chemically advanced template search
- BKD: binary kernel discrimination
- CDK2: cyclin-dependent kinase 2
- DHFR: Escherichia coli dihydrofolate reductase