Integration of Automated Workflow in Chemoinformatics for Drug Discovery

Abstract

Ever-increasing data volumes and limited execution time call for automated computational workflow systems, and several tools are emerging to support them. Such systems rely on scripting to define repetitive tasks that are then applied to new data to generate the desired output. They let researchers focus on what a particular virtual experiment will achieve rather than on how the process is executed. This chapter focuses on identifying repetitive tasks that can be automated, so that workflows can streamline a series of computational tasks efficiently. A brief introduction to workflows and their components is followed by in-depth tutorials using today’s state-of-the-art workflow-based applications in chemoinformatics for drug discovery research. J-ProLINE, an in-house-developed stand-alone chemo-bioinformatics workflow application for protein–ligand network analysis, is also presented.

Keywords

Workflow; Chemoinformatics; Drug design; Pipeline

Copyright information

© Springer India 2014

Authors and Affiliations

  1. Digital Information Resource Centre, National Chemical Laboratory, Pune, India
  2. Scientist (DST) Division of Chemical Engineering and Process Development, National Chemical Laboratory, Pune, India
