Open-Source Tools, Techniques, and Data in Chemoinformatics

Practical Chemoinformatics

Abstract

Chemicals are everywhere; they are essentially composed of atoms and bonds, and they support life and provide comfort. The numerous combinations of these entities give rise to the complexity and diversity of the universe. Chemistry analyzes and tries to explain this complexity at the atomic level. Advances in the subject have led to more data generation and an information explosion. Over time, the observations were recorded in chemical documents, including journals, patents, and research reports. The vast amount of chemical literature, covering more than two centuries, demands extensive use of information technology to manage it. Today, chemoinformatics tools and methods have grown powerful enough to handle this huge resource of chemical information and to discover unexplored knowledge in it. The role of chemoinformatics is to add value to every bit of chemical data. The underlying theme of this domain is how to develop efficient chemicals with predicted physicochemical and biological properties for economic, social, health, safety, and environmental benefit. In this chapter, we begin with a brief definition of open-source tools and their role in chemoinformatics, and then discuss the basic computer knowledge required to understand this specialized and interdisciplinary subject. This is followed by an in-depth analysis of traditional and advanced methods for handling chemical structures in computers, an elementary but essential precursor to any chemoinformatics task. Practical, step-by-step guidance on the use of open-source, free, academic, and commercial structure representation tools is also provided. To gain a better understanding, the reader is strongly encouraged to attempt the practice tutorials, do-it-yourself exercises, and questions given in each chapter. This chapter is designed to help experimental chemists, biologists, mathematicians, physicists, computer scientists, and others understand the subject in a practical way, with relevant and easy-to-understand examples, and to encourage readers to proceed to the advanced topics in subsequent chapters.

Author information

Correspondence to Muthukumarasamy Karthikeyan.

Copyright information

© 2014 Springer India

About this chapter

Cite this chapter

Karthikeyan, M., Vyas, R. (2014). Open-Source Tools, Techniques, and Data in Chemoinformatics. In: Practical Chemoinformatics. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1780-0_1
