Open-Source Tools, Techniques, and Data in Chemoinformatics

  • Muthukumarasamy Karthikeyan
  • Renu Vyas


Chemicals are everywhere. They are composed of atoms and bonds, and they support life and provide comfort. The numerous combinations of these entities give rise to the complexity and diversity of the universe. Chemistry analyzes and tries to explain this complexity at the atomic level. Advances in the subject have generated ever more data, leading to an information explosion. Over time, observations have been recorded in chemical documents, including journals, patents, and research reports. The vast chemical literature, spanning more than two centuries, demands extensive use of information technology to manage it. Today, chemoinformatics tools and methods are powerful enough to handle this huge resource of chemical information and to discover unexplored knowledge in it. The role of chemoinformatics is to add value to every bit of chemical data. The underlying theme of the domain is how to develop efficient chemicals with predicted physicochemical and biological properties for economic, social, health, safety, and environmental benefit. In this chapter, we begin with a brief definition of open-source tools and their role in chemoinformatics, and then discuss the basic computer knowledge needed to understand this specialized, interdisciplinary subject. This is followed by an in-depth analysis of traditional and advanced methods for handling chemical structures in computers, an elementary but essential precursor to any chemoinformatics task. Step-by-step practical guidance on open-source, free, academic, and commercial structure representation tools is also provided. To gain a better understanding, the reader is strongly encouraged to attempt the practice tutorials, do-it-yourself exercises, and questions given in each chapter. The chapter is designed to help experimental chemists, biologists, mathematicians, physicists, computer scientists, and others understand the subject in a practical way through relevant, easy-to-understand examples, and to encourage readers to proceed to the advanced topics in subsequent chapters.


Keywords: Chemical structure; Molecular modelling; Chemical databases; Open-source software; Drug discovery



Copyright information

© Springer India 2014

Authors and Affiliations

  1. Digital Information Resource Centre, National Chemical Laboratory, Pune, India
  2. Scientist (DST) Division of Chemical Engineering and Process Development, National Chemical Laboratory, Pune, India
