Open-Source Tools, Techniques, and Data in Chemoinformatics

Karthikeyan, Muthukumarasamy; Vyas, Renu

doi:10.1007/978-81-322-1780-0_1

Abstract

Chemicals are everywhere; composed essentially of atoms and bonds, they support life and provide comfort. The numerous combinations of these entities give rise to the complexity and diversity of the universe. Chemistry analyzes and tries to explain this complexity at the atomic level. Advances in the subject have generated ever more data, leading to an information explosion. Over time, observations were recorded in chemical documents such as journals, patents, and research reports. The vast chemical literature, covering more than two centuries, demands extensive use of information technology to manage it. Today, chemoinformatics tools and methods have grown powerful enough to handle this huge resource of chemical information and to discover unexplored knowledge in it. The role of chemoinformatics is to add value to every bit of chemical data. The underlying theme of this domain is how to develop efficient chemicals with predicted physicochemical and biological properties for economic, social, health, safety, and environmental purposes. In this chapter, we begin with a brief definition of open-source tools and their role in chemoinformatics, and then discuss the basic computer knowledge required to understand this specialized, interdisciplinary subject. This is followed by an in-depth analysis of traditional and advanced methods for handling chemical structures in computers, an elementary but essential precursor to any chemoinformatics task. Practical, step-by-step guidance on the use of open-source, free, academic, and commercial structure representation tools is also provided. To gain a better understanding, the reader is strongly encouraged to attempt the practice tutorials, do-it-yourself exercises, and questions given in each chapter. The chapter is designed to help experimental chemists, biologists, mathematicians, physicists, computer scientists, and others understand the subject in a practical way through relevant, easy-to-understand examples, and to encourage readers to proceed to the advanced topics in the subsequent chapters.

Author information

Authors and Affiliations

Digital Information Resource Centre, National Chemical Laboratory, Pune, India
Muthukumarasamy Karthikeyan
Scientist (DST) Division of Chemical Engineering and Process Development, National Chemical Laboratory, 411008, Pune, India
Renu Vyas

Corresponding author

Correspondence to Muthukumarasamy Karthikeyan.

About this chapter

Cite this chapter

Karthikeyan, M., Vyas, R. (2014). Open-Source Tools, Techniques, and Data in Chemoinformatics. In: Practical Chemoinformatics. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1780-0_1

DOI: https://doi.org/10.1007/978-81-322-1780-0_1
Published: 07 May 2014
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1779-4
Online ISBN: 978-81-322-1780-0
eBook Packages: Chemistry and Materials Science, Chemistry and Material Science (R0)
