Definition
Given a set of chemical compounds, chemical data mining characterizes the compounds in the data set and applies a variety of mining methods to discover relationships between the compounds and their biological and chemical activities.
Historical Background
In 1969, Hansch [6] introduced quantitative structure-activity relationship (QSAR) analysis, which attempts to correlate the physicochemical or structural properties of compounds with their biological and chemical activities. These properties are determined empirically or by computational methods. QSAR works on vector representations of compounds, usually encoded with existing physicochemical and structural fingerprints. Dehaspe et al. [3] applied inductive logic programming to predict chemical carcinogenicity by mining frequent substructures in chemical datasets; the mined substructures serve as new structural fingerprints from which QSAR can build more comprehensive analytical models.
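The idea of turning frequent substructures into structural fingerprints can be illustrated with a minimal sketch. It assumes, purely for illustration, that each compound has already been reduced to a set of fragment labels; the compound names, fragment labels, and the helper functions `frequent_fragments` and `fingerprint` are hypothetical and are not taken from the cited works, which mine substructures directly from molecular graphs.

```python
from collections import Counter

# Hypothetical toy data: each compound is reduced to the set of fragment
# labels it contains. Real systems mine fragments from molecular graphs.
compounds = {
    "c1": {"benzene", "nitro", "amine"},
    "c2": {"benzene", "nitro"},
    "c3": {"benzene", "hydroxyl"},
    "c4": {"nitro", "amine"},
}


def frequent_fragments(data, min_support):
    """Return the fragments found in at least min_support compounds, sorted."""
    counts = Counter(frag for frags in data.values() for frag in frags)
    return sorted(f for f, n in counts.items() if n >= min_support)


def fingerprint(frags, vocabulary):
    """Encode one compound as a binary vector over the fragment vocabulary."""
    return [1 if f in frags else 0 for f in vocabulary]


# Fragments that occur in at least two compounds become the fingerprint bits.
vocab = frequent_fragments(compounds, min_support=2)
vectors = {name: fingerprint(frags, vocab) for name, frags in compounds.items()}

for name in sorted(vectors):
    print(name, vectors[name])
```

The resulting binary vectors are the kind of vectorial mapping that a QSAR model could take as input; the support threshold of 2 is an arbitrary choice for this toy data set.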
Foundations
Chemical...
Recommended Reading
1. Bunke H, Shearer K. A graph distance metric based on the maximal common subgraph. Pattern Recogn Lett. 1998;19(3):255–9.
2. Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
3. Dehaspe L, Toivonen H, King R. Finding frequent substructures in chemical compounds. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining; 1998. p. 30–6.
4. Deshpande M, Kuramochi M, Wale N, Karypis G. Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans Knowl Data Eng. 2005;17(8):1036–50.
5. Fröhlich H, Wegner J, Sieker F, Zell A. Optimal assignment kernels for attributed molecular graphs. In: Proceedings of the 22nd International Conference on Machine Learning; 2005. p. 225–32.
6. Hansch C. A quantitative approach to biochemical structure-activity relationships. Acc Chem Res. 1969;2(8):232–9.
7. Kashima H, Tsuda K, Inokuchi A. Marginalized kernels between labeled graphs. In: Proceedings of the 20th International Conference on Machine Learning; 2003. p. 321–8.
8. Kramer S, De Raedt L, Helma C. Molecular feature mining in HIV data. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2001. p. 136–43.
9. Yan X, Yu PS, Han J. Graph indexing: a frequent structure-based approach. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2004. p. 335–46.
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Yan, X. (2018). Mining of Chemical Data. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1299
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1299
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering