Abstract
Statistical learning of molecular structure properties is gaining interest among researchers. These methods are faster than traditional QM-based methods. In addition, physical properties can be incorporated as feature sets, and a properly trained model can predict the desired properties of a molecular system. To this end, a number of machine learning regressors are used to predict the molecular energies of Si_n (n = 1, 2, …, 25) clusters and of water, methane and ethane molecules. For the Si_n clusters, six out of eight regressors predict the energies accurately. For the other data sets, the Decision Tree regressor performed fairly well in general compared to the others. However, while adding atomic charges as an extra feature improved the performance of the other regressors, it brought no improvement for the Decision Tree regressor. Since calculating atomic charges is itself an expensive task, we conclude that the Decision Tree regressor is well suited for predicting molecular properties compared to the other regressors tested here.
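The workflow described in the abstract, training a Decision Tree regressor on distance-based molecular features to predict energies, can be sketched as follows. This is an illustrative example only: the feature matrix and the toy pair-potential "energy" target below are synthetic stand-ins, not the authors' data, and the hyperparameters are assumptions.

```python
# Illustrative sketch: a Decision Tree regressor trained on distance-based
# features, in the spirit of the study. Data here is synthetic, not the
# authors' cluster/molecule datasets.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical feature matrix: each row holds the sorted interatomic
# distances of one configuration (a simple distance-based representation).
X = rng.uniform(1.0, 5.0, size=(200, 6))
X.sort(axis=1)

# Toy "energy" target: a sum over inverse distances plus a little noise,
# standing in for QM-computed molecular energies.
y = (1.0 / X).sum(axis=1) + 0.01 * rng.normal(size=200)

# Fit on a training split and score on a held-out split.
model = DecisionTreeRegressor(max_depth=8, random_state=0)
model.fit(X[:150], y[:150])
r2 = model.score(X[150:], y[150:])
print(f"held-out R^2: {r2:.3f}")
```

In practice the feature vectors would be built from the actual interatomic distances of each cluster or molecule, and the targets from reference QM energies; extra features such as atomic charges could be appended as additional columns of X.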
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Wasee, A., Chaudhuri, R.G., Kumar, P., Iype, E. (2020). A Study on Distance Based Representation of Molecules for Statistical Learning. In: Smys, S., Iliyasu, A.M., Bestak, R., Shi, F. (eds) New Trends in Computational Vision and Bio-inspired Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-41862-5_56
DOI: https://doi.org/10.1007/978-3-030-41862-5_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41861-8
Online ISBN: 978-3-030-41862-5
eBook Packages: Mathematics and Statistics (R0)