Abstract
Statistical learning of molecular structure properties is gaining interest among researchers. These methods are faster than traditional QM-based methods. In addition, physical properties can be incorporated as feature sets, and a properly trained model can predict the desired properties of a molecular system. To this end, a number of machine learning regressors are used to predict the molecular energies of Si_n (n = 1, 2, …, 25) clusters and of water, methane and ethane molecules. For the Si_n clusters, six out of eight regressors predict the energies accurately. For the other data sets, the Decision Tree regressor performed fairly well in general compared to the others. However, while adding atomic charges as an extra feature improved the performance of the other regressors, it brought no improvement for the Decision Tree regressor. Since calculating atomic charges is itself an expensive task, we conclude that the Decision Tree regressor is well suited for predicting molecular properties compared to the other regressors tested here.
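The workflow described in the abstract, training a Decision Tree regressor on distance-based molecular features to predict energies, can be sketched as follows. This is an illustrative example only: the feature matrix and the toy pair-potential "energy" target below are synthetic stand-ins, not the authors' data, and the hyperparameters are assumptions.

```python
# Illustrative sketch: a Decision Tree regressor trained on distance-based
# features, in the spirit of the study. Data here is synthetic, not the
# authors' cluster/molecule datasets.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical feature matrix: each row holds the sorted interatomic
# distances of one configuration (a simple distance-based representation).
X = rng.uniform(1.0, 5.0, size=(200, 6))
X.sort(axis=1)

# Toy "energy" target: a sum over inverse distances plus a little noise,
# standing in for QM-computed molecular energies.
y = (1.0 / X).sum(axis=1) + 0.01 * rng.normal(size=200)

# Fit on a training split and score on a held-out split.
model = DecisionTreeRegressor(max_depth=8, random_state=0)
model.fit(X[:150], y[:150])
r2 = model.score(X[150:], y[150:])
print(f"held-out R^2: {r2:.3f}")
```

In practice the feature vectors would be built from the actual interatomic distances of each cluster or molecule, and the targets from reference QM energies; extra features such as atomic charges could be appended as additional columns of X.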
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Wasee, A., Chaudhuri, R.G., Kumar, P., Iype, E. (2020). A Study on Distance Based Representation of Molecules for Statistical Learning. In: Smys, S., Iliyasu, A.M., Bestak, R., Shi, F. (eds) New Trends in Computational Vision and Bio-inspired Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-41862-5_56
DOI: https://doi.org/10.1007/978-3-030-41862-5_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41861-8
Online ISBN: 978-3-030-41862-5
eBook Packages: Mathematics and Statistics (R0)