A Study on Distance Based Representation of Molecules for Statistical Learning

  • Chapter
New Trends in Computational Vision and Bio-inspired Computing

Abstract

Statistical learning of molecular structure properties is gaining interest among researchers. These methods are faster than traditional quantum mechanics (QM) based methods. In addition, physical properties can be incorporated as feature sets, and a properly trained model can predict the desired properties of a molecular system. To this end, a number of machine learning regressors are used to predict the molecular energies of Siₙ (n = 1, 2, …, 25) clusters and of water, methane and ethane molecules. For the Siₙ clusters, six out of the eight regressors predict the energies accurately. For the other data sets, the Decision Tree regressor generally performs well compared to the others. However, while adding atomic charges as an extra feature improved the performance of the other regressors, it brought no improvement for the Decision Tree regressor. Since calculating atomic charges is itself an expensive task, we conclude that the Decision Tree regressor is the most suitable of the regressors tested here for predicting molecular properties.
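The pipeline the abstract describes (a distance-based representation of a molecule fed to a regressor that predicts its energy) can be sketched as follows. This is a minimal illustration, not the authors' exact code: the zero-padded sorted-distance descriptor, the cluster sizes, and the synthetic energy labels are all assumptions made for the example; only the use of a scikit-learn `DecisionTreeRegressor` mirrors the chapter's setup.

```python
# Sketch: distance-based molecular descriptor + decision-tree regression.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def distance_descriptor(coords, size):
    """Sorted pairwise interatomic distances, zero-padded to a fixed length
    so molecules with different atom counts share one feature space."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    d = [np.linalg.norm(coords[i] - coords[j])
         for i in range(n) for j in range(i + 1, n)]
    vec = np.zeros(size)
    vec[:len(d)] = sorted(d, reverse=True)
    return vec

# Toy data: random small "clusters" with a placeholder energy label
# (a real application would use QM-computed energies as targets).
rng = np.random.default_rng(0)
max_pairs = 10  # enough pairs for up to 5 atoms
X, y = [], []
for _ in range(200):
    n_atoms = int(rng.integers(2, 6))
    coords = rng.normal(size=(n_atoms, 3))
    X.append(distance_descriptor(coords, max_pairs))
    y.append(-1.0 * n_atoms + 0.1 * X[-1].sum())  # synthetic "energy"
X, y = np.array(X), np.array(y)

model = DecisionTreeRegressor(max_depth=8).fit(X, y)
pred = model.predict(X[:1])  # one predicted energy per input molecule
```

Because the tree splits on individual distance features, it needs no feature scaling, which is one practical reason a decision tree can be a convenient baseline for this kind of descriptor.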



Author information

Corresponding author

Correspondence to Eldhose Iype.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Wasee, A., Chaudhuri, R.G., Kumar, P., Iype, E. (2020). A Study on Distance Based Representation of Molecules for Statistical Learning. In: Smys, S., Iliyasu, A.M., Bestak, R., Shi, F. (eds) New Trends in Computational Vision and Bio-inspired Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-41862-5_56
