Symbolic regression in materials science

  • Artificial Intelligence Prospective
MRS Communications

Abstract

The authors showcase the potential of symbolic regression as an analytic method for use in materials research. First, the authors briefly describe the current state-of-the-art method, genetic programming-based symbolic regression (GPSR), and recent advances in symbolic regression techniques. Next, the authors discuss industrial applications of symbolic regression and its potential applications in materials science. The authors then present two GPSR use-cases: formulating a transformation kinetics law and showing the learning scheme discovers the well-known Johnson–Mehl–Avrami–Kolmogorov form, and learning the Landau free energy functional form for the displacive tilt transition in perovskite LaNiO3. Finally, the authors propose that symbolic regression techniques should be considered by materials scientists as an alternative to other machine learning-based regression models for learning from data.
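
For orientation, the Johnson–Mehl–Avrami–Kolmogorov (JMAK) form named above is commonly written as X(t) = 1 - exp(-k t^n), where X is the transformed fraction, k is a rate constant, and n is the Avrami exponent. The sketch below is not the authors' GPSR workflow. It assumes the functional form is already known and only recovers k and n from synthetic, noise-free data through the standard linearization ln(-ln(1 - X)) = ln k + n ln t. The function names and parameter values are illustrative assumptions.

```python
import math


def jmak_fraction(t, k, n):
    """Transformed fraction X(t) = 1 - exp(-k * t**n) (JMAK form)."""
    return 1.0 - math.exp(-k * t ** n)


def fit_avrami(times, fractions):
    """Recover (k, n) by ordinary least squares on the linearized form
    ln(-ln(1 - X)) = ln k + n * ln t.
    Requires t > 0 and 0 < X < 1 for every data point."""
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - x)) for x in fractions]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope is the Avrami exponent
    k = math.exp(y_mean - n * x_mean)  # intercept is ln k
    return k, n


# Synthetic, noise-free data; K_TRUE and N_TRUE are hypothetical values
# chosen for illustration, not taken from the article.
K_TRUE, N_TRUE = 0.05, 2.5
times = [0.5 * i for i in range(1, 11)]
fractions = [jmak_fraction(t, K_TRUE, N_TRUE) for t in times]

k_fit, n_fit = fit_avrami(times, fractions)
print(f"k = {k_fit:.4f}, n = {n_fit:.4f}")
```

A symbolic regression run would instead have to discover the expression itself from the data, with the fitted constants as a by-product.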

Acknowledgments

Y.W. acknowledges partial support from the Predictive Science and Engineering Design (PS&ED) program at Northwestern University. All authors acknowledge support from the National Science Foundation (NSF) through the Designing Materials to Revolutionize and Engineer our Future (DMREF) program under award no. DMR-1729303.

Author information

Corresponding author

Correspondence to James M. Rondinelli.

Additional information

These authors contributed equally to this work

Cite this article

Wang, Y., Wagner, N. & Rondinelli, J.M. Symbolic regression in materials science. MRS Communications 9, 793–805 (2019). https://doi.org/10.1557/mrc.2019.85

  • DOI: https://doi.org/10.1557/mrc.2019.85
