
Growing field of materials informatics: databases and artificial intelligence


Molecular discovery in the chemical and pharmaceutical industries has followed a repeated cycle of screening and synthesis, involving the analysis of individual molecules, both natural and synthetic. This ability to generate and screen libraries of compounds has found an echo in solid-state physics, where there is a demand to explore and produce new materials for testing. In response to this demand, a golden age of materials discovery is emerging, with progress in important areas of both basic science and device applications. The confluence of theoretical and simulation methods, together with the availability of computational resources, has established the “materials genome” approach, which is used by a growing number of research groups around the world with the goal of innovating on materials through systematic discovery. This Prospective provides an overview of this group of methodologies and how they tackle the ever-increasing complexity of computational materials science simulations. Computational simulation is highlighted as a major component of the rational design and synthesis of new materials with targeted properties, and progress on databases and large-scale data treatment is described. Tools for new materials discovery are discussed, including the deployment of new data repositories, the implementation of high-throughput simulation approaches, and the development of artificial intelligence algorithms.






Los Alamos National Laboratory is managed by Triad National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy under Contract No. 89233218CNA000001. Work for this review was supported by the Laboratory Directed Research and Development program of Los Alamos National Laboratory under project number 20190636ER (XX28). Work at Argonne is supported by the Department of Energy, Office of Science, Basic Energy Sciences, Division of Materials Science, under Contract No. DE-AC02-06CH11357.

Author information



Corresponding author

Correspondence to Alejandro Lopez-Bezanilla.


Cite this article

Lopez-Bezanilla, A., Littlewood, P.B. Growing field of materials informatics: databases and artificial intelligence. MRS Communications 10, 1–10 (2020).
