Data Mining Approaches to High-Throughput Crystal Structure and Compound Prediction

Part of the book series: Topics in Current Chemistry (TOPCURRCHEM, volume 345)

Abstract

Predicting unknown inorganic compounds and their crystal structures is a critical step in high-throughput computational materials design and discovery. One way to achieve efficient compound prediction is to use data mining or machine learning methods. In this chapter we present a few algorithms for data-mining-based compound prediction and their applications to different materials discovery problems. In particular, the patterns or correlations governing phase stability in experimental or computational inorganic compound databases are statistically learned and used to build probabilistic or regression models that identify novel compounds and their crystal structures. The stability of those compound candidates is then assessed using ab initio techniques. Finally, we report a few cases where data-mining-driven computational predictions were experimentally confirmed through inorganic synthesis.

Author information

Correspondence to Geoffroy Hautier.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hautier, G. (2013). Data Mining Approaches to High-Throughput Crystal Structure and Compound Prediction. In: Atahan-Evrenk, S., Aspuru-Guzik, A. (eds) Prediction and Calculation of Crystal Structures. Topics in Current Chemistry, vol 345. Springer, Cham. https://doi.org/10.1007/128_2013_486
