Abstract
Predicting unknown inorganic compounds and their crystal structures is a critical step in high-throughput computational materials design and discovery. One way to make compound prediction efficient is to use data mining or machine learning methods. In this chapter we present a few algorithms for data-mining-based compound prediction and their applications to different materials discovery problems. In particular, the patterns or correlations governing phase stability in experimental or computational inorganic compound databases are learned statistically and used to build probabilistic or regression models that identify novel compounds and their crystal structures. The stability of these compound candidates is then assessed using ab initio techniques. Finally, we report a few cases where computational predictions driven by data mining were confirmed experimentally through inorganic synthesis.
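The statistical-learning step summarized above can be illustrated with a minimal sketch. It is not the chapter's algorithm: the toy list of observed ion substitutions, the function names, and the additive-smoothing parameter alpha are all assumptions made for illustration. The sketch counts how often two ions are seen on the same site in otherwise identical structures, turns the counts into smoothed probabilities, and ranks candidate substitutions for a given ion. In a real workflow, the top-ranked candidates would then be passed to ab initio calculations for stability assessment.

```python
from collections import Counter

# Hypothetical toy "database": each tuple records two ions observed on the
# same site in otherwise identical crystal structures. Invented data.
OBSERVED_SUBSTITUTIONS = [
    ("Ca2+", "Sr2+"), ("Ca2+", "Sr2+"), ("Ca2+", "Sr2+"),
    ("Sr2+", "Ba2+"), ("Sr2+", "Ba2+"),
    ("Ca2+", "Ba2+"),
    ("Fe3+", "Cr3+"), ("Fe3+", "Cr3+"),
    ("Ca2+", "Fe3+"),
]


def substitution_probabilities(pairs, alpha=1.0):
    """Estimate a symmetric substitution probability for every ion pair.

    Counts are smoothed additively with `alpha` (an assumed choice) so that
    pairs never observed in the toy data keep a small nonzero probability.
    """
    counts = Counter(frozenset(pair) for pair in pairs)
    ions = sorted({ion for pair in pairs for ion in pair})
    all_pairs = [
        frozenset((a, b)) for i, a in enumerate(ions) for b in ions[i + 1:]
    ]
    total = sum(counts[p] + alpha for p in all_pairs)
    return {p: (counts[p] + alpha) / total for p in all_pairs}


def rank_candidates(known_ion, probs):
    """Return (ion, probability) tuples for substitutes of `known_ion`,
    sorted from most to least probable."""
    scored = [
        (next(iter(pair - {known_ion})), prob)
        for pair, prob in probs.items()
        if known_ion in pair
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    probs = substitution_probabilities(OBSERVED_SUBSTITUTIONS)
    for ion, prob in rank_candidates("Ca2+", probs):
        print(f"Ca2+ -> {ion}: {prob:.3f}")
```

Using unordered pairs keeps the model symmetric, which is the simplest assumption; a direction-dependent or multi-ion model would need a richer feature set than this sketch provides.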
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hautier, G. (2013). Data Mining Approaches to High-Throughput Crystal Structure and Compound Prediction. In: Atahan-Evrenk, S., Aspuru-Guzik, A. (eds) Prediction and Calculation of Crystal Structures. Topics in Current Chemistry, vol 345. Springer, Cham. https://doi.org/10.1007/128_2013_486
DOI: https://doi.org/10.1007/128_2013_486
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05773-6
Online ISBN: 978-3-319-05774-3
eBook Packages: Chemistry and Materials Science, Chemistry and Material Science (R0)