Many machine learning-based approaches to materials property prediction are trained on low-cost computational data. Machine learning models are attractive because they are orders of magnitude faster than density functional theory (DFT) calculations or experimental characterization. High-quality experimental materials data would be ideal for training these models; unfortunately, such data are typically costly to obtain. As a result, experimental databases are often smaller and less cohesive than their computational counterparts. Using band gap as an example, we demonstrate how an ensemble learning approach allows us to efficiently model experimental data by combining models trained on otherwise disparate computational and experimental data. This approach shows how disparate data sources can be incorporated into the modeling of sparsely represented experimental data. In the case of band gap prediction, we reduce the root mean squared error by over 9%.
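The idea of combining models trained on different data sources can be illustrated with a minimal stacking sketch. This is not the authors' implementation: the band gap values below are invented toy numbers, the two base "models" are represented only by their predictions, and the names `fit_stacking_weights` and `relative_rmse_reduction` are hypothetical helpers introduced for illustration. The meta-learner here is an ordinary least-squares combination of two base predictions, chosen because it fits in a few lines of standard-library Python.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_stacking_weights(pred_a, pred_b, y_true):
    """Least-squares weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - y)^2).

    Solves the 2x2 normal equations in closed form (no intercept term).
    """
    s_aa = sum(a * a for a in pred_a)
    s_bb = sum(b * b for b in pred_b)
    s_ab = sum(a * b for a, b in zip(pred_a, pred_b))
    s_ay = sum(a * y for a, y in zip(pred_a, y_true))
    s_by = sum(b * y for b, y in zip(pred_b, y_true))
    det = s_aa * s_bb - s_ab ** 2
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear; weights are not unique")
    w_a = (s_ay * s_bb - s_by * s_ab) / det
    w_b = (s_by * s_aa - s_ay * s_ab) / det
    return w_a, w_b

def relative_rmse_reduction(rmse_baseline, rmse_new):
    """Fractional RMSE reduction relative to a baseline (0.09 means 9%)."""
    return (rmse_baseline - rmse_new) / rmse_baseline

# Toy data in eV (assumed values, not taken from the paper).
y_exp = [1.1, 3.4, 0.7, 2.3, 5.5, 1.4]           # "experimental" band gaps
dft_model_pred = [0.6, 2.4, 0.2, 1.5, 4.2, 0.7]  # model trained on DFT data (underestimates)
exp_model_pred = [1.5, 3.0, 1.1, 2.0, 4.8, 1.9]  # model trained on scarce experimental data

# Meta-learner: combine the two base predictions against experimental targets.
w_dft, w_exp = fit_stacking_weights(dft_model_pred, exp_model_pred, y_exp)
ensemble_pred = [w_dft * a + w_exp * b for a, b in zip(dft_model_pred, exp_model_pred)]

rmse_dft = rmse(y_exp, dft_model_pred)
rmse_exp = rmse(y_exp, exp_model_pred)
rmse_ens = rmse(y_exp, ensemble_pred)
reduction = relative_rmse_reduction(rmse_exp, rmse_ens)

print(f"RMSE (DFT-trained model):          {rmse_dft:.3f} eV")
print(f"RMSE (experiment-trained model):   {rmse_exp:.3f} eV")
print(f"RMSE (stacked ensemble):           {rmse_ens:.3f} eV")
print(f"Relative reduction vs. experiment-only: {100 * reduction:.1f}%")
```

Because the weights are fit and scored on the same six points, the ensemble error here can never exceed either base model's error; a realistic comparison would score all three on held-out experimental data.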
We would like to thank the National Science Foundation for its support of this research under NSF CAREER Award 1651668. We would also like to thank the Brgoch group at the University of Houston for inspiring this research and for readily supplying data in a way that adheres to the FAIR Data Principles.
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Cite this article
Kauwe, S.K., Welker, T. & Sparks, T.D. Extracting Knowledge from DFT: Experimental Band Gap Predictions Through Ensemble Learning. Integr Mater Manuf Innov (2020). https://doi.org/10.1007/s40192-020-00178-0
Keywords
- Machine learning
- Band gap
- Transfer learning
- Ensemble learning