Extracting Knowledge from DFT: Experimental Band Gap Predictions Through Ensemble Learning


Many machine learning-based approaches to materials property prediction rely on low-cost computational data. These models are attractive because they are orders of magnitude faster than DFT calculations or experimental characterization. High-quality experimental data would be ideal for training them; unfortunately, experimental data are typically costly to obtain, so experimental databases tend to be smaller and less cohesive. Using band gap as a test case, we demonstrate how an ensemble learning approach can efficiently model experimental data by combining models trained on otherwise disparate computational and experimental datasets. More broadly, the approach shows how disparate data sources can be incorporated into the modeling of sparsely represented experimental data. For band gap prediction, it reduces the root mean squared error by over 9%.
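The abstract does not spell out how the models are combined. One common way to combine models trained on different data sources is stacking: base models trained on each source produce predictions, and a meta-model learns how to weight those predictions against the target (here, experimental band gap). The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It uses ordinary least squares (via NumPy's `np.linalg.lstsq`) for every model to stay short, where a real pipeline would more likely use nonlinear regressors. It uses synthetic numbers in place of real DFT and experimental band gaps. All function names (`fit_stacked`, `out_of_fold`, `percent_reduction`, etc.) are hypothetical. The reported percentage is a relative RMSE reduction, which is one reading of the "over 9%" figure.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns [intercept, w1, w2, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def rmse(y_true, y_pred):
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def percent_reduction(rmse_baseline, rmse_new):
    """Relative RMSE reduction in percent (positive means the new model is better)."""
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline


def out_of_fold(X, y, n_folds=5, seed=0):
    """Predict each sample with a model that never saw it during training (k-fold)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    oof = np.empty(len(y))
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        oof[fold] = predict_linear(fit_linear(X[train], y[train]), X[fold])
    return oof


def fit_stacked(X_exp, y_exp, X_dft, y_dft):
    """Hypothetical two-level ensemble: a DFT-trained and an experiment-trained
    base model feed a linear meta-model fit on experimental band gaps."""
    dft_model = fit_linear(X_dft, y_dft)   # large, low-cost DFT dataset
    exp_model = fit_linear(X_exp, y_exp)   # small experimental dataset
    meta_X = np.column_stack([
        predict_linear(dft_model, X_exp),  # DFT-trained predictions for experimental compounds
        out_of_fold(X_exp, y_exp),         # leakage-free experimental predictions
    ])
    meta_model = fit_linear(meta_X, y_exp)
    return dft_model, exp_model, meta_model


def predict_stacked(models, X):
    dft_model, exp_model, meta_model = models
    meta_X = np.column_stack([predict_linear(dft_model, X), predict_linear(exp_model, X)])
    return predict_linear(meta_model, meta_X)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    true_w = np.array([1.0, -0.5, 2.0])
    # Synthetic stand-in data (not real band gaps): the "DFT" target is plentiful
    # and systematically biased; the "experimental" target is scarce and noisy.
    X_dft = rng.normal(size=(2000, 3))
    y_dft = 0.7 * (X_dft @ true_w) + 1.0
    X_exp = rng.normal(size=(60, 3))
    y_exp = X_exp @ true_w + 1.5 + rng.normal(scale=0.3, size=60)
    X_test = rng.normal(size=(200, 3))
    y_test = X_test @ true_w + 1.5 + rng.normal(scale=0.3, size=200)

    baseline = predict_linear(fit_linear(X_exp, y_exp), X_test)
    stacked = predict_stacked(fit_stacked(X_exp, y_exp, X_dft, y_dft), X_test)
    r_base, r_stack = rmse(y_test, baseline), rmse(y_test, stacked)
    print(f"baseline RMSE {r_base:.3f}, stacked RMSE {r_stack:.3f}, "
          f"reduction {percent_reduction(r_base, r_stack):.1f}%")
```

Fitting the meta-model on out-of-fold predictions is the key design choice: if the experimental base model were scored on its own training data, its predictions would look unrealistically accurate and the meta-model would learn to over-trust it.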


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7



We would like to thank the National Science Foundation for supporting this research under NSF CAREER Award 1651668. We would also like to thank the Brgoch group at the University of Houston for inspiring this research and for readily supplying data in a way that adheres to the FAIR Data Principles.

Author information



Corresponding author

Correspondence to Taylor D. Sparks.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.


About this article


Cite this article

Kauwe, S.K., Welker, T. & Sparks, T.D. Extracting Knowledge from DFT: Experimental Band Gap Predictions Through Ensemble Learning. Integr Mater Manuf Innov (2020). https://doi.org/10.1007/s40192-020-00178-0



  • Machine learning
  • Band gap
  • Transfer learning
  • Ensemble learning