High-Dimensional Materials and Process Optimization Using Data-Driven Experimental Design with Well-Calibrated Uncertainty Estimates

Abstract

Optimizing composition and processing to obtain materials with desirable characteristics has historically relied on a combination of domain knowledge, trial and error, and luck. We propose a methodology that can accelerate this process by fitting data-driven models to experimental data as they are collected and using those models to suggest which experiment to perform next. This methodology can guide the practitioner to test the most promising candidates earlier and can supplement scientific and engineering intuition with data-driven insights. A key strength of the proposed framework is that it scales to the high-dimensional parameter spaces typical of materials discovery applications. Importantly, the data-driven models incorporate uncertainty analysis, so that new experiments are proposed by balancing exploration of high-uncertainty candidates against exploitation of high-performing regions of parameter space. Across four materials science test cases, our methodology found the optimal candidate with, on average, a factor of three fewer measurements than random guessing.
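
The loop described in the abstract (fit a model to the measurements collected so far, estimate uncertainty, pick the next experiment) can be sketched in a few lines. The sketch below is illustrative only and is not the authors' implementation: it assumes a bootstrap ensemble of 1-nearest-neighbor predictors as the model, uses the spread of the ensemble as a rough uncertainty estimate, and scores candidates with an upper-confidence-bound rule (`mean + kappa * std`). The function names, the `kappa` parameter, and the choice of model are all hypothetical.

```python
import random
import statistics


def fit_bootstrap_ensemble(X, y, n_models=30, rng=None):
    """Draw bootstrap resamples of the data; each resample acts as a 1-NN model."""
    rng = rng or random.Random(0)
    n = len(X)
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def predict_1nn(model, x):
    """Return the response of the resampled point closest to x."""
    Xb, yb = model
    nearest = min(
        range(len(Xb)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(Xb[i], x)),
    )
    return yb[nearest]


def predict_with_uncertainty(ensemble, x):
    """Ensemble mean and standard deviation (an assumed, crude uncertainty proxy)."""
    preds = [predict_1nn(model, x) for model in ensemble]
    return statistics.fmean(preds), statistics.pstdev(preds)


def suggest_next(candidates, measured, kappa=1.0, n_models=30, seed=0):
    """Return the index of the unmeasured candidate maximizing mean + kappa * std.

    `measured` maps candidate index -> observed value. A larger `kappa` favors
    exploration of uncertain candidates; kappa = 0 is pure exploitation.
    """
    X = [candidates[i] for i in measured]
    y = [measured[i] for i in measured]
    ensemble = fit_bootstrap_ensemble(X, y, n_models, random.Random(seed))
    best_index, best_score = None, float("-inf")
    for i, x in enumerate(candidates):
        if i in measured:
            continue
        mean, std = predict_with_uncertainty(ensemble, x)
        score = mean + kappa * std
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def run_campaign(candidates, measure, n_initial=3, budget=None, seed=0):
    """Measure a few random candidates, then pick each next one sequentially."""
    rng = random.Random(seed)
    budget = min(budget or len(candidates), len(candidates))
    start = rng.sample(range(len(candidates)), n_initial)
    measured = {i: measure(candidates[i]) for i in start}
    while len(measured) < budget:
        i = suggest_next(candidates, measured, seed=seed + len(measured))
        measured[i] = measure(candidates[i])
    return measured
```

Here `measure` stands in for running a real experiment on one candidate. In practice the model, the uncertainty estimate, and the selection rule would each be replaced by whatever the application calls for; the sketch only shows how the three pieces fit together in a sequential design loop.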

Notes

  1. https://links.citrination.com/magnetocaloric-benchmark

  2. https://links.citrination.com/superconductor-benchmark

  3. https://links.citrination.com/thermoelectric-benchmark

  4. https://links.citrination.com/steel-fatigue-strength-benchmark

Acknowledgements

The authors would like to thank S. Wager and T. Covert for discussions regarding random forest uncertainty estimates. The authors would also like to thank the rest of the Citrine Informatics team. S. Paradiso and M. Hutchinson acknowledge support from Argonne National Laboratory through contract 6F-31341, associated with the R2R Manufacturing Consortium funded by the Department of Energy Advanced Manufacturing Office.

Author information

Corresponding author

Correspondence to Julia Ling.

About this article

Cite this article

Ling, J., Hutchinson, M., Antono, E. et al. High-Dimensional Materials and Process Optimization Using Data-Driven Experimental Design with Well-Calibrated Uncertainty Estimates. Integr Mater Manuf Innov 6, 207–217 (2017). https://doi.org/10.1007/s40192-017-0098-z

Keywords

  • Machine learning
  • Experimental design
  • Sequential design
  • Active learning
  • Uncertainty quantification