Abstract
The optimization of composition and processing to obtain materials that exhibit desirable characteristics has historically relied on a combination of domain knowledge, trial and error, and luck. We propose a methodology that can accelerate this process by fitting data-driven models to experimental data as it is collected and using them to suggest which experiment should be performed next. This methodology can guide the practitioner to test the most promising candidates earlier and can supplement scientific and engineering intuition with data-driven insights. A key strength of the proposed framework is that it scales to the high-dimensional parameter spaces typical of materials discovery applications. Importantly, the data-driven models incorporate uncertainty analysis, so that new experiments are proposed based on a combination of exploring high-uncertainty candidates and exploiting high-performing regions of parameter space. Across four materials science test cases, our methodology found the optimal candidate with, on average, one third as many measurements as random guessing.
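The loop described in the abstract (fit a model to the measurements so far, estimate its uncertainty, pick the next experiment by trading off exploration and exploitation) can be illustrated with a minimal sketch. It does not reproduce the authors' models or selection criteria: as stand-ins, it assumes a bootstrap ensemble of nearest-neighbor predictors, with the spread of the ensemble's predictions as the uncertainty and "predicted mean plus kappa times spread" as the selection score. Every function name, the `kappa` parameter, and the toy objective are hypothetical.

```python
import random
import statistics


def fit_ensemble(X, y, n_models=30, rng=None):
    """Build a bootstrap ensemble: each member is a resampled copy of the data."""
    rng = rng or random.Random(0)
    n = len(X)
    members = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(([X[i] for i in idx], [y[i] for i in idx]))
    return members


def predict_1nn(member, x):
    """Predict with one member: value of its nearest training point."""
    Xb, yb = member
    nearest = min(
        range(len(Xb)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(Xb[i], x)),
    )
    return yb[nearest]


def predict_with_uncertainty(members, x):
    """Return (mean, spread) of the ensemble's predictions at x."""
    preds = [predict_1nn(m, x) for m in members]
    return statistics.fmean(preds), statistics.pstdev(preds)


def suggest_next(members, candidates, untested, kappa=1.0):
    """Pick the untested candidate with the highest mean + kappa * spread.

    Larger kappa favors exploration; kappa = 0 is pure exploitation.
    This score is an assumption of the sketch, not the paper's criterion.
    """
    def score(i):
        mean, spread = predict_with_uncertainty(members, candidates[i])
        return mean + kappa * spread
    return max(untested, key=score)


def run_campaign(candidates, measure, budget, n_initial=3, kappa=1.0, seed=0):
    """Measure n_initial random candidates, then choose the rest sequentially."""
    rng = random.Random(seed)
    untested = list(range(len(candidates)))
    rng.shuffle(untested)
    tested = [untested.pop() for _ in range(n_initial)]
    values = [measure(candidates[i]) for i in tested]
    while len(tested) < budget and untested:
        # Refit on everything measured so far, then propose one experiment.
        members = fit_ensemble([candidates[i] for i in tested], values, rng=rng)
        nxt = suggest_next(members, candidates, untested, kappa)
        untested.remove(nxt)
        tested.append(nxt)
        values.append(measure(candidates[nxt]))
    return tested, values


def toy_measure(x):
    """Hypothetical 'experiment': peaks at (0.7, 0.3) in a 2-D design space."""
    return -((x[0] - 0.7) ** 2 + (x[1] - 0.3) ** 2)


# Toy candidate set: an 8 x 8 grid over the unit square.
grid = [(i / 7, j / 7) for i in range(8) for j in range(8)]
tested, values = run_campaign(grid, toy_measure, budget=20)
best = max(range(len(tested)), key=lambda k: values[k])
print("best candidate found:", grid[tested[best]], "value:", values[best])
```

In practice the toy objective would be replaced by a real measurement, and the ensemble by whatever model and uncertainty estimate the practitioner trusts; the structure of the loop stays the same.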
Acknowledgements
The authors would like to thank S. Wager and T. Covert for their discussions regarding random forest uncertainty estimates. The authors would also like to thank the rest of the Citrine Informatics team. S. Paradiso and M. Hutchinson acknowledge support from Argonne National Laboratories through contract 6F-31341, associated with the R2R Manufacturing Consortium funded by the Department of Energy Advanced Manufacturing Office.
Cite this article
Ling, J., Hutchinson, M., Antono, E. et al. High-Dimensional Materials and Process Optimization Using Data-Driven Experimental Design with Well-Calibrated Uncertainty Estimates. Integr Mater Manuf Innov 6, 207–217 (2017). https://doi.org/10.1007/s40192-017-0098-z
DOI: https://doi.org/10.1007/s40192-017-0098-z