High-Dimensional Materials and Process Optimization Using Data-Driven Experimental Design with Well-Calibrated Uncertainty Estimates

  • Julia Ling
  • Maxwell Hutchinson
  • Erin Antono
  • Sean Paradiso
  • Bryce Meredig
Technical Article


Optimizing composition and processing to obtain materials with desirable characteristics has historically relied on a combination of domain knowledge, trial and error, and luck. We propose a methodology that accelerates this process by fitting data-driven models to experimental data as they are collected and using those models to suggest which experiment to perform next. This methodology guides the practitioner to test the most promising candidates earlier and supplements scientific and engineering intuition with data-driven insights. A key strength of the proposed framework is that it scales to the high-dimensional parameter spaces typical of materials discovery applications. Importantly, the data-driven models incorporate uncertainty analysis, so new experiments are proposed by balancing exploration of high-uncertainty candidates against exploitation of high-performing regions of parameter space. Across four materials science test cases, our methodology found the optimal candidate with, on average, three times fewer measurements than random guessing.
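The loop described above (fit a model to the measurements so far, estimate uncertainty for each untested candidate, then choose the next experiment by trading off predicted performance against uncertainty) can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the model is a bootstrap ensemble of nearest-neighbor predictors swapped in for simplicity, the selection rule is a generic "mean plus `kappa` times standard deviation" score, and every name (`predict_with_uncertainty`, `suggest_next`, `run_campaign`, `toy_property`, `kappa`) is hypothetical.

```python
import math
import random
import statistics


def nearest_value(x, points):
    """Return the measured value of the training point closest to x."""
    return min(points, key=lambda p: math.dist(x, p[0]))[1]


def predict_with_uncertainty(x, measured, n_models=30, rng=random):
    """Bootstrap ensemble of 1-nearest-neighbor predictors (an assumption).

    Returns (mean, std) over the ensemble. The spread is only a crude
    stand-in for a calibrated uncertainty estimate.
    """
    preds = []
    for _ in range(n_models):
        # Resample the measurements with replacement, then predict.
        resample = [rng.choice(measured) for _ in measured]
        preds.append(nearest_value(x, resample))
    return statistics.fmean(preds), statistics.pstdev(preds)


def suggest_next(candidates, measured, kappa=1.0, rng=random):
    """Pick the untested candidate that maximizes mean + kappa * std."""
    def score(x):
        mean, std = predict_with_uncertainty(x, measured, rng=rng)
        return mean + kappa * std  # exploit (mean) plus explore (std)
    return max(candidates, key=score)


def run_campaign(candidates, measure, n_initial=3, budget=20, seed=0):
    """Measure candidates one at a time, re-estimating after each result."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    # Seed the model with a few randomly chosen experiments.
    measured = [(x, measure(x)) for x in pool[:n_initial]]
    pool = pool[n_initial:]
    while pool and len(measured) < budget:
        x = suggest_next(pool, measured, rng=rng)
        pool.remove(x)
        measured.append((x, measure(x)))
    return measured


def toy_property(x):
    """Hypothetical property to maximize; stands in for a real experiment."""
    return -((x[0] - 0.7) ** 2 + (x[1] - 0.2) ** 2)


# Hypothetical candidate set: a 10 x 10 grid over two design parameters.
grid = [(i / 9, j / 9) for i in range(10) for j in range(10)]
history = run_campaign(grid, toy_property, budget=20)
best_x, best_y = max(history, key=lambda p: p[1])
print(f"best candidate after {len(history)} measurements: {best_x}, value {best_y:.4f}")
```

Setting `kappa` to zero makes the selection purely exploitative, while larger values favor candidates the ensemble disagrees on. In a real campaign, `measure` would be the laboratory experiment and the candidate list would span many more than two dimensions.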


Machine learning, Experimental design, Sequential design, Active learning, Uncertainty quantification



Copyright information

© The Minerals, Metals & Materials Society 2017

Authors and Affiliations

  1. Citrine Informatics, Redwood City, USA
