A Case Study of Multimodal, Multi-institutional Data Management for the Combinatorial Materials Science Community

  • Thematic Section: Harnessing the Power of Materials Data
Integrating Materials and Manufacturing Innovation

Abstract

Although the convergence of high-performance computing, automation, and machine learning has significantly altered the materials design timeline, transformative advances in functional materials and acceleration of their design will require addressing the deficiencies that currently exist in materials informatics, particularly a lack of standardized experimental data management. The challenges of experimental data management are especially acute in combinatorial materials science, where advances in the automation of experimental workflows have produced datasets that are often too large and too complex for human reasoning. The challenge is further compounded by the multimodal and multi-institutional nature of these datasets, which tend to be distributed across multiple institutions and can vary substantially in format, size, and content. Furthermore, modern materials engineering requires tuning not only composition but also phase and microstructure to elucidate processing–structure–property–performance relationships. To adequately map a materials design space from such datasets, an ideal materials data infrastructure would contain data and metadata describing (i) synthesis and processing conditions, (ii) characterization results, and (iii) property and performance measurements. Here, we present a case study for the low-barrier development of a dashboard that enables standardized organization, analysis, and visualization of a large data lake consisting of combinatorial datasets of synthesis and processing conditions, X-ray diffraction patterns, and materials property measurements generated at several different institutions. While this dashboard was developed specifically for data-driven thermoelectric materials discovery, we envision the adaptation of this prototype to other materials applications and, more ambitiously, its future integration into an all-encompassing materials data management infrastructure.

Code Availability

The source code for the data portal described in this work is currently available on Bitbucket at the following link: https://bitbucket.org/tecca-data-portal/workspace/snippets/rqnMyX/source-code#file-data-portal-src.zip. Note that this code is made available for reference only, as running the dashboard requires access to a private Globus endpoint.

Notes

  1. Certain commercial equipment, instruments, software, or materials are identified in this document. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.


Acknowledgements

This work was partially supported by the US Department of Energy, Office of Energy Efficiency and Renewable Energy (EERE), specifically the Advanced Materials & Manufacturing Technologies Office (AMMTO), under contract DE-AC02-76SF00515.

Funding

Advanced Materials and Manufacturing Technologies Office, DE-AC02-76SF00515, Apurva Mehta

Author information

Corresponding author

Correspondence to James E. Saal.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Allec, S.I., Muckley, E.S., Johnson, N.S. et al. A Case Study of Multimodal, Multi-institutional Data Management for the Combinatorial Materials Science Community. Integr Mater Manuf Innov (2024). https://doi.org/10.1007/s40192-024-00345-7

  • DOI: https://doi.org/10.1007/s40192-024-00345-7
