Abstract
Although the convergence of high-performance computing, automation, and machine learning has significantly shortened the materials design timeline, transformative advances in functional materials and further acceleration of their design will require addressing current deficiencies in materials informatics, particularly the lack of standardized experimental data management. These challenges are especially acute in combinatorial materials science, where automated experimental workflows produce datasets that are often too large and too complex for human reasoning. The challenge is compounded by the multimodal, multi-institutional nature of these datasets: they are often distributed across several institutions and vary substantially in format, size, and content. Furthermore, modern materials engineering requires tuning not only composition but also phase and microstructure in order to elucidate processing–structure–property–performance relationships. To adequately map a materials design space from such datasets, an ideal materials data infrastructure would contain data and metadata describing (i) synthesis and processing conditions, (ii) characterization results, and (iii) property and performance measurements. Here, we present a case study in the low-barrier development of a dashboard, a prototype of such an infrastructure, that enables standardized organization, analysis, and visualization of a large data lake of combinatorial datasets (synthesis and processing conditions, X-ray diffraction patterns, and materials property measurements) generated at several institutions. Although the dashboard was developed specifically for data-driven thermoelectric materials discovery, we envision adapting this prototype to other materials applications and, more ambitiously, integrating it into an all-encompassing materials data management infrastructure.
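The three data categories named in the abstract ((i) synthesis and processing conditions, (ii) characterization results, and (iii) property and performance measurements) can be pictured as one record per sample, assembled from partial contributions held at different institutions. The Python sketch below is a hypothetical illustration only, not the dashboard's actual data model: the class name `SampleRecord`, the function `merge_records`, all field names, the "first XRD pattern wins" merge policy, and the example sample ID and values are assumptions chosen for clarity.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class SampleRecord:
    """Hypothetical record for one sample (e.g., one position on a combinatorial library)."""
    sample_id: str
    sources: List[str] = field(default_factory=list)  # contributing institutions
    # (i) synthesis and processing conditions, e.g. {"anneal_temp_C": 400.0}
    synthesis: Dict[str, float] = field(default_factory=dict)
    # (ii) characterization results: an XRD pattern as paired 2-theta/intensity lists
    xrd_two_theta: List[float] = field(default_factory=list)
    xrd_intensity: List[float] = field(default_factory=list)
    # (iii) property and performance measurements, e.g. {"seebeck_uV_per_K": -120.0}
    properties: Dict[str, float] = field(default_factory=dict)

    def is_complete(self) -> bool:
        """True only if all three data categories are present for this sample."""
        return bool(self.synthesis) and bool(self.xrd_two_theta) and bool(self.properties)


def merge_records(partials: Iterable[SampleRecord]) -> Dict[str, SampleRecord]:
    """Merge partial records that share a sample_id but come from different institutions."""
    merged: Dict[str, SampleRecord] = {}
    for part in partials:
        rec = merged.setdefault(part.sample_id, SampleRecord(part.sample_id))
        for src in part.sources:
            if src not in rec.sources:
                rec.sources.append(src)
        rec.synthesis.update(part.synthesis)
        rec.properties.update(part.properties)
        # Assumed policy: keep the first XRD pattern seen (a real system needs versioning).
        if part.xrd_two_theta and not rec.xrd_two_theta:
            rec.xrd_two_theta = list(part.xrd_two_theta)
            rec.xrd_intensity = list(part.xrd_intensity)
    return merged


# Placeholder example: three institutions each hold one category for the same sample.
parts = [
    SampleRecord("lib01_pos05", ["Institution A"], synthesis={"anneal_temp_C": 400.0}),
    SampleRecord("lib01_pos05", ["Institution B"],
                 xrd_two_theta=[20.0, 20.1], xrd_intensity=[5.0, 7.5]),
    SampleRecord("lib01_pos05", ["Institution C"], properties={"seebeck_uV_per_K": -120.0}),
]
merged = merge_records(parts)
print(merged["lib01_pos05"].is_complete())  # True
```

Keying on a shared sample identifier is what lets processing, structure, and property data from separate sites be joined. The completeness check mirrors the abstract's point that all three categories are needed to map a design space.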
Code Availability
The source code for the data portal described in this work is currently available on Bitbucket at the following link: https://bitbucket.org/tecca-data-portal/workspace/snippets/rqnMyX/source-code#file-data-portal-src.zip. Note that this code is made available for reference only, as running the dashboard requires access to a private Globus endpoint.
Notes
Certain commercial equipment, instruments, software, or materials are identified in this document. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.
Acknowledgements
This work was partially supported by the US Department of Energy, Office of Energy Efficiency and Renewable Energy (EERE), specifically the Advanced Materials & Manufacturing Technologies Office (AMMTO), under contract DE-AC02-76SF00515.
Funding
Advanced Materials and Manufacturing Technologies Office, DE-AC02-76SF00515, Apurva Mehta
Ethics declarations
Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Allec, S.I., Muckley, E.S., Johnson, N.S. et al. A Case Study of Multimodal, Multi-institutional Data Management for the Combinatorial Materials Science Community. Integr Mater Manuf Innov (2024). https://doi.org/10.1007/s40192-024-00345-7
DOI: https://doi.org/10.1007/s40192-024-00345-7