Community action on FAIR data will fuel a revolution in materials research

Brinson, L. Catherine; Bartolo, Laura M.; Blaiszik, Ben; Elbert, David; Foster, Ian; Strachan, Alejandro; Voorhees, Peter W.

doi:10.1557/s43577-023-00498-4

Impact: Letter
Open access
Published: 29 March 2023

Volume 49, pages 12–16 (2024)
MRS Bulletin

L. Catherine Brinson¹ (ORCID: orcid.org/0000-0003-2551-1563),
Laura M. Bartolo⁴,
Ben Blaiszik³,⁶,
David Elbert⁸,
Ian Foster²,³,
Alejandro Strachan⁵ &
Peter W. Voorhees⁷

Little of the data, arguably the most important product of materials research worldwide, are shared in forms usable by others. The small and biased proportion of results that is published is buried in plots and text licensed by journals. This situation wastes resources, hinders innovation, and, in the current era of data-driven discovery, is no longer tenable. In this article, we propose specific synergistic, collaborative, and global actions to enable the assembly of large quantities of Findable, Accessible, Interoperable, Reusable (FAIR)¹ materials data. We provide context for what FAIR data can mean for materials scientists, motivation for adopting FAIR principles, and a perspective on how widespread adoption of FAIR data can advance their science.

A decade ago, the US Materials Genome Initiative (MGI)² articulated goals of accelerated materials development and deployment via advanced computational methods and integrated, high-throughput experiments, with a focus on data standards, sharing, transparency, modeling, and design. The 2021 Materials Genome Initiative Strategic Plan² expands the MGI’s scope to encompass a new “Materials Innovation Infrastructure,” a focus on AI, and a community network for standards, education, and training. Parallel initiatives worldwide are pursuing similar visions.³,⁴,⁵,⁶,⁷ In Germany, MatWerk, part of the National Research Data Infrastructure (NFDI), received five years of funding in 2021 to support efforts in FAIR data and a shared data space for materials science and engineering. In 2021, the UK launched its Innovation Strategy with support for advanced materials and manufacturing, and in 2020, the EU established OntoCommons for shared materials and manufacturing data ontologies. Japan’s Strategic Innovations Program (SIP) created the Design System of Structural Materials in 2020. Such efforts, the rapidly growing number of papers in materials science using machine learning and their citation rates (Figure 4 in References 8 and 9), and the emergence of journal publications focused on scientific data and associated metadata (e.g., Nature Scientific Data) make clear the global importance of data to materials science and engineering.¹⁰,¹¹,¹²,¹³,¹⁴,¹⁵,¹⁶,¹⁷

Yet despite large investments in materials science and engineering (more than $37B in 2018 by US industry alone¹⁸), most data languish in local storage systems or in reports and papers.²,¹²,¹³ In contrast, imagine being able to “Google” all materials ever synthesized or predicted, and to find organized, annotated, quantitative, referenced, citable, and downloadable data for the subset of materials that have a desired combination of properties and characteristics. Joining MGI and FAIR data brings this vision within reach.

FAIR materials data

The FAIR principles, applicable to any type of data, provide unifying guidelines for the effective sharing, discovery, and reuse of digital resources, including data, metadata, protocols, workflows, and software. FAIR data for materials will enable better science via reproducibility and transparency and provide a path to reward valued data generators. Widespread FAIR data will unleash an era of materials informatics in which exploring prior work is nearly instantaneous, and will drive the development of advanced analytics and machine learning for materials.

Realizing the promises of MGI and FAIR, however, requires community agreement and implementation. General FAIR principles¹ are necessary but not sufficient to transform the field of materials, where varied interpretations and definitions of basic composition and property terms hold back effective implementation.¹⁹ Each data type has different forms, vocabularies, and descriptors across material types, from polymeric systems to metals, biomaterials, ceramics, and functional materials.

As depicted in Figure 1, making materials data FAIR need not involve heroic efforts but does require attention and deliberate and consistent adoption of available protocols. For example, the use of globally unique, persistent identifiers (UUIDs or PIDs) as long-lasting references for digital resources is “FAIR,” while the typical protocol of making data “available upon request” is “not FAIR.”

Materials data stakeholders: Barriers and hopes

In planning the operational and cultural changes required to achieve broadly FAIR materials data, we must consider the agendas, needs, and concerns of five large cadres of stakeholders: researchers who generate data; developers of hardware and software tools used to produce research results; publishers and repository developers that transmit research results; funders who support research; and consumers who use data. We interviewed members of each group in developing our recommendations.

The number one barrier to FAIR materials data is fear of productive time lost in archiving, cleaning, annotating, and storing data and associated metadata. Funders and researchers are concerned about lost productivity, publishers about barriers and delays to publication when data sharing is enforced, and consumers about spending time finding data in a new and unfamiliar landscape. Other major concerns identified include navigation of licensing, fear of being scooped/fear of losing credit, intellectual property restrictions for materials data, and quality control for data housed in repositories.

Stakeholders simultaneously expressed great hope for a data-rich future where journal articles are linked with FAIR data sets; ever-growing supplementary information (SI) is replaced with references to cleanly annotated data in repositories; measures of quality and FAIR metrics naturally evolve for housed data; and data are citable, findable, and reusable, and have significantly larger impact.

Achieving widespread FAIR materials data requires overcoming both sociological and technical challenges. To combat the major fear of “lost time,” we need demonstrations of FAIR data enabling success, incentives for sharing FAIR data, and infrastructure to simplify or automate data upload and annotation. Data literacy and best practices need to become part of education and of researchers’ daily workflow so that making data FAIR is no longer a taxing afterthought or a source of fear of lost credit. Sustainability models must be developed and implemented to support hosting large quantities of data and the required infrastructure.

A roadmap to FAIR materials data infrastructure

We depict in a roadmap (Figure 2) both individual and community-level actions to accelerate materials research via FAIR data. The community-level actions are:

Incentivize and recognize data literacy and reward best practices in data stewardship. Track “data use” citations and create a data citation index to reward publishing of FAIR data; create open educational content for FAIR materials data methodologies.
Prioritize capture of materials research products beyond data sets. Archive post-processing methods, trained models, and codes; establish links between materials data repositories and associated models/software.
Establish benchmark materials data sets of high value and high profile to drive algorithm development. Establish an award for materials discoveries based on prior data.
Define high-impact community data generation tasks in subfields of materials science. Challenge materials subfields to prioritize specific data products (e.g., microstructural image collections) for transformational change. Engage repositories and communities to catalyze these changes.
Promote trustworthy repositories. Define audit and certification criteria for materials repositories to ensure long-term storage, access, and preservation of data as part of the global materials data infrastructure.
Collect and publicize success stories. Collate compelling examples of data-driven approaches used to advance materials research, curated and promoted by professional organizations and funding agencies.

Figure 2 also shows four levels of individual action that can be taken by researchers, research groups, and labs to produce FAIR data and enhance scholarly output. The practices encompassed by these levels—organized in roughly increasing order of complexity—can be adopted one at a time, in various orders, and in any materials research effort. In each level, actions are labeled with F, A, I, or R:

Level 1: Planning and preliminary data submission

Define materials data and metadata at project outset. Consider how the data could be reused by others for tasks unrelated to the originator’s work; quantifying and capturing uncertainties is often critical in this step. (R)
Use electronic lab notebooks to facilitate data and metadata extraction as well as documenting and publishing data management workflows.²³ (I)
Make data available through a general repository with persistent identifiers (e.g., DOIs) for data sets (e.g., Zenodo, Figshare, Dryad). (F)
Include licensing information and how-to-cite examples in metadata, as supported by Figshare, Dryad, MDF, and nanoHUB. (R)
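
The Level 1 actions above can be sketched as a small machine-readable record that travels with a data set. This is only an illustrative sketch: the field names, the `build_record` helper, and the placeholder DOI are assumptions made for the example and do not follow the schema of Zenodo, Figshare, Dryad, or any other repository.

```python
import json
import uuid


def build_record(title, creators, license_id, doi=None):
    """Assemble a minimal description of a data set.

    Field names are hypothetical and not tied to any repository schema.
    """
    record = {
        # Locally generated unique ID; a repository-minted DOI is preferable.
        "local_id": str(uuid.uuid4()),
        "title": title,
        "creators": list(creators),
        "license": license_id,  # e.g., a license identifier such as "CC-BY-4.0"
        "doi": doi,
    }
    # A how-to-cite string tells reusers how to give credit.
    who = ", ".join(creators)
    where = f"https://doi.org/{doi}" if doi else "DOI pending"
    record["how_to_cite"] = f"{who}. {title}. {where}"
    return record


record = build_record(
    title="Tensile tests of example polymer films",
    creators=["A. Researcher", "B. Student"],
    license_id="CC-BY-4.0",
    doi="10.0000/example.12345",  # placeholder, not a real DOI
)
print(json.dumps(record, indent=2))
```

In practice the repository assigns the persistent identifier on deposit; the local UUID here only keeps the record unambiguous until then.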

Level 2: Materials-specific metadata and complete submission

Include detailed descriptive metadata, for example, via metadata columns in a CSV data file. (R, F)
Place data and metadata in a materials-specific repository (F, A) with fields designed to handle and share materials-relevant terms: for example, OpenKIM for interatomic models, MDF for heterogeneous data sets up to many terabytes in size, Foundry for structured ML-ready data sets, MaterialsMine for polymer nanocomposites and structural metamaterials, or AFLOW or OQMD for DFT-calculated data on thermodynamic properties of crystallographic materials.
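
As one possible reading of "metadata columns in a CSV data file," the sketch below writes each measurement alongside the columns needed to interpret it (unit, method, date) and reads the file back. The column names and all values are hypothetical placeholders, not a community standard or real measurements.

```python
import csv
import io

# Hypothetical column names; descriptive metadata sits next to each value.
FIELDS = ["sample_id", "material", "property", "value", "unit", "method", "measured_on"]

# Placeholder rows for illustration only (not real measurements).
rows = [
    {"sample_id": "S1", "material": "example-polymer", "property": "tensile_modulus",
     "value": 2.1, "unit": "GPa", "method": "tensile test", "measured_on": "2023-01-15"},
    {"sample_id": "S2", "material": "example-polymer", "property": "tensile_modulus",
     "value": 2.6, "unit": "GPa", "method": "tensile test", "measured_on": "2023-01-16"},
]


def to_csv(records, fields):
    """Serialize records to CSV text with an explicit header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dictionaries (all values as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(rows, FIELDS)
parsed = from_csv(csv_text)
print(csv_text)
```

Keeping the unit and method in their own columns means a reader, human or machine, does not have to guess them from a file name or a figure caption.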

Level 3: Enhanced functionality

Ensure data and metadata are both human and machine readable; employ “tidy” data protocols.²⁴
Place data in repositories that support long-term storage and query via standard interfaces (e.g., APIs): for example, Materials Project, AFLOW, OQMD, MDF. (F, A)
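
The "tidy" protocol mentioned above can be illustrated by reshaping a wide table, with one column per measured property, into rows that each hold a single observation. This is a minimal sketch of one common tidy layout; the `to_tidy` helper, the column names, and the values are assumptions made for the example.

```python
def to_tidy(wide_rows, id_field, unit_by_column):
    """Reshape wide records into one-observation-per-row records.

    Missing measurements (None) are skipped rather than stored as empty rows.
    """
    tidy = []
    for row in wide_rows:
        for column, unit in unit_by_column.items():
            if row.get(column) is not None:
                tidy.append({
                    id_field: row[id_field],
                    "property": column,
                    "value": row[column],
                    "unit": unit,
                })
    return tidy


# Placeholder data for illustration only.
wide = [
    {"sample_id": "S1", "modulus": 2.1, "strength": 40.0},
    {"sample_id": "S2", "modulus": 2.6, "strength": None},
]
units = {"modulus": "GPa", "strength": "MPa"}

tidy = to_tidy(wide, "sample_id", units)
for observation in tidy:
    print(observation)
```

Each output row is self-describing, so new properties can be added later without changing the table's columns.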

Level 4: Community standards, provenance, and reusing data

Use community standards for knowledge representation and standard file formats for data and metadata. Examples include SMILES for molecules and CIF for crystals, which can be automatically processed by visualization and machine learning packages. (I)
Include metadata that points to other metadata as needed to provide detailed context; ensure software and protocols have well-defined and verified requirements (inputs) and services (outputs). (I)
Reuse others’ data in your research (e.g., for benchmarking or in analyses to create new data). (R)
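
To show why a standard format such as CIF lends itself to automatic processing, the sketch below pulls simple `_tag value` pairs out of a CIF-style text block. It is a deliberately minimal illustration, not a CIF parser: it ignores `loop_` tables and multi-line text fields, the `parse_cif_tags` helper is hypothetical, and the example values are illustrative. Real work should rely on an established CIF library.

```python
def parse_cif_tags(text):
    """Collect single-line `_tag value` items from CIF-style text.

    Lines inside loops (a tag with no value on the same line) are skipped.
    """
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip data_ headers, loop_ keywords, comments, data rows
        parts = line.split(None, 1)
        if len(parts) == 2:
            # Remove the surrounding quotes used for values containing spaces.
            items[parts[0]] = parts[1].strip("'\"")
    return items


# Illustrative CIF-style fragment; values are for demonstration only.
example = """\
data_example
_chemical_formula_sum 'Si'
_cell_length_a 5.43
_cell_length_b 5.43
_cell_length_c 5.43
_cell_angle_alpha 90
"""

tags = parse_cif_tags(example)
cell_lengths = [float(tags[f"_cell_length_{axis}"]) for axis in "abc"]
print(tags["_chemical_formula_sum"], cell_lengths)
```

Because the tag names are standardized, the same few lines work on files from any source that follows the format, which is the practical meaning of interoperability here.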

Community networks such as the US MaRDA and materials subgroups in the Research Data Alliance (RDA), working closely with stakeholders, can support the transition to FAIR materials data. Critical actions include providing the coordination and engagement required to develop and maintain protocols, standards, and best practices; developing and promoting sustainability models for materials data repositories; and regularly updating the roadmap to FAIR materials data, with annual scoring of the community’s progress.

New data-driven approaches to materials innovation promise transformational contributions to human health and prosperity, but are hindered by inadequate access to data on materials and material properties. The roadmap presented here highlights policies and practices that the materials community and individuals can adopt to catalyze the creation of a distributed, yet unified, worldwide materials innovation network within which data can be reused and recombined to unleash a new era of accelerated innovation and progress.

Data availability

Data sharing not applicable to this article as no data sets were generated or analyzed during the current study.

References

1. M.D. Wilkinson, M. Dumontier, I. Jan Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J. Boiten, L.B. da Silva Santos, P.E. Bourne, J. Bouwman, A.J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C.T. Evelo, R. Finkers, A. Gonzalez-Beltran, A.J.G. Gray, P. Groth, C. Goble, J.S. Grethe, J. Heringa, P.A.C. ’t Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, S.J. Lusher, M.E. Martone, A. Mons, A.L. Packer, B. Persson, P. Rocca-Serra, M. Roos, R. van Schaik, S. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M.A. Swertz, M. Thompson, J. Van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, B. Mons, Sci. Data 3, 160018 (2016)
2. US White House Office of Science and Technology Policy, Materials Genome Initiative Strategic Plan (US White House Office of Science and Technology Policy, Washington, DC, 2021) (original MGI letter from 2011 also at https://www.mgi.gov). https://www.mgi.gov/sites/default/files/documents/MGI-2021-Strategic-Plan.pdf. Accessed 15 Jan 2023
3. C. Eberl, M. Neibel, T. Hickel, National Research Data Infrastructure for Materials Science and Engineering MatWerk (2021). https://nfdi-matwerk.de/. Accessed 15 Jan 2023
4. UK Department of Business, Energy & Industrial Strategy, UK Innovation Strategy: Leading the Future by Creating It (UK Department of Business, Energy & Industrial Strategy, London, 2021). https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1009577/uk-innovation-strategy.pdf. Accessed 15 Jan 2023
5. European Materials Modelling Council (EMMC), Ontology-Driven Data Documentation for Industry Commons (EMMC, Brussels, 2020). https://ontocommons.eu. Accessed 15 Jan 2023
6. Japan Council for Science and Technology Policy, Strategic Innovations Program: Pioneering the Future: Japanese Science, Technology and Innovation 2020 (Japan Council for Science and Technology Policy, Tokyo, 2020). https://www8.cao.go.jp/cstp/panhu/sip_english/sip_en.html. Accessed 15 Jan 2023
7. S. O’Meara, Nature 567, S1 (2019). https://www.nature.com/articles/d41586-019-00885-5. Accessed 15 Jan 2023
8. M.E. Deagen, L.C. Brinson, R.A. Vaia, L.S. Schadler, MRS Bull. 47(4), 379 (2022). https://doi.org/10.1557/s43577-021-00214-0
9. B. Blaiszik, AI/ML Publication Statistics and Charts (2022). https://doi.org/10.5281/zenodo.7057437
10. L.C. Brinson, M. Deagen, W. Chen, J. McCusker, D.L. McGuinness, L.S. Schadler, M. Palmeri, U. Ghumman, A. Lin, B. Hu, ACS Macro Lett. 9, 1086 (2020). https://doi.org/10.1021/acsmacrolett.0c00264
11. M.M. Cencer, B.A. Suslick, J.S. Moore, Tetrahedron 123, 132984 (2022)
12. L. Himanen, A. Geurts, A.S. Foster, P. Rinke, Adv. Sci. 6(21), 1900808 (2019)
13. The Minerals, Metals & Materials Society (TMS), Building a Materials Data Infrastructure: Opening New Pathways to Discovery and Innovation in Science and Engineering (TMS, Pittsburgh, 2017)
14. J.J. de Pablo, N.E. Jackson, M.A. Webb, L.-Q. Chen, J.E. Moore, D. Morgan, R. Jacobs, T. Pollock, D.G. Schlom, E.S. Toberer, J. Analytis, I. Dabo, D.M. DeLongchamp, G.A. Fiete, G.M. Grason, G. Hautier, Y. Mo, K. Rajan, E.J. Reed, E. Rodriguez, V. Stevanovic, J. Suntivich, K. Thornton, J.-C. Zhao, NPJ Comput. Mater. 5(1), 41 (2019)
15. A. Jain, K.A. Persson, G. Ceder, APL Mater. 4(5), 053102 (2016)
16. B. Blaiszik, L. Ward, M. Schwarting, J. Gaff, R. Chard, D. Pike, K. Chard, I. Foster, MRS Commun. 9(4), 1125 (2019)
17. M. Scheffler, M. Aeschlimann, M. Albrecht, T. Bereau, H.J. Bungartz, C. Felser, M. Greiner, A. Gross, C.T. Koch, K. Kremer, W.E. Nagel, M. Scheidgen, C. Woell, C. Draxl, Nature 604, 635 (2022). https://doi.org/10.1038/s41586-022-04501-x
18. National Center for Science and Engineering Statistics, Business Research and Development: 2018. NSF 21-312 (National Science Foundation, Alexandria, 2020). https://ncses.nsf.gov/pubs/nsf21312/. Accessed 15 Jan 2023
19. L. Lannom, D. Koureas, A.R. Hardisty, Data Intell. 2(1–2), 122 (2020)
20. S. Hall, B. McMahon (eds.), International Tables for Crystallography Volume G: Definition and Exchange of Crystallographic Data (Springer, Dordrecht, 2005)
21. D. Weininger, J. Chem. Inf. Comput. Sci. 28(1), 31 (1988). https://doi.org/10.1021/ci00057a005
22. C.W. Andersen, R. Armiento, E. Blokhin, G.J. Conduit, S. Dwaraknath, M.L. Evans, Á. Fekete, A. Gopakumar, S. Gražulis, A. Merkys, F. Mohamed, C. Oses, G. Pizzi, G.-M. Rignanese, M. Scheidgen, L. Talirz, C. Toher, D. Winston, R. Aversa, K. Choudhary, P. Colinet, S. Curtarolo, D. Di Stefano, C. Draxl, S. Er, M. Esters, M. Fornari, M. Giantomassi, M. Govoni, G. Hautier, V. Hegde, M.K. Horton, P. Huck, G. Huhs, J. Hummelshøj, A. Kariyaa, B. Kozinsky, S. Kumbhar, M. Liu, N. Marzari, A.J. Morris, A.A. Mostofi, K.A. Persson, G. Petretto, T. Purcell, F. Ricci, F. Rose, M. Scheffler, D. Speckhard, M. Uhrin, A. Vaitkus, P. Villars, D. Waroquiers, C. Wolverton, M. Wu, X. Yang, Sci. Data 8, 217 (2021). https://doi.org/10.1038/s41597-021-00974-z
23. M. Hunt, S. Clark, D. Mejia, S. Desai, A. Strachan, PLoS One 17(3), e0264492 (2022)
24. H. Wickham, J. Stat. Softw. 59(10), 1 (2014). https://doi.org/10.18637/jss.v059.i10

Acknowledgments

The authors wish to thank J. Allison, A. Mehta, E. De Guire, E. Schultes, D. Lowenberg, J. Warren, J. Brook, and L. Franklin for helpful conversations as this article was prepared.

Funding

L.C.B. acknowledges support from DOE DE-SC0021358 and NSF CSSI-1835677. P.V. acknowledges the financial support of Award 70NANB14H012 from the US Department of Commerce, National Institute of Standards and Technology as part of the Center for Hierarchical Material Design (CHiMaD). D.C.E. acknowledges support from the National Science Foundation [Platform for the Accelerated Realization, Analysis, and Discovery of Interface Materials (PARADIM)] under Cooperative Agreement DMR-2039380 and DTRA Award HDTRA1-20-2-0001. A.S. acknowledges support from the US National Science Foundation DMREF Program (DMREF-1922316) and Network for Computational Nanotechnology (EEC-1227110).

Author information

Authors and Affiliations

1. Department of Mechanical Engineering and Materials Science, Duke University, Durham, USA
L. Catherine Brinson
2. Department of Computer Science, The University of Chicago, Chicago, USA
Ian Foster
3. Data Science and Learning Division, Argonne National Laboratory, Lemont, USA
Ben Blaiszik & Ian Foster
4. Center for Hierarchical Materials Design, Northwestern University, Evanston, USA
Laura M. Bartolo
5. School of Materials Engineering, Purdue University, West Lafayette, USA
Alejandro Strachan
6. Globus, The University of Chicago, Chicago, USA
Ben Blaiszik
7. Department of Materials Science and Engineering, Northwestern University, Evanston, USA
Peter W. Voorhees
8. PARADIM Materials Innovation Platform, Johns Hopkins University, Baltimore, USA
David Elbert

Corresponding author

Correspondence to L. Catherine Brinson.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Widely shared and accessible materials data are the key to a world in which accelerated material development addresses society’s greatest challenges. We present a roadmap for connected materials data to enable researchers, designers, and manufacturers to harness its power.

Rights and permissions

Open access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Brinson, L.C., Bartolo, L.M., Blaiszik, B. et al. Community action on FAIR data will fuel a revolution in materials research. MRS Bulletin 49, 12–16 (2024). https://doi.org/10.1557/s43577-023-00498-4

Accepted: 13 February 2023
Published: 29 March 2023
Issue Date: January 2024
DOI: https://doi.org/10.1557/s43577-023-00498-4
