Introduction

As new characterization techniques are developed and these processes grow in complexity, the amount of data generated also increases. Performing analysis using larger data sources, or big data, is known as the fourth paradigm of scientific research [1]. This new paradigm of data-driven research includes, but is not limited to, machine learning, statistical analysis, data mining, and modeling [1]. To take advantage of big data and maximize time and resource efficiency, an automated framework is necessary that outlines the workflow of materials science data from collection to modeling.

Developments in advanced computational methods, instrumentation, and data storage capabilities have enabled new approaches in many fields, including engineering, medicine, and the physical and social sciences [2]. The data generated with newer methods are increasing in cost, and data volumes can easily reach terabytes or even petabytes in scale. Traditional research methodologies and processes may not be sufficient for sustainable data management, which can lead to research waste and one-time utility of expensive data sets. To meet these evolving challenges, data-centric approaches, such as the FAIR principles, are needed.

FAIR principles, a framework for data governance, aim to make data findable, accessible, interoperable, and reusable (FAIR) [3]. Additionally, integration of data sets from a diverse range of fields can uncover new insights, but the processes or mechanisms by which to integrate disparate data sets are not intuitive. The FAIR initiative is a joint effort launched in 2016 by a group of scientists and industries proposing to revolutionize the way organizations manage, store, process, and exchange data [3]. By defining a set of recommendations and technical guidance for each FAIR principle, the FAIR initiative promotes efficiency and transparency in data management systems.

Substantial work has been reported in the literature toward the adoption of FAIRification for scientific study [4,5,6,7]. However, the interpretation and application of FAIR principles have in most cases been incomplete and irregular. When multiple domains are integrated, it is unclear how FAIR practices should be applied across the disciplines needed for materials characterization. These characterization techniques can span, for example, mechanical testing, chemical composition, and metallography, each of which requires its own domain expertise. Global initiatives that aim to improve data governance through FAIR practices, such as NOMAD [8], are increasing in number and disciplinary diversity; some of these curated efforts are expanding into materials science [9] and benefit fields that draw on multiple disparate disciplines. Since additive manufacturing (AM) is an interdisciplinary field, the development of FAIRified data streams allows for straightforward data registration, such as combining in situ measurements with ex situ characterization.

Laser-powder bed fusion (L-PBF), selective laser sintering (SLS), direct metal laser sintering (DMLS), and selective laser melting (SLM) are popular techniques for additive manufacturing of metal parts. In addition, there has been work on developing ontologies for L-PBF and other advanced manufacturing methods [10, 11]. The L-PBF AM process is often costly, and its data are generated from various measurement modalities. Typical systems are equipped with multiple measurement modalities to understand the manufacturing behavior throughout every step of part generation. These measurement tools are coupled with the machine, providing in situ (real-time) information during part generation, and are complemented by ex situ characterization for assessing finished parts. In situ measurements commonly found in L-PBF are pyrometry [12] and high-speed camera videos [13] that monitor the melt pool and metal powder fusion behavior, while ex situ radiography characterizes the overall print porosity, completeness, and dimensions [14, 15]. However, each modality has its own data formats, and interpreting the information requires specific domain knowledge. This often leads to downstream discussions between data practitioners and domain experts, which is research waste.

While having abundant data is key to understanding the manufacturing process, it is not obvious how disparate modalities can be registered together. Depending on the resolution and diversity of data collected, the resulting data set can become increasingly complex and massive. Workflows and frameworks to address this big data challenge are currently seldom investigated. Applying FAIR principles is an increasingly popular approach for improving the interoperability and reusability of data sets, and it provides human- and machine-readable formatting. Frameworks utilizing FAIR principles will benefit additive manufacturing by allowing AI and machine learning to be fully leveraged at large scale [16, 17].

In this study, we propose a framework for the FAIRification of L-PBF with in situ and ex situ characterization techniques. We obtained a rich data set from an L-PBF study containing both data and metadata from high-speed camera videos, pyrometer measurements, and radiography images. We highlight the need for a comprehensive study protocol that follows systematic, well-structured data registration and adheres to FAIR principles [18]. FAIR principles were incorporated in our automated pipeline framework to both curate and format the data and metadata. While the scope of our work is limited to L-PBF systems, our generalized framework is applicable across multiple systems.

Analytical framework

There is a need to standardize research and foster a culture of good practice when uncovering insights from big data science, using a well-established study protocol, as shown in Fig. 1 [7, 19].

Fig. 1

Study protocol pipeline for scientific investigations, showing the process of translating features found in data into a combined study that can be further refined. The first step is to exhaust the historical data on a topic in a systematic review, which feeds into the data engineering model used to inform the software and statistics approaches needed to refine and analyze data so that it may constitute a true FAIRified study. This process allows for more robust data analysis, helping the materials science community become more reliable

We focus on applying a systematic review to establish the features of divergent data sources, then forming data models that capture the relations between these sources, in order to achieve the final step of integration and FAIRification.

Systematic review

Systematic review in a study protocol includes domain expert guidance within a topical study. The concept of a study protocol is well established in medical studies [20], with guidelines such as those for systematic reviews and meta-analyses [21]. These guidelines for reporting systematic medical studies include full disclosure of goals, methods, and findings [22, 23]. More recently, new guidelines address the implementation of machine learning (ML) [24]. Protocols aid in organizing a research plan by communicating the critical questions to be addressed, tailoring a data collection plan, and explaining how the data will be processed and then modeled in a comprehensive framework [22]. Protocols also communicate a well-defined research focus, especially where the problem being addressed is largely unstructured [25].

In AM, the first step involves querying in situ techniques from historical printing campaigns and related open-data repositories. These studies show what tools are available, what output is typical among measurement modalities, and what common errors and hazards arose when using the technique. With historical errors identified for a measurement method, a data collection plan can be drafted. A data collection plan details what each device is able to record about the print along with the manner of output. Next, a data processing plan is required to determine how the researcher will approach extracting the data and its insights. This allows the scientific record of similar experiments to be compared with the current output. Most importantly, having structured and transparent frameworks enables other researchers to properly reproduce and replicate past works [20, 26]. Poorly structured studies become prone to questionable validity, biased outcomes, and research that goes unused [27]. Therefore, there is equal emphasis on the scientific findings and on the workflow process that leads to those findings.

Data integration

After developing an initial framework from the systematic review, common methods can be compared. Each measurement method produces features that bear some relationship to the outlined part, and the part has exact coordinates that relate the original build schematic to positions along the part. As shown in Fig. 2a, we focus on the L-PBF use of in situ pyrometry measurements and high-speed camera video along with ex situ radiography. Both in situ methods share common coordinates based on the temporal relationship of being recorded simultaneously as the part is printed. For the registration of radiographic objects, the unique part of our build was first identified by matching the length of the active pyrometry signal to the presence of the sample, as shown in Fig. 2b.

Fig. 2

Generalized schematic of an L-PBF AM part, from the initial schematic transferred to a printer tool. As the print occurs, the build recipe's monitors output the raw data formats shown in (a). This is supplemented by a print assessment, such as examining the length and completeness of the print or the presence of known problematic features. Data registration is the joining of two data sets based on a set of variables common to both, as shown in (b)

Radiography consists of multiple X-ray projections from an X-ray source such as a synchrotron. Synchrotrons have a variety of “recipes” that alter exposure depending on the material composition or the desired resolution of features. Automated image stitching is used to generate a single image from multiple high-resolution images, resulting in multiple partIDs in the same file. Post-processing is required to correct the raw outputs. Our radiography images were preprocessed to enhance contrast and reduce detector noise. For better findability and interoperability, the stitched image was separated by unique partID so that all files could be aligned on the partID.
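To make this concrete, the following is a minimal sketch of such a post-processing and separation step, assuming the stitched radiograph is available as a single grayscale TIFF and that approximate bounding boxes for each partID are known. File names, bounding boxes, and filter choices are illustrative rather than the exact pipeline used in this study.

```python
# Sketch: enhance contrast, suppress detector noise, and split a stitched
# radiograph into one file per partID. All names and values are illustrative.
import numpy as np
from skimage import io, exposure, restoration

stitched = io.imread("stitched_radiograph.tif").astype(np.float64)
stitched /= stitched.max()                       # normalize to [0, 1]

enhanced = exposure.equalize_adapthist(stitched)                     # local contrast (CLAHE)
denoised = restoration.denoise_tv_chambolle(enhanced, weight=0.05)  # reduce detector noise

# Hypothetical bounding boxes (row_start, row_stop, col_start, col_stop) per partID
part_rois = {"part001": (0, 512, 0, 1024), "part002": (512, 1024, 0, 1024)}

for part_id, (r0, r1, c0, c1) in part_rois.items():
    crop = denoised[r0:r1, c0:c1]
    io.imsave(f"{part_id}_radiograph.tif", (crop * 65535).astype(np.uint16))
```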

Based on established image analysis methods, we can extract numeric descriptions of the images that are spatiotemporally related to the desired build plan. As a result, we achieve a compact and traceable process that takes the raw output from a machine into an informative representation. As shown in Fig. 2b, the resulting width and length of the final part are spatially related to the pyrometry measurement position, which in turn is temporally related to the melt pool dimensions at that time.
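To illustrate this registration, the sketch below joins the two in situ streams on their shared timestamps and then links the result to the radiography-derived part dimensions through the part identifier. Column names, file names, and the 1 ms matching tolerance are hypothetical choices, not the exact schema of our data set.

```python
# Sketch: register in situ pyrometry and melt-pool video data in time,
# then link them to ex situ radiography results via the part identifier.
import pandas as pd

pyro = pd.read_csv("pyrometry.csv")           # e.g., columns: time_s, z_mm, temperature_K
melt = pd.read_csv("meltpool_video.csv")      # e.g., columns: time_s, mp_width_um, mp_depth_um
radio = pd.read_csv("radiography_parts.csv")  # e.g., columns: part_id, part_width_mm, part_length_mm

# The in situ streams share a clock: align on the nearest timestamp within 1 ms.
in_situ = pd.merge_asof(
    pyro.sort_values("time_s"),
    melt.sort_values("time_s"),
    on="time_s",
    tolerance=1e-3,
    direction="nearest",
)

# Ex situ radiography is registered through the part identifier, so every
# time-resolved record is traceable to the finished part's dimensions.
in_situ["part_id"] = "part001"                # assigned during the build
combined = in_situ.merge(radio, on="part_id", how="left")
combined.to_csv("registered_part001.csv", index=False)
```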

Data analytics automation

Integrating multimodal data requires a user-defined and standardized framework in which the integration process is monitored and follows data standards that make the data Findable, Accessible, Interoperable, and Reusable. The FAIR framework proposed herein facilitates the integration of data from in situ and ex situ methods in an L-PBF experiment by defining a schema in which a consistent naming convention and unique, persistent identifiers are used to link subdomains. Figure 3 illustrates the steps involved in the FAIRification process.

Fig. 3

FAIR implementation. The user starts a new domain using the FAIRmaterials package to create the respective OWL and JSON-LD templates. To FAIRify their data, users input a .csv with metadata terms mapped to PMDCo, and .json and .owl files are returned

The first step of the proposed methodology is schema mapping, which consists of a schematic representation of the features and variables relevant to the domain. A comprehensive list of information related to the sample (partID), instrument (tool), recipe (measurement metadata), and results is stored. The terms in the schema are mapped to PMD Core Ontology (PMDCo [28]) terms in a spreadsheet template available in the package. PMDCo is a mid-level materials science ontology compliant with BFO [29], a top-level ontology (TLO).
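As an illustration of this mapping step, the small table below pairs domain terms with ontology terms and writes them to the kind of spreadsheet the template expects. The PMDCo class names are placeholders; the actual classes are selected by the domain expert from the published ontology.

```python
# Sketch: pair domain terms with ontology terms before template generation.
# The pmdco:<...> entries are placeholders, not actual PMDCo class names.
import pandas as pd

mapping = pd.DataFrame(
    [
        ("partID",      "unique identifier of the printed part",    "pmdco:<PartClass>"),
        ("tool",        "L-PBF printer or characterization device", "pmdco:<DeviceClass>"),
        ("recipe",      "measurement or build parameter set",       "pmdco:<ProcessClass>"),
        ("laser_power", "nominal laser power in W",                 "pmdco:<ParameterClass>"),
    ],
    columns=["domain_term", "description", "ontology_term"],
)
mapping.to_csv("lpbf_schema_mapping.csv", index=False)
```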

After assigning and mapping the domain terms to PMDCo, the FAIRmaterials package [30] is used to generate JavaScript Object Notation for Linked Data (JSON-LD) templates and FAIRified .json files, in addition to the domain-specific ontologies (.owl files).
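The sketch below shows the general shape such a FAIRified record might take for a single part. The @context namespace, keys, identifiers, and values are illustrative of the approach and are not the exact output of the FAIRmaterials package.

```python
# Sketch: a hypothetical JSON-LD metadata record for one printed part.
import json

part_record = {
    "@context": {"pmdco": "https://w3id.org/pmd/co/"},    # assumed namespace prefix
    "@id": "0000-0001-2345-6789-part001-a1b2c3",          # orcid-id-hash convention
    "@type": "pmdco:<PartClass>",                         # placeholder class
    "material": "Ti-6Al-4V",                              # illustrative values below
    "buildFile": "part001.stl",
    "printedOn": "2023-05-01T09:30:00",
    "linkedRecipe": "0000-0001-2345-6789-recipe01-d4e5f6",
    "linkedTool": "0000-0001-2345-6789-toolLPBF-112233",
}

with open("0000-0001-2345-6789-part001-a1b2c3.json", "w") as f:
    json.dump(part_record, f, indent=2)
```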

The integration of domains (i.e., radiography and high-speed camera images) takes place by linking the subdomains through unique persistent identifiers that follow the orcid-userdefinedid-hash structure. The first part of the ID corresponds to the ORCID of the user, userdefinedid is a convention adopted by the domain expert, and hash allows users to monitor file integrity. A unique ID associated with the operator’s ORCID and a process identifier is assigned to the printed part, in a similar process to the sample, tool, and recipe. The metadata of the part is stored in the orcid-part-hash.json file, which contains the printing parameter metadata. The part is linked to the ex situ and in situ methods by chaining the part ID name to the tool/recipe/results metadata. A result from a part that was subjected to radiography and output in the .tif format would be named orcid-idresults-idrecipe-idtool-idpart-hash.tif, and its metadata orcid-idresults-idrecipe-idtool-idpart-hash.json, as illustrated in Fig. 4.
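One way to assemble such identifiers, with the hash derived from the file contents so that integrity can be checked later, is sketched below. The helper name, digest length, and example values are illustrative assumptions.

```python
# Sketch: build an orcid-userdefinedid-hash identifier for a result file.
import hashlib
from pathlib import Path

def make_identifier(orcid: str, user_defined_id: str, data_file: Path, digest_chars: int = 8) -> str:
    """Return an identifier of the form orcid-userdefinedid-hash."""
    file_hash = hashlib.sha256(data_file.read_bytes()).hexdigest()[:digest_chars]
    return f"{orcid}-{user_defined_id}-{file_hash}"

# Example: name the radiography result file and its metadata for a part.
result_id = make_identifier(
    "0000-0001-2345-6789",
    "idresults-idrecipe-idtool-idpart",
    Path("part001_radiograph.tif"),
)
print(result_id + ".tif")   # data file
print(result_id + ".json")  # metadata file
```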

Fig. 4

Linking L-PBF with radiography and high-speed camera metadata. The printed part receives an id that is shared with the characterization methods and used to link to the tool, recipe, and results. This convention allows one to locate the instrument, metadata, and material used to produce a part, as well as parameters detailing the characterization tool, recipe, and results

Using this naming convention, the user can determine the tool, recipe, and material used. The partID must have an entry that links to each .json file. However, because the recipe and tool IDs do not themselves carry a part ID, the part .json provides the linking node, which may be filled by whichever tool and/or recipe version is called by the partID. The JSON-LD provides the framework through which each schema links together.
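A minimal sketch of following these links programmatically is shown below, assuming the metadata files are named after their identifiers and use keys like those in the hypothetical record above.

```python
# Sketch: starting from a part's metadata file, follow the stored identifiers
# to load the associated recipe and tool records.
import json
from pathlib import Path

def load_record(identifier: str, folder: Path = Path(".")) -> dict:
    """Load the JSON-LD metadata file whose name matches the identifier."""
    with open(folder / f"{identifier}.json") as f:
        return json.load(f)

part = load_record("0000-0001-2345-6789-part001-a1b2c3")
recipe = load_record(part["linkedRecipe"])   # the part .json is the linking node
tool = load_record(part["linkedTool"])
```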

A partID in L-PBF identifies the single, unique object designed from a computer model, such as an .STL file or a parameter setup file (.txt or .pxp) in Fig. 2a. While the amount of data at each point is cumbersome, the object-relationship mapping is not as data heavy. The .STL is a file related to the build object, with each point sharing the relationship that the blueprint is related to the object. Any change in the .STL requires a new orcid-idtool-idrecipe.json. The object can be linked through a unique identification of the drafted sample together with the time it was submitted to the tool. The unique object with the assigned orcid has starting properties, such as the assigned material, the .STL for the sample, and the date and time it was submitted for printing, recorded in orcid-idmaterial-hash.

Tools are given setup conditions such as the operating range under which they can function, the maximum and minimum operational allowances, the types of attachments present, and the operating environment. Recipes give instructions to the tool that define how the print process will occur: under what parameters, with which attachment, on which tool, and with any other user-modified printer conditions, all recorded in orcid-idtool-idrecipe-idpart-hash. The resulting combined object of each set should have a standard output type. This naming convention not only enables the user to easily locate the material, tool, and recipe employed to produce a part, but also contributes to more efficient and reliable scripts and code automation.
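As a sketch of the automation this convention enables, the parser below recovers the ORCID, the chained IDs, and the hash from a result file name, so that scripts can group files by part, tool, or recipe without opening them. The exact pattern is an assumption based on the convention described above.

```python
# Sketch: parse an orcid-idresults-idrecipe-idtool-idpart-hash.ext file name.
import re
from pathlib import Path

NAME_RE = re.compile(
    r"^(?P<orcid>\d{4}-\d{4}-\d{4}-\d{3}[\dX])-"
    r"(?P<results>[^-]+)-(?P<recipe>[^-]+)-(?P<tool>[^-]+)-(?P<part>[^-]+)-"
    r"(?P<hash>[0-9a-f]+)$"
)

def parse_result_name(path: Path) -> dict:
    """Split a result file name into its identifier components."""
    match = NAME_RE.match(path.stem)
    if match is None:
        raise ValueError(f"File name does not follow the convention: {path.name}")
    return match.groupdict()

info = parse_result_name(Path("0000-0001-2345-6789-res01-rec02-toolA-part001-3f2a9c1b.tif"))
# info["part"] == "part001", info["tool"] == "toolA", info["orcid"] == "0000-0001-2345-6789"
```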

Conclusion

The approach proposed herein allows the development of domain-specific schemas and JSON-LD templates, which provide a baseline representation for a proper and efficient data management system. Furthermore, it supports a proactive approach to design, a direction for instrumentation verification, and a reproducible record of how and why the data are processed in a particular manner. Recording the series of samples against the operator through the unique ORCID-based identifier, along with the instrument process, the standard processing method, and the final transformed result, allows for traceability. The framework proposed in this work consists of initially creating domain schemas that are translated, in a subsequent step, into JSON-LD templates containing the part-, tool-, recipe-, and results-specific information. This template can be used by the individual who makes the part, using their ORCID along with a part identification to ensure cross-sample identification with other experiments. The tool being used carries the same connective information, as do the recipe and the results, which can be easily linked to the recipe, tool, and associated sample based on the file names. This results in four distinct JSON-LD templates that, using the FAIRmaterials package, convert csv-stored metadata into JSON-LD, providing machine readability and accessibility. The same concept applies when linking different methods, where the part information stored in JSON-LD can be easily accessed and mapped to the characterization techniques (if applicable). The proposed framework enables the development of a systematic, standard, and well-defined naming convention that allows FAIRified JSON-LD metadata to be easily found and linked to other sources, representing a major advance toward efficient and optimized data analysis and modeling.