Introduction

Recent advancements in additive manufacturing (AM) have significantly broadened the scope of fabrication capabilities, enabling the creation of complex geometries and the use of diverse materials. Notably, laser powder bed fusion (L-PBF) produces parts through the deposition of metal powder feedstock onto a build plate, followed by selective melting and solidification using a laser [1, 2]. Despite these capabilities, AM and L-PBF remain challenging owing to the wide range of materials and intricate geometries involved [3,4,5]. The interplay between material properties and process parameters necessitates robust monitoring and quality control mechanisms to ensure the reliability and integrity of printed components [6,7,8,9]. The growing adoption of AM across industries [10,11,12] highlights the critical need for advanced techniques to monitor the printing process and identify potential defects early on [13, 14].

One of the most prevalent issues in L-PBF is the formation of defects such as pores, which can compromise the mechanical properties of a build [2, 15, 16]. These defects are often correlated with melt pool characteristics, which are in turn influenced by process parameters such as laser power, speed, and hatch spacing [17]. For example, applying excessive laser power can lead to keyhole defects [18, 19], characterized by deep penetrations in the melt pool [20] that introduce porosity [21,22,23]. Conversely, insufficient laser power may cause lack-of-fusion defects [24, 25], characterized by large, irregular voids within the printed part [26]. The literature has extensively examined process parameters such as print speed and laser power [21] to minimize the formation of pore defects [27,28,29,30].

To address these challenges, in situ monitoring techniques employing sensors such as high-speed cameras and pyrometers have been developed. These sensors enable real-time observation of melt pool dynamics and thermal profiles [31,32,33], but they generate vast amounts of heterogeneous data [16]. The sheer volume and complexity of this data pose significant analytical challenges, necessitating advanced data processing and analytics solutions [34,35,36].

Extracting melt pool geometry from high-speed camera images is itself a significant challenge. Traditional feature extraction techniques, such as thresholding, edge detection, and region-based segmentation, have been employed to delineate the melt pool from the surrounding material [20, 37, 38]. Computer vision monitoring has also been implemented to extract these features from high-speed camera images of single tracks [39,40,41]. Despite their utility, these classical image processing methods often struggle with the variability in melt pool appearance due to changes in process parameters, reflections, and spatter, requiring extensive tuning and manual intervention for each set of conditions.

Deep learning has recently enabled feature extraction to scale across large, diverse data sets, yielding novel insights [42, 43]. Convolutional neural networks (CNNs) in particular offer a more robust alternative capable of capturing complex patterns in image data, including subtle variations in melt pool geometry [44, 45]. These models can learn to identify melt pools under a wide range of conditions; however, their effectiveness is contingent upon the availability of large, accurately annotated data sets. This dependency can be alleviated by a subfield of deep learning called weak supervision, which leverages more accessible, albeit less precise, sources of information to train models. In practice, this can mean using simpler image processing techniques to generate approximate labels, reducing the dependency on extensive manual annotation.

Analysis of melt pool geometry alone, however, provides limited information [18, 23, 46]. Linking in situ monitoring data with ex situ analysis, such as radiography, offers a comprehensive approach to understanding and correlating observable features with the presence of defects. Radiography, for example, provides granular details such as pore count, size, and shape [16, 23, 47] that are useful for understanding part quality and characterizing its microstructure [48].

In this work, we develop an automated pipeline that leverages deep learning, classical image processing, and multi-modal analytics to extract melt pool and spatter visual features from over 700,000 high-speed optical monitoring camera frames, spanning 715 metal L-PBF prints. A U-Net [49] deep learning model is trained under a weakly supervised paradigm to perform feature extraction. Large-scale statistical characterization then uncovers influential trends in quantities such as melt pool area and pyrometry signals that relate to a higher likelihood of flaws. The entire framework provides a workflow to connect sensor streams to data-driven part qualification insights without extensive human involvement. Through data-enabled characterization, the methodology demonstrates progress toward goals of real-time defect prediction, informed process control, and autonomous metal additive manufacturing.

Methods

Experimental Setup

An open architecture L-PBF Aconity3D (AconityUS, Inc.) system, described in previous work [22, 50], was used for the initial fabrication of samples. In situ measurements were recorded using a coaxially aligned in-line pyrometer and high-speed video camera; this setup enabled data acquisition to be temporally synchronized to the fabrication process. This setup is depicted in Fig. 1a.

Fig. 1

Schematic of L-PBF experimental setup with coaxial in situ monitoring shown in (a) and X-ray radiography experiment shown in (b). Reprinted from [50] with permission. Final build plate consisting of single- and multi-track prints after L-PBF fabrication in (c)

Operational parameters were recorded, spanning combinations of laser power (50 to 375 W) and laser velocity (100 to 400 mm/s) with a hatch spacing of 0.1 mm. High-speed camera videos were recorded on a 10-bit Mikrotron EoSens MC1362 at a capture rate of 1 kHz and a resolution of 14 µm/pixel. Pyrometry measurements were performed with a Kleiber KGA 740LO (Kleiber Infrared GmbH) tuned to capture infrared (IR) emission at wavelengths of 1600–1800 nm with a 100 kHz capture rate.

The data set consists of 666 single tracks printed unidirectionally along with 50 multi-track prints, as depicted in Fig. 1c. The prints systematically vary across different combinations of laser power and speed. They were arranged in a grid layout, with the print order randomized to minimize residual thermal effects between adjacent prints. All samples were produced using a 316L stainless steel powder feedstock with particle sizes ranging from 15 to 45 µm. Each sample used a beam second-moment width (D4\(\sigma \)) of approximately 100 µm.

The final build segmented for ex situ radiography is shown in Fig. 1b. Ex situ measurements through X-ray radiography imaging were completed at the Advanced Light Source at Lawrence Berkeley National Laboratory. Projections were taken with a polychromatic beam using a 5\(\times \) lens, resulting in 1.3 µm/pixel resolution and 100 ms exposure time, on three 0.1 \(\times \) 3 \(\times \) 7 cm\(^3\) plates containing the laser-printed tracks. Pores were manually selected using the open-source software Fiji [51] (Fig. 2).

Fig. 2

High level overview of multi-modal data pipeline. High-speed camera features are extracted and integrated with radiography, pyrometry, and process parameters at the sample level

Data Science and Computing Infrastructure

Model training and data analysis were performed using the High Performance Computing (HPC) cluster at Case Western Reserve University, which is described in References [52, 53]. Distributed data storage and processing were done using the Common Research Analytics and Data Lifecycle Environment (CRADLE), described in References [54, 55]. Large-scale data processing [56] employed Hadoop, HBase, and Spark in CRADLE.

Data cleaning was performed using R (3.3.0+)/RStudio (4.2.2) [57, 58] and the Tidyverse [59] packages. Python (3.11.4) [60] was used for image sequence preprocessing, feature extraction, and model training. Image frames were extracted from high-speed camera videos with FFmpeg [61]. Image processing and data analysis were done with NumPy [62], scikit-image [63], and pandas [64]. Image segmentation was implemented with a U-Net [49] built from scratch in TensorFlow [65].
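For illustration, frame extraction can be scripted by invoking FFmpeg from Python; a minimal sketch is shown below, in which the video file name and the output numbering pattern are assumptions.

```python
# Hedged sketch: extract individual frames from one high-speed camera video
# using the FFmpeg command-line tool. Paths and the %04d pattern are assumed.
import subprocess

subprocess.run(
    ["ffmpeg", "-i", "track_001.avi", "frames/track_001_%04d.png"],
    check=True,  # raise CalledProcessError if FFmpeg exits with an error
)
```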

Image Processing Feature Extraction

An image processing pipeline was developed to extract the two features of interest from high-speed camera frames: melt pool and spatter. This is a simpler variant of melt pool feature extraction than those found in the related literature; it serves as a rough baseline for the downstream analysis performed in the overall feature extraction pipeline. Since the laser was not powered on for the majority of each high-speed camera video, most image frames contain neither feature. In frames where the laser is powered, the melt pool appears as the highest-intensity region of the image, typically in the shape of an ellipse [22]. Spatter exists only in a subset of frames when the laser is powered on and appears as sharp, thin streaks.

The image processing workflow can be described as follows. First, a high-speed camera video is processed with each frame analyzed to detect the presence of a melt pool. The frame is flagged as containing a melt pool if the number of unique pixel values passes a threshold (meaning the image is not entirely black). If a melt pool is present, local contrast enhancement with a neighborhood size of 10 pixels is applied, followed by a local threshold with a neighborhood size of 20 pixels. These two steps binarize the image into distinct components by setting feature pixels to 1 and background pixels to 0. The neighborhood sizes were determined through rigorous parameter testing and validation against manually annotated images.
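A minimal sketch of these per-frame steps using scikit-image is given below. The unique-value cutoff (`min_unique`) is an assumed stand-in for the threshold described above, and because `threshold_local` requires an odd block size, 21 approximates the 20-pixel neighborhood.

```python
# Sketch of the frame-flagging and binarization steps described above,
# assuming 8-bit gray-scale frames as input.
import numpy as np
from skimage.filters import threshold_local
from skimage.filters.rank import enhance_contrast
from skimage.morphology import disk

def binarize_frame(frame, min_unique=2):
    """Return a binary mask (features=1, background=0), or None if no melt pool."""
    if np.unique(frame).size < min_unique:        # frame is effectively all black
        return None
    enhanced = enhance_contrast(frame, disk(10))  # local contrast, 10-px neighborhood
    thresh = threshold_local(enhanced, block_size=21)
    return (enhanced > thresh).astype(np.uint8)
```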

The process of identifying and quantifying features from the binarized images employs the connected components algorithm, a graph-based technique for object extraction in binary images. This algorithm identifies adjacent pixels of identical value and groups them into distinct components, subsequently calculating relevant statistics for each identified group. Specifically, for melt pool and spatter analysis, key metrics such as the centroid, area, mean intensity, and the lengths of the major and minor axes are determined. The object with the highest intensity is designated as the melt pool, whereas objects with lower intensity are classified as spatter. This distinction is crucial, especially in scenarios where melt pools are small and spatter objects are comparably large, making size an unreliable differentiator; intensity therefore serves as the primary criterion for classification. A sample pipeline workflow is summarized in Fig. 3.
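The following sketch illustrates this quantification with scikit-image's connected-component labeling and region properties; the dictionary keys are illustrative names rather than the pipeline's actual schema.

```python
# Sketch of connected-component feature quantification and the intensity-based
# melt pool / spatter assignment described above.
from skimage.measure import label, regionprops

def quantify_features(mask, frame):
    regions = regionprops(label(mask), intensity_image=frame)
    stats = [
        {"centroid": r.centroid, "area": r.area,
         "mean_intensity": r.mean_intensity,
         "major_axis": r.major_axis_length, "minor_axis": r.minor_axis_length}
        for r in regions
    ]
    if stats:
        # The brightest object is designated the melt pool; the rest are spatter.
        brightest = max(stats, key=lambda s: s["mean_intensity"])
        for s in stats:
            s["feature"] = "melt_pool" if s is brightest else "spatter"
    return stats
```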

This pipeline was performed over 715 high-speed camera videos, where each video contained 1000 frames at a 256 \(\times \) 256 pixel resolution. The image processing pipeline was applied to all extracted frames in parallel using the Simple Linux Utility for Resource Management (SLURM) [66, 67]. SLURM enabled the efficient deployment of the pipeline to quantify data from over half a million image frames across hundreds of HPC nodes.

Fig. 3

Sample pipeline for high-speed camera frames using image processing to find and mask for features. Discovered features are classified as the melt pool or spatter. Feature statistics are gathered from pixel-wise measurements

Deep Learning Feature Extraction

A deep learning pipeline was also developed for feature extraction and quantification of melt pool and spatter. To train a deep learning model for segmentation, the model requires both the original image and a corresponding segmentation mask, in which each pixel is categorized either as a specific feature of interest or as background. Conventionally, this segmentation process relies heavily on a subject matter expert manually annotating images, a method both time-consuming and prone to variability. Here, however, we leverage the outputs of our image processing pipeline as pseudo-ground truth to train our deep learning model. This approach, in which image processing-generated masks serve as initial training data, is a form of weak supervision. It significantly streamlines the training process by utilizing readily available, albeit imperfect, data as a substitute for manually annotated data sets.

The model architecture and training procedure are described as follows. U-Net [49], a CNN-based architecture, was selected as the segmentation model. A standard implementation of U-Net was constructed, with four encoder and four decoder blocks. The encoder blocks, designed for feature extraction, consist of convolutional layers followed by max pooling, whereas the decoder blocks, aimed at image reconstruction, include transposed convolutional layers for upsampling, along with concatenation and convolutional layers. Each convolutional block comprises two convolutional layers, each followed by batch normalization and ReLU activation. Convolutional layers employ L2 regularization to improve generalization and reduce overfitting. Layers in the encoder (contracting path) increase the number of filters from 64 to 1024, doubling at each level; the opposite occurs in the decoder (expanding) path. The model used a single-channel input to handle gray-scale images and contained a total of 31,054,275 parameters.
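A minimal Keras sketch consistent with this description is shown below. The 3 \(\times \) 3 kernel size, the L2 coefficient, and a three-class output (background, melt pool, spatter) are assumptions not specified above, so the exact parameter count will differ from the reported model.

```python
# Hedged sketch of a 4-level U-Net with batch norm, ReLU, and L2 regularization.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def conv_block(x, filters, l2=1e-4):
    for _ in range(2):  # two conv layers per block, as described above
        x = layers.Conv2D(filters, 3, padding="same",
                          kernel_regularizer=regularizers.l2(l2))(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def build_unet(input_shape=(256, 256, 1), n_classes=3):
    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    for f in (64, 128, 256, 512):                 # contracting path
        x = conv_block(x, f)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, 1024)                       # bottleneck
    for f, skip in zip((512, 256, 128, 64), reversed(skips)):  # expanding path
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, f)
    outputs = layers.Conv2D(n_classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```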

The model was trained for 200 epochs with early stopping set to a patience of 10. Adam was used as the optimizer [68] with a learning rate of 0.001. Categorical focal cross-entropy was selected as the loss function for multi-class semantic segmentation. We measured model performance using accuracy, precision, recall, and intersection-over-union (IoU).
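Continuing the sketch above, the training configuration might look as follows, assuming a TensorFlow version (2.13 or later) that ships `CategoricalFocalCrossentropy`; `train_ds` and `val_ds` are hypothetical tf.data pipelines already batched to 8 frames.

```python
# Hedged sketch of the training setup described above.
model = build_unet()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.CategoricalFocalCrossentropy(),
    metrics=["accuracy",
             tf.keras.metrics.Precision(), tf.keras.metrics.Recall(),
             tf.keras.metrics.OneHotMeanIoU(num_classes=3)],
)
early_stop = tf.keras.callbacks.EarlyStopping(patience=10,
                                              restore_best_weights=True)
model.fit(train_ds, validation_data=val_ds, epochs=200, callbacks=[early_stop])
```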

A curated data set of 200 sample frames, specifically selected to represent instances when the laser was active, was used to train the model. This data set was balanced between frames featuring solely the melt pool and frames capturing both the melt pool and spatter, ensuring the model was not biased against the less frequently observed spatter features. The data were distributed over different energy density and parameter regimes to provide a robust representation of process parameters. The data set was divided using an 80/10/10 split for training, validation, and testing, respectively, and batched into sets of 8 images per batch.

Upon training completion, the best-performing model, as determined by validation metrics, was used to perform inference across the entirety of the high-speed camera footage. After generating predictions across all high-speed camera videos, feature statistics were quantified using the same connected components approach described in the image processing pipeline. In this case, however, the label of melt pool or spatter was automatically assigned by the predictive U-Net model.
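A hedged sketch of this inference step is shown below; `frames` stands in for a hypothetical batch of single-channel images, and the class-index convention is an assumption.

```python
# Sketch: per-pixel class predictions from the trained U-Net (names assumed).
import numpy as np

probs = model.predict(frames)            # (N, 256, 256, 3) softmax scores
class_map = np.argmax(probs, axis=-1)    # assumed: 0=background, 1=melt pool, 2=spatter
melt_pool_masks = (class_map == 1).astype(np.uint8)
spatter_masks = (class_map == 2).astype(np.uint8)
# Each mask can then be passed to the same connected-components quantification
# used in the image processing pipeline.
```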

Multi-modal Data Integration

Four characterization methods are integrated to give a multi-modal, spatiotemporal representation of the data set: process parameters, in situ pyrometry data, high-speed camera features, and X-ray radiography features.

After quantification, high-speed camera features were labeled using a track ID and the frame number from the video. These were merged with pyrometry signal measurements through spatiotemporal components such as coordinate locations and time. A single high-speed camera frame is matched to 100 pyrometry readings due to the pyrometer's faster measurement rate. Fiji [51] was used for manual pore assignment from the radiography projections. The pore assignments were merged with the integrated high-speed camera and pyrometry data, where each track is labeled as true or false for the presence of a pore.
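A hedged pandas sketch of this alignment is shown below: with the pyrometer sampling at 100 kHz and the camera at 1 kHz, each frame maps to 100 readings. File and column names are illustrative assumptions, not the pipeline's actual schema.

```python
# Sketch of the frame-to-pyrometry alignment and track-level pore join.
import pandas as pd

features = pd.read_parquet("hsc_features.parquet")   # track_id, frame, area, ...
pyro = pd.read_parquet("pyrometry.parquet")          # track_id, sample_idx, reading

pyro["frame"] = pyro["sample_idx"] // 100            # 100 readings per camera frame
pyro_stats = (pyro.groupby(["track_id", "frame"])["reading"]
                  .agg(["mean", "std"]).reset_index())

merged = features.merge(pyro_stats, on=["track_id", "frame"], how="left")

# Track-level pore labels from radiography are joined last.
pores = pd.read_csv("pore_labels.csv")               # track_id, has_pore
merged = merged.merge(pores, on="track_id", how="left")
```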

In total, our multi-modal data set consists of over 700,000 frames annotated with segmentation labels, regional feature vectors, aligned pyrometry signals, and registered pore labels for investigating predictive relationships. Custom correlation and multivariate analyses were conducted using Python and R to relate modalities. All information was ingested into CRADLE's Hadoop ecosystem for efficient querying and analysis using Spark.

The resulting labeled features from these approaches were saved into a database with the corresponding frame and track number. This enabled comparison with other measurements such as the pyrometry signal and pore count assignments. The feature extraction approaches were then compared to examine the resulting features, using multivariate statistical analytics to uncover statistically significant patterns and to predict defect formation.

Results

Deep Learning Model Performance

The U-Net model demonstrated strong performance on the melt pool and spatter segmentation task. As illustrated in Fig. 4, the model converged after approximately 50 epochs. Training and validation curves are plotted for both the loss function as well as the IoU metric (calculated in a class-wise and mean manner). Table 1 summarizes the metrics from the highest performing epoch, based on validation metrics during training. The results indicate robust performance in terms of accuracy, recall, and precision across both the training and validation data sets.

IoU, however, was specifically selected for plotting because it provides a more robust metric for assessing model performance: when the features of interest occupy a small region relative to the background, metrics such as accuracy, precision, and recall can be biased. Both Fig. 4 and Table 1 demonstrate the model's proficiency in segmenting the melt pool region, with IoU scores of 0.99 on both the training and validation sets, highlighting the model's ability to generalize to new data. Performance for spatter segmentation was less successful, with a maximum IoU of 0.767 on the validation set, indicating a failure to fully converge on this feature. An explanation for this discrepancy in performance across the two features is provided in subsequent sections.
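For reference, class-wise IoU can be computed from binary masks as in the minimal NumPy sketch below, which makes its insensitivity to the dominant background class apparent.

```python
# Class-wise IoU: overlap of predicted and true feature pixels over their union.
import numpy as np

def iou(pred: np.ndarray, true: np.ndarray) -> float:
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return float(intersection / union) if union else 1.0
```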

Fig. 4

U-Net training and validation loss curves, with class-wise and mean intersection-over-union (IoU) curves for melt pool and spatter

Table 1 U-Net performance metrics

Feature Extraction and Quantification

After performing inference on the full set of 715 high-speed camera videos using the trained U-Net model, measurable melt pool and spatter features were detected in 682 videos. In total, the model identified 497 individual melt pool instances and 682 distinct spatter occurrences. For each detected feature, characteristics such as area, axis lengths, perimeter, and intensity were quantified to enable detailed analysis [69, 70].

Area distributions of melt pool and spatter features are depicted in Fig. 5, with average values listed in Table 2. The melt pool features exhibited a mean area between 1636 and 4386 µm\(^2\) with a standard deviation of 1543–2528 µm\(^2\). Quantified metrics for the spatter features revealed greater relative variability, with a mean area of 342–470 µm\(^2\) and a standard deviation of 446–590 µm\(^2\). The increased relative variance indicates more diversity in the shape and size of spatter formations compared to melt pools.

Fig. 5

Area distributions of melt pool (a) and spatter (b) features as provided by the image processing pipeline and deep learning inference. Image processing is depicted in blue and U-Net is depicted in yellow

Multi-modal Feature Analysis

After integrating multiple modalities including process parameters, pyrometry measurements, radiography, and high-speed camera features into a single data set, a comprehensive analysis was conducted to understand feature relationships. Figure 6 depicts how laser power (a), laser speed (b), and presence of defects from radiography (c) vary across the build plate by track.

Fig. 6

Distribution of process parameters and defects across the build plate: (a) distribution of laser power, (b) distribution of laser speed, and (c) presence of a pore within a track

A comparative summary from all three modalities is shown in Fig. 7. For each combination of laser power and speed, the heat maps display the number of defects observed in radiography, the mean melt pool area from high-speed camera analysis, and the average pyrometry measurement. Regions of the heat map follow a trend similar to previous work with multi-modal sensors [21, 50].

The mean (\(\bar{X}\)) and standard deviation (\(\sigma \)) of pyrometry for each parameter group were calculated as:

$$\begin{aligned} \bar{X}=\dfrac{\sum \limits _{i=1}^{N}{X_i}}{N}; \quad \sigma =\sqrt{\dfrac{\sum \limits _{i=1}^{N}{(X_i - \bar{X})^2}}{N}} \end{aligned}$$
(1)

where \(X_i\) is an individual data point and N is the sample size. A 99% confidence interval around the mean was defined as:

$$\begin{aligned} \text {CI} = \bar{X} \pm Z\dfrac{\sigma }{\sqrt{N}} \end{aligned}$$
(2)

where Z = 2.8.
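A minimal NumPy sketch of Eqs. (1) and (2) for one parameter group's pyrometry readings follows; `x` is a hypothetical 1-D array of readings.

```python
# Group mean, population standard deviation, and confidence interval
# following Eqs. (1) and (2).
import numpy as np

def mean_and_ci(x: np.ndarray, z: float = 2.8):
    mean = x.mean()
    sigma = x.std(ddof=0)                 # population form, as in Eq. (1)
    half_width = z * sigma / np.sqrt(x.size)
    return mean, (mean - half_width, mean + half_width)
```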

Key relationships can be observed between modalities and across process parameters. For example, pyrometry values show an inverse trend with laser power, while melt pool size increases with power. Further analysis is warranted to determine optimal parameter settings which avoid defects while maintaining an ideal melt pool morphology.

Fig. 7

Heat map comparing the standard image processing method. Frames that could be processed have corresponding diagnostic data from in situ monitors and dimensions

Analysis of Defect Occurrence

To elucidate potential causes of defect formation, high-speed camera features were analyzed based on whether pores were observed via radiography. Table 2 summarizes mean values of pyrometry signals and melt pool/spatter areas segmented from camera images for cases with and without pores.

As shown in Fig. 8d, the distributions of melt pool area and pyrometry signal differ substantially depending on whether a pore defect occurs in the same location. Specifically, tracks containing pores exhibit higher average pyrometry intensity and larger melt pool size. This indicates that excess thermal energy and unstable melt pool dynamics may increase the likelihood of pore formation.

Fig. 8

Distribution of pyrometry and high-speed camera feature (melt pool and spatter) area values by the presence of a pore. The dashed line represents the mean value of the measurements. Average values are found in Table 2

Table 2 Summary of melt pool and spatter areas from deep learning predictions. Parentheses contain the standard deviation of the in situ average readings

Discussion

Performance of Image Processing and Deep Learning

Fig. 9

High-speed camera feature extraction from three unique regions of the parameter space. Depiction of the original image and a comparison of features segmented by image processing and U-Net

Traditional image processing and weakly supervised deep learning were each applied for automated extraction of melt pools and spatter from high-speed camera images. Figure 9 depicts specific examples and captures general trends that emerge in comparing these approaches. As shown in Fig. 7, the image-processed data had similar melt pool areas, pyrometry signals, and pore defect distributions compared to the U-Net model. The data set is restricted to high-speed visual imaging, however, and its utility is constrained by the detector itself: over-saturation may reduce reliability by limiting the resolution with which features can be uniquely distinguished.

Comparing the two approaches reveals the following. The image processing methodology can detect extremely low-contrast phenomena missed by the deep learning approach, which can be attributed to the contrast enhancement improving feature visibility. The trade-off, however, is that residual light from features is also amplified, resulting in over-estimation of large, visible features. The deep learning model (U-Net), although trained on these noisy labels, tends to negate this effect and produce more precise segmentation of large visible features; this comes at the cost of missing more small, diffuse objects, errors that may be due to the relatively small training data set. Each approach as a result tends to perform better in different data regimes. Increasing the size and distribution of the training data set can potentially enable the deep learning model to further refine noisy labels and predict across a more robust domain.

Weak Supervision for Manual Annotation Free Training

A key advantage of weak supervision is eliminating the requirement for manual annotation of training data sets. By employing image processing as a proxy for manual annotation, far less effort is required to generate training labels. We demonstrate this feasibility using the most basic case of image processing: contrast enhancement and thresholding. However, a vast literature exists on more sophisticated computer vision techniques amenable to automated feature extraction. Methods like SIFT can better capture melt pool geometry, and clustering algorithms such as K-means can segment features into multiple regions of intensity, as sketched below. Integrating more advanced methodologies like these for automated label generation could further improve model performance over the simple pipeline presented.
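As one hedged example of such an alternative label generator, K-means can partition a frame's pixel intensities into background, spatter, and melt pool bands; scikit-learn is assumed here and is not part of the reported software stack.

```python
# Sketch: K-means on pixel intensities as an alternative weak-label source.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_intensity_labels(frame: np.ndarray, n_clusters: int = 3) -> np.ndarray:
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    flat = km.fit_predict(frame.reshape(-1, 1).astype(np.float32))
    # Reorder cluster ids by mean intensity so label 0 is darkest (background).
    order = np.argsort(km.cluster_centers_.ravel())
    remap = np.empty_like(order)
    remap[order] = np.arange(n_clusters)
    return remap[flat].reshape(frame.shape)
```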

The weak supervision paradigm provides flexibility to leverage emerging techniques without costly manual annotation by scarce domain experts. Any classical or contemporary computer vision approach that generates segmentation masks or feature descriptors could slot into the framework as a labeling mechanism for deep learning. In the event that large labeled data sets become available, the deep learning models can instead be trained directly on expert ground truth to further improve accuracy.

Towards Automated Real-Time Monitoring

Analysis of 715,000 high-speed camera frames from additive manufacturing builds demonstrates the efficiency of the automated pipeline relative to manual examination. The trained U-Net model was benchmarked performing inference on a single image in 0.00393 s, or roughly 250 Hz. While a gap exists between the 1 kHz capture rate and the 250 Hz inference rate, these metrics demonstrate strong utility for offline assessment or for systems with acceptable latency. Batched inference, a more efficient model backbone, or model pruning could each bring the pipeline closer to real-time analysis.

This throughput opens the door for real-time melt pool morphology and quality monitoring during builds. By analyzing imagery as it is captured and linking extracted visual metrics to process variables, the pipeline could enact closed-loop feedback control. For example, detecting an unusually large or energetic melt pool could trigger adaptive adjustments to laser power or scan speed to stabilize the process before defects emerge.

However, high-speed camera data alone provides an incomplete picture. Integrating additional in situ sensor streams such as infrared cameras or pyrometers, as well as post-process metrology from CT scans or microscopy, will likely achieve more holistic and robust monitoring. Sensor fusion combines perspectives to enrich the process signature for accurate quality assessment. Links between visual, thermal, and parameter data streams may provide the most discerning insights into build health. An automated framework encompassing this multi-modal pipeline, computer vision techniques, real-time analytics, and actuators for process variable adjustment could significantly reduce defects through closed-loop control.

Conclusion

In this work, we developed an automated pipeline for analyzing melt pool morphology and quality monitoring in metal additive manufacturing from high-speed camera footage. The pipeline integrates both classical image processing methods and deep neural networks for detecting key phenomena like melt pools and spatter.

We demonstrate its capabilities on a data set of over 700,000 high-speed camera frames. Our approach affirms the feasibility of training convolutional neural networks like U-Net in a weakly supervised fashion using only the output of threshold-based segmentation algorithms on enhanced images, rather than human-annotated labels. Results indicate that, although imperfect, these automated labels can empower precise feature extraction. However, model refinements based on expert-annotated visual data or more complex image processing techniques could further improve accuracy.

Trends in melt pool morphology relative to process parameters and final part properties emerge from large-scale feature extraction. In particular, tracks containing pores exhibit larger melt pools compared to pore-free regions, with average areas of 4386 µm\(^2\) vs 1636 µm\(^2\), respectively. Links between visual, thermal, positional data, and defects highlight the importance of multi-modal analysis for understanding process outcomes. The methodologies presented demonstrate progress toward key goals of defect prediction, informed process adjustments, and autonomous production.

The fusion of computer vision techniques with sensor streams holds potential for closing the loop with adaptive process control. By combining feature extraction with sensor-driven analytics across data modalities, the reliability and efficiency of metal additive techniques stand to rapidly advance.