1 Introduction

Additive manufacturing (AM) is a process of joining materials to produce parts layer upon layer from three-dimensional (3D) model data, in contrast to subtractive and formative manufacturing [1]. AM has been categorized into seven processes: binder jetting, material jetting, powder bed fusion, sheet lamination, vat photopolymerization (VP), material extrusion (ME), and directed energy deposition (DED) [1]. Non-metal AM involves binder jetting, material jetting, VP, and ME, whereas metal AM includes powder bed fusion, sheet lamination, and DED [2]. Non-metal AM mostly uses polymers as feedstock materials. VP can fabricate lightweight parts with high energy absorption using penetrated and interpenetrated cellular lattice structures [3]. ME is the most widely diffused non-metal AM process owing to its low equipment cost and fast build-up rates, as it extrudes material in a semisolid state through a nozzle and solidifies the extruded material [4]. Meanwhile, metal AM offers flexibility in the geometries and designs of metallic components because it fuses and solidifies metal alloy structures on a substrate by supplying high-density energy. In metal AM, DED can be divided into powder-bed, powder-feed, and wire-feed processes in terms of feedstock materials, or into laser-beam, electron-beam, and arc-based systems in terms of energy sources. The arc-based variant is known as wire arc additive manufacturing (WAAM), a process that melts a wire feedstock and deposits the part layer upon layer using an arc as the energy source.

WAAM consists of a wire as the feedstock, a welding arc as the energy source, and a robot arm as the deposition operator. WAAM can be categorized into gas metal arc welding, plasma arc welding, and gas tungsten arc welding (GTAW) based on the heat source [5]. WAAM offers a high deposition rate, near-net-shape fabrication, diversity in applicable wires, cost efficiency for large parts owing to its low-cost equipment installation, and less material waste owing to a low buy-to-fly ratio [6]. Considering these benefits, WAAM has been applied in the automotive, aerospace, and shipbuilding industries [7]. These benefits are similar to those of ME mentioned above; however, the two differ in feedstock materials, as WAAM deposits metallic parts, whereas ME mostly deals with non-metallic parts, e.g., acrylonitrile butadiene styrene and polylactic acid.

Known defects in WAAM include balling, porosity, deformation, oxidation, delamination, cracking, high residual stress, and poor surface finish [8]. These defects result in low precision, poor surface quality, and deterioration of mechanical properties [9]. The layer-by-layer stacking mechanism causes poor dimensional accuracy and surface finish and thus leads to a volumetric error between the designed and fabricated parts, which has limited widespread application of additive manufacturing [10]. Moreover, WAAM can introduce defects and undesirable features, e.g., heterogeneous microstructures, owing to non-equilibrium thermal cycles, and can suffer process instability, i.e., balling formation and spatter, which adversely affect surface roughness and mechanical properties. Hence, methods to achieve process stability and high-quality parts are required in WAAM. These methods require a solid understanding of the underlying physics together with the formulation of mathematical and statistical models. The WAAM community is actively seeking data-driven solutions for monitoring and detecting defects based on in-situ and real-time approaches [11]. Anomaly detection can serve as a primary solution for detecting abnormalities during the process. It uses sensor data to identify patterns that do not conform to a well-defined notion of normal behavior [12]. Anomalies incur defects and thus have to be detected automatically to assure product quality and reduce the cost of post-process treatment [13].

The sensor data are numerically analyzed and transformed into mathematical models for anomaly detection. For this purpose, the design of experiments (DOE) can be used to effectively determine the numerical relationship between the input and output data with a small set of experiments [14]. However, the DOE is valid only under restricted experimental conditions and is not appropriate for real-time monitoring and control. Recently, machine learning has gained increasing attention as a data-driven approach for overcoming these limitations. Machine learning derives mathematical models for making decisions based on training data acquired from experiments and can generate machine-specific models for real-time monitoring and control in dynamic environments [15]. Machine learning can be classified into supervised, unsupervised, and reinforcement learning [16]. In supervised learning, human observations are used to identify input and output variables based on expertise and knowledge, and computers learn the numerical causality between these two variables using a training dataset in which every input datum is labeled with a corresponding output value [17]. Unsupervised learning, in contrast, makes inferences from unlabeled data by exploring hidden patterns or grouping similar data into clusters [18]. Reinforcement learning, often described as lying between supervised and unsupervised learning, interacts with its environment and learns to act optimally to gain the greatest reward [19].

Considering the advantages of machine learning, the AM community has been characterizing the process–structure–property–performance linkage as the design rule [9]. However, establishing this relationship is challenging because machine learning requires massive data to achieve reliable results. For example, the materials and process parameters significantly affect the corresponding microstructures and quality performance; consequently, their combinations incur an exponentially increasing number of data samples. This effect is termed the curse of dimensionality [20]. When a high-performance material (e.g., Inconel 625 or Ti–6Al–4V) is considered, its high cost makes obtaining sufficient datasets challenging. Moreover, missing data and data scarcity occasionally occur owing to the complexity of the process and the dynamics of manufacturing environments [21]. Defects should be minimized and detected in-process to minimize post-process treatment and reduce product disposal. Therefore, a cost-effective method is required for real-time monitoring and control.

Transfer learning (TL) represents a key solution to this problem. TL is a learning approach that aims to extract knowledge from source domains or tasks and use it for a target domain or task [22]. In AM, TL can be used to extract original features from raw image or numerical data of a source material and then transfer and adjust those features to create anomaly detection models applicable to the target material. TL exhibits good accuracy especially when the features learned from a source material possess high transferability, i.e., when they capture phenomena that are generic across domains. TL is cost-effective when the source material and source data are inexpensive, the target material is expensive, and the target data are insufficient or expensive. Although this TL approach can facilitate the establishment of design rules, it has not been comprehensively investigated, and related knowledge is lacking.

This study proposes a TL-based, material-adaptive anomaly detection method that uses data inexpensively obtained from a GTAW-based process. The proposed method generates anomaly detection models for classifying balling defects as abnormal based on property-concatenated TL. It uses TL to derive convolutional neural network (CNN)-based anomaly detection models by transferring models derived from a source material to a target material, where machine learning rarely achieves good accuracy owing to data scarcity. In addition, the proposed method applies property concatenation to combine material properties, as additional features, with image features, in contrast to typical TL where CNNs use only the image features extracted from source material data. This property concatenation aims to reflect the domain knowledge of WAAM, i.e., that the thermal properties of materials affect melting and solidification during deposition, thereby improving accuracy. The proposed method uses CNNs to apply image-based learning to time-series classification by extracting image features from voltage image snapshots, which are converted and segregated from a time-series profile of numerical voltage data. Experiments were performed on a GTAW-based WAAM system to demonstrate the feasibility and validity of the proposed method with three materials: low-carbon steel (LCS), stainless steel 316L (STS), and Inconel 625 (INC).

This study is organized as follows. Section 2 introduces related works; Sect. 3 explains the experiments; Sect. 4 proposes the method; Sect. 5 describes and discusses the validation of the proposed method; Sect. 6 concludes the study.

2 Related Works

2.1 Wire Arc Additive Manufacturing and Balling Defect

WAAM is a DED process that uses an electric arc as the energy source to fabricate 3D metallic parts. The feedstock materials currently available in the welding industry include titanium, aluminum, steel, nickel, and Inconel alloys. Figure 1 illustrates a schematic diagram of GTAW and its WAAM system. GTAW creates weld beads on a base substrate using an arc generated from a non-consumable tungsten electrode. The arc generates a molten pool, which solidifies to form a metallurgical bond between the feedstock and the base substrate (or the previously deposited layer). In GTAW, the primary process parameters are current, travel speed (TS), and wire feed rate (WFR). The current affects the intensity of the heat input, while the TS and WFR influence the dynamics of bead formation. These parameters are determined by operators and are mutually independent because their settings are applied separately to the control of the associated devices, including the power source, torch, and wire feeder.

Fig. 1
figure 1

Schematic diagram of gas tungsten arc welding (left) and wire arc additive manufacturing system (right)

Figure 2 shows the balling phenomenon. Balling can be described as an irregular bead surface contour comprising protrusions caused by the separation of spherical droplets [23]. The filler metal starts melting at the wire tip and forms a droplet at the center of the arc. Droplets form sequentially, and molten pools are shaped on the base substrate as the filler metal continues to move under the arc. The molten pool size and deposition area may decrease when the heat input per unit length falls below a certain threshold because of low current or high TS [24]. When the molten pool is small, a separated, spherical bead area forms as the pool gradually moves toward the rear. The repeated occurrence of this phenomenon leads to balling defects.

Fig. 2
figure 2

Formation of the balling phenomenon (irregular bead shape and spherical droplets)

Figure 3 shows one-dimensional (1D) current and voltage data and the corresponding bead shapes under the balling condition. The arc length is proportional to the voltage under a constant current, where the arc length indicates the distance between the electrode and the bead. As shown in Fig. 3a and c, when the bead forms as balling closer to the electrode, the arc length decreases, leading to a decrease in voltage. In contrast, a longer arc leads to a higher voltage in the separated bead area between two humped beads, as shown in Fig. 3b and d. Hence, voltage profiles reflect bead formation and should be analyzed to detect balling defects.

Fig. 3
figure 3

Balling beads and a voltage profile: short arcs in (a) and (c) and long arcs in (b) and (d)

2.2 Anomaly Detection Using Machine Learning

Anomaly detection relies on sensor-data-driven algorithms because they can empirically reflect melt pool behaviors, kinematics, and thermodynamics. Sensor data include heat transfer tracking, surface optical or thermal imaging, melt pool imaging, and melt pool dynamics, all of which provide key information for anomaly detection.

Accordingly, the relevant literature relies heavily on machine learning, which can be further divided into real-time and non-real-time approaches. Non-real-time anomaly detection is used when an anomaly is detected in an ex-situ procedure. Artificial neural networks (ANNs) are commonly used for numerical prediction using sensor data. In contrast, convolutional neural networks (CNNs) are popular for two-dimensional (2D) data, including melt pool images, infrared images, and computed tomography scans. CNNs have demonstrated excellent performance as image detectors and classifiers for industrial anomaly detection problems [9]. A typical CNN structure comprises convolution, pooling, and fully connected layers. The convolution layer includes a set of convolutional kernels that divide an image into small slices to extract feature motifs and convolve them with the image using weights, i.e., by multiplying image tensors with their corresponding slices. The pooling layer reduces input dimensionality and provides spatial invariance to the network. The fully connected layer takes the inputs from the preceding layers and derives the final outputs [25]. CNNs can extract representative features without prior knowledge and reduce the training time by decreasing the weight dimension [26]. However, CNNs require a large amount of training data and heavy computation as the network becomes deeper [27].
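For illustration, the following minimal Keras sketch shows the three layer types just described (convolution, pooling, and fully connected); it is a generic example rather than the network used in this study, and the input size and layer widths are assumptions.

```python
# Minimal illustrative CNN (not the model used in this study): convolution
# layers extract local feature motifs, pooling layers reduce dimensionality,
# and fully connected layers perform the final classification.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),                     # image input (size assumed)
    layers.Conv2D(32, kernel_size=3, activation="relu"),   # convolution layer
    layers.MaxPooling2D(pool_size=2),                      # pooling layer
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),                  # fully connected layer
    layers.Dense(2, activation="softmax"),                  # normal / abnormal
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```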

Biranchi et al. (2015) suggested a machine-learning approach that predicted compressive strength using multi-gene genetic programming and a general regression neural network in fused deposition modeling [28]. Scime and Beuth (2018) developed a multi-scale CNN method for detecting diverse defects from image patches in laser powder bed fusion [17]; this method paved the way for in-process defect rectification once a feedback control system was implemented. Jin et al. (2019) derived a CNN encoder and decoder for detecting outliers based on a learned distribution of normal behaviors [12]. Mojahed Yazdi et al. (2020) proposed a deep-learning method for detecting porosity in the internal layers of a cylindrical part [29]; they merged a CNN with an ANN to extract features from image data and generate statistical features. Lyu and Manoochehri (2021) proposed a CNN model for extracting, analyzing, and classifying in-plane anomalies in fused filament fabrication [30]. These studies derived anomaly detection algorithms with over 90% accuracy; however, they did not reach real-time decision-making for in-situ quality monitoring and control.

In recent times, real-time anomaly detection has become a subject of interest in the AM community for two reasons. First, real-time image acquisition has become more practical because low-cost and robust machine vision systems have become more available. Second, real-time data analysis has become more feasible owing to advances in computing power. Yan et al. (2022) proposed a decomposition-based method for real-time anomaly detection based on spatio-temporal data in laser powder bed fusion [31]. This method benefited from a layer-wise production paradigm for gathering information on process quality and stability in real-time. Lee et al. (2021) suggested a CNN-based method for detecting anomalies in WAAM based on real-time monitoring using high-dynamic-range (HDR) camera images [19]. Segura et al. (2021) proposed an online framework for detecting droplet anomalies from video images in inkjet printing [32]. Cho et al. (2022) implemented a MobileNet-based real-time anomaly detection system [33]. In this regard, the application of machine learning represents a promising solution for real-time anomaly detection and provides a basis for real-time quality control.

2.3 Anomaly Detection Using Transfer Learning

Typical machine learning requires massive training data to achieve acceptable performance. However, collecting sufficient data is significantly challenging as data acquisition is costly. Nevertheless, learning-driven modeling is vital, and TL is a means for overcoming this problem. The major terms in TL are defined as follows:

  • Task \({\mathcal{T}}\) denotes learning tasks, e.g., regression, prediction, clustering, and classification. \({\mathcal{T}}\) comprises a label space \({\mathcal{Y}}\) and predictive function \(f\left( \cdot \right)\).

  • Domain \({\mathcal{D}}\) denotes different feature spaces or marginal probability distributions caused by disparate contexts in which data are generated. \({\mathcal{D}}\) consists of a feature space \({\mathcal{X}}\) and probability function \(P\left( X \right)\).

  • Source denotes a domain \({\mathcal{D}}_{S}\) or task \({\mathcal{T}}_{S}\) that supplies knowledge.

  • Target denotes a domain \({\mathcal{D}}_{T}\) or task of interest \({\mathcal{T}}_{T}\) that consumes the transferred knowledge.

  • Knowledge is a broad term that includes instances, features, parameters, relations, and models and thus acts as a transporter between the source and target.

Figure 4 presents the differences between machine learning, inductive TL, and transductive TL. As shown in Fig. 4a, machine learning learns from data to derive a separate model for each task. When \({\mathcal{T}}\) performs prediction, machine learning produces a model, i.e., \({\mathcal{Y}} = f\left( \cdot \right)\), from labeled training data samples that correlate the inputs and outputs of the prediction. When \({\mathcal{T}}\) performs another task, machine learning would produce a different model specific to that task from its data samples. Figure 4b shows inductive TL, where a predictive model is induced in \({\mathcal{D}}_{T}\) using data in \({\mathcal{D}}_{S}\) and \({\mathcal{T}}_{S}\) when \({\mathcal{T}}_{S} \ne {\mathcal{T}}_{T}\), irrespective of the homogeneity between \({\mathcal{D}}_{S}\) and \({\mathcal{D}}_{T}\). As the conditional probabilities \(P(Y|X)\) can differ across tasks, a few labeled data in \({\mathcal{D}}_{T}\) are required to adjust the transfer of the conditional probabilities or the discriminative function from \({\mathcal{T}}_{S}\) to \({\mathcal{T}}_{T}\) [34]. Figure 4c shows transductive TL, where a predictive model is transduced in \({\mathcal{D}}_{T}\) when \({\mathcal{T}}_{S} = {\mathcal{T}}_{T}\) but \({\mathcal{D}}_{S} \ne {\mathcal{D}}_{T}\). Transductive TL includes two cases: (1) the feature spaces of \({\mathcal{D}}_{S}\) and \({\mathcal{D}}_{T}\) are different, i.e., \({\mathcal{X}}_{S} \ne {\mathcal{X}}_{T}\); and (2) the feature spaces are the same, i.e., \({\mathcal{X}}_{S} = {\mathcal{X}}_{T}\), but the marginal distributions of the input data are different, i.e., \(P\left( {X_{S} } \right) \ne P\left( {X_{T} } \right)\) [22]. The latter case is known as domain adaptation, where a difference in the marginal probability distributions exists between the source and target data; thus, the knowledge of the source domain needs to be adapted to the target domain.

Fig. 4
figure 4

Machine learning and transfer learning

TL has been applied to manufacturing to create predictive models for fault diagnostics and anomaly detection [35]. Oquab et al. (2014) suggested a network for training labeled source data and transferring CNN internal layers to a target learner [36]. Shao et al. (2018) employed a deep TL method to diagnose motors, gearboxes, and shaft bearings [37]. Guo et al. (2018) suggested a deep convolutional transfer network for the fault diagnosis of bearings in different machines [38]. Sun et al. (2018) used a sparse autoencoder and deep TL technique to estimate the residual life of a cutting tool [39]. Ferguson et al. (2018) proposed a mask region-based CNN to identify casting defects from X-ray images and perform defect detection and segmentation [40]. Imoto et al. (2018) used a CNN to automate defect classification and the TL network to reduce the labeled data for classifying defects in semiconductor manufacturing [41]. Pan et al. (2019) applied TL to the fault diagnoses of high-voltage circuit breakers [42]. Zellinger et al. (2020) presented a multisource TL method for predicting errors using time-series data for tool settings incorporating domain knowledge [43]. Wang and Gao (2020) proposed a deep learning-based TL model for diagnosing faults in rolling bearings based on vibration analysis [44]. Gong et al. (2020) studied the same concept for detecting defects in aeronautic composite materials using the images of non-destructive X-ray tests [45]. Michau and Fink (2021) proposed an unsupervised TL framework to ensure the alignment of unit distributions for enforcing the conservation of the inherent variability of datasets [46]. Liu et al. (2021) suggested a deep TL approach to extract low-dimensional features for process recognition in milling [47]. Kim et al. (2022) proposed a multisource TL method for creating predictive models of machining power [48]. Marei et al. (2021) applied a TL-enabled CNN approach to estimate the health of cutting tools [49]. Liu et al. (2021) suggested a knowledge reuse strategy for training CNN models to improve defect inspection accuracy for injection molding [50].

Although TL is rarely used for anomaly detection in AM, its application is increasing [51]. Ho et al. (2021) proposed a TL-based method for predicting porosity in real-time using the thermal images of a melt pool [52]. Scime et al. (2020) presented CNN and TL-based models for the pixel-wise semantic segmentation of layer-wise powder-bed image data [13]. Zhu et al. (2023) developed a TL-based method that applies a parametric and self-supervised object detection model to detect surface morphology in DED [53].

The significance of the present work originates from the need to generate and apply alternative but desirable anomaly detection models for materials where data are insufficient. When anomaly detection models are required for high-cost materials, collecting sufficient data is even more challenging. TL is an efficient solution because it adapts a well-trained network using only a small amount of data and employs the network across multiple domains. In other words, a low-cost material's (e.g., steel) model can be used to create a high-cost material's (e.g., Ti–6Al–4V) model for anomaly detection. TL enables the network to be trained with a dataset collected from a low-cost material, i.e., the source material, whose features are extracted and stored in the hidden layers of the network. These features are then adjusted with data from a high-cost material, i.e., the target material, particularly when the two materials are not markedly distinct in their physical and thermal characteristics.

3 Experiments

3.1 Experimental Setup

The experiments were conducted using a GTAW-based WAAM system, as shown in Fig. 5. Table 1 lists the experimental setup details. The robot arm was moved to the coordinates designated by the controller. The tungsten inert gas (TIG) torch, attached to the hand of the robot arm and supplied with energy from the power source, deposited the feeding material provided by the wire feeder to generate weld beads on the substrate. The TS, WFR, and current were the process parameters determined for the controller input, wire feeder, and energy source, respectively. The current and voltage sensors measured the numerical values of the arc characteristics in real-time, and the data interface monitored and acquired the arc current and voltage data generated by the sensors. An HDR camera was attached to the torch to capture weld pool and bead images along with the movement of the torch. This camera was optimized for arc welding with a dynamic range of 140 dB to capture high-quality video frames; standard camera systems are inapplicable owing to their low dynamic ranges and the lighting interference in arc welding. The camera data interface recorded the images and converted them into the .jpg file format.

Fig. 5
figure 5

Experimental environment, including wire feeder, shielding gas, TIG power source, TIG torch, robot, and HDR camera

Table 1 Experimental setup details

The experiments were designed based on changes in two process parameters: a WFR of 70–300 cm per minute (cpm) in increments of 25 cpm and a TS of 10–100 cpm in increments of 10 cpm. The current was maintained at 200 amperes (A). The combinations of these parameters generated 100 unique trials per material, as shown in Table 2. Accordingly, 300 trials were executed for the three materials: LCS, STS, and INC. Figure 6 shows samples of the bead depositions on a single layer.

Table 2 Process parameters and bead numbers considered for the experiments
Fig. 6
figure 6

Bead depositions for experiments

3.2 Data Acquisition

Three types of data were acquired: (1) bead shapes, (2) numerical voltage data with timestamps, and (3) camera image data of the single-layer deposition. The voltage data were measured at a sampling rate of 1 kHz and stored as .txt files; they formed a profile, as illustrated in Fig. 3. The camera image data were captured at 50 frames per second (fps) and stored as .mp4 files, and each file was partitioned into individual .jpg images at 50 fps. Each camera frame comprised three regions of interest: the metal transfer, arc shape, and weld pool, as shown in Fig. 7. To avoid confusion, the camera image data and voltage image data are defined as follows:

  • Camera image data: Melt pool image files (.jpg) obtained from each video file as captured by the HDR camera (Fig. 7).

  • Voltage image data: Image files (.jpg) converted and captured based on the time-series numerical voltage data (.txt) as the input data of the models (Fig. 12).

Fig. 7
figure 7

Camera image data, including metal transfer, arc shape, and weld pool
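As a hedged illustration of the frame partitioning described above, the following OpenCV sketch splits one recording into individual .jpg images; the file names are placeholders, and the exact tooling used in the study is not specified.

```python
# Sketch of partitioning one .mp4 recording into individual .jpg frames
# using OpenCV; file names are placeholders.
import cv2

cap = cv2.VideoCapture("bead_025.mp4")          # 50 fps HDR camera recording (placeholder)
frame_idx = 0
while True:
    ok, frame = cap.read()                      # read one frame at a time
    if not ok:
        break
    cv2.imwrite(f"bead_025_frame_{frame_idx:05d}.jpg", frame)
    frame_idx += 1
cap.release()
print(f"{frame_idx} frames extracted")
```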

4 Method

This study aims to (1) develop a TL-based method for deriving anomaly detection models learned from a single source material and (2) apply the models to detect anomalies in a target material. In this method, the material properties of the source and target materials are concatenated as manufacturing knowledge features in the model to compensate for discrepancies in the melting mechanism. Figure 8 illustrates the overall procedure of the proposed method. Section 4.1 explains the data preprocessing procedures, and Sect. 4.2 introduces the modeling method.

Fig. 8
figure 8

Procedure of material-concatenated transfer learning

4.1 Data Preprocessing

Data preprocessing transforms raw data into high-quality training and testing data. Feature extraction and pattern discovery during learning become complicated when the training data include sparse, imprecise, qualitative, faulty, or missing samples [54]. Hence, data preprocessing is essential because model performance mainly relies on the quality and quantity of the training data. Figure 9 shows the data preprocessing procedure, and the following subsections explain the technical details of each step. The input data comprise the deposited bead shapes, numerical voltage data, and camera image data.

Fig. 9
figure 9

Data preprocessing procedure

4.1.1 Bead Classification and Balancing

Bead classification is necessary to distinguish regular data patterns. Beads classifiable as normal or abnormal are considered, whereas beads that cannot be classified owing to shape transitions or data irregularities are excluded.

First, each bead was classified and labeled as normal, abnormal, or unclassified based on the visual judgments of two experts. Figure 10 shows the voltage data profiles and the corresponding bead shapes. As shown in Fig. 10a and b, the two beads are classified as normal because they are well-formed with stable and smooth data patterns. As shown in Fig. 10c and d, the beads are classified as abnormal because they contain balling defects along the trajectories. As shown in Fig. 10e and f, the beads are labeled unclassified owing to their transition from normal to abnormal states and vice versa. Theoretically, a bead trajectory should maintain a consistent shape without a state transition because the process parameters do not change during deposition. However, state transitions can occur in practice owing to external and uncontrolled factors, including deposition instabilities, feeding material irregularities, and unknown causes. Unintentional state transitions can induce vagueness in pattern separation; it is therefore desirable to exclude beads containing such transitions, as illustrated in Fig. 10e and f.

Fig. 10
figure 10

Patterns of voltage data profiles

Second, among all the beads classified as normal, only those associated with stable and monotonic voltage profiles were considered. The voltage profile in Fig. 10a appears more stable than that in Fig. 10b, although both are normal, implying that even normal beads can possess different voltage patterns because the melting mechanism influences the voltage values, as explained in Sect. 2.1. Even within the normal state, such unstable and fluctuating patterns can decrease accuracy. For instance, the bead shown in Fig. 10a was included in the classification process, whereas the bead shown in Fig. 10b was excluded.

It is essential to resolve the class imbalance problem while classifying beads. The class imbalance problem refers to a dataset with a skewed ratio of majority to minority samples; it frequently occurs in data-scarce and normal-biased environments [27]. This problem should be resolved because it can cause overfitting, resulting in a small learning error during training but a high prediction error during testing.

The class imbalance problem was also observed in this study because more normal data were generated than abnormal data. A model cannot learn the data patterns of the abnormal state correctly if the training data are extremely biased toward the normal state. Accordingly, balancing the numbers of normal and abnormal samples is a solution to this problem. Oversampling and undersampling can be used for this purpose: the former increases the size of the minority class to match the majority class, whereas the latter reduces the size of the majority class to match the minority class [55]. In this study, undersampling was used to resolve the class imbalance problem. In undersampling, the amount of normal data is reduced by excluding normal beads to achieve a desirable balance with the amount of abnormal data. This procedure was performed by considering the balance of the total time consumed to fabricate the beads. Table 3 lists the beads selected and classified as normal or abnormal; each value in brackets represents the fabrication time (s). The type indicates whether a bead was used for the training or testing dataset.

Table 3 Selected beads and their corresponding fabrication times
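The study balanced the classes by considering the total fabrication time; the sketch below illustrates the simpler count-based variant of undersampling, with placeholder label counts, to show the basic mechanism of randomly discarding majority-class (normal) samples.

```python
# Count-based undersampling sketch (the study itself balanced by fabrication
# time): randomly drop normal samples until the classes are balanced.
import numpy as np

rng = np.random.default_rng(0)
labels = np.array([0] * 900 + [1] * 150)           # 0 = normal, 1 = abnormal (placeholder counts)
normal_idx = np.where(labels == 0)[0]
abnormal_idx = np.where(labels == 1)[0]

keep_normal = rng.choice(normal_idx, size=len(abnormal_idx), replace=False)
balanced_idx = np.concatenate([keep_normal, abnormal_idx])
print(np.bincount(labels[balanced_idx]))           # -> [150 150]
```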

4.1.2 Camera Image Data Labeling

Camera image data labeling was performed by tagging each camera image frame with one of three classifiers: normal, abnormal, or unclassified. This labeling was intended so that the same classifier could later be assigned, via the timestamps, to the voltage image data corresponding to each camera image. The two experts visually analyzed the individual frames and manually assigned each classifier.

This labeling was straightforward because the bead classification had already been completed, as described in Sect. 4.1.1. The camera image data belonging to a normal or abnormal bead were labeled normal or abnormal, respectively. However, some camera image data were labeled unclassified when they belonged to the starting and ending spots of the bead trajectory, because the TS was zero at both ends, where the torch did not move for a short time. Figure 11 shows the camera image data labeling for bead No. 25 of INC. The camera image data at spots (a) and (c) are labeled unclassified, whereas the camera image data for the ordinary period (b) are labeled normal because the bead was classified as normal. If a bead were classified as abnormal, its camera image data during the ordinary period would be labeled abnormal.

Fig. 11
figure 11

Camera image data labeling: a labeled ‘unclassified’, b labeled ‘normal’, and c labeled ‘unclassified’

4.1.3 Voltage Image Data Conversion

The time-series voltage data were converted into voltage image data in the time domain to apply image-based learning to time-series classification. Time-series data are conventionally analyzed using three approaches: (1) model-based methods generate an underlying model using Markov and statistical models; (2) distance-based methods measure the similarity between two time series using a distance function; and (3) feature-based methods use Fourier and discrete wavelet transforms to transform the time series into a set of representative features [56]. However, these approaches have drawbacks in practice: (1) the time-series data must satisfy the stationarity assumption; (2) the two time series must be of equal length, and the distance measures are highly sensitive; and (3) feature selection cannot be easily performed without discretization, which causes information loss. These drawbacks can adversely impact data quality. In this context, applying image-based learning to time-series data becomes a viable option for time-series classification [57]. It enables features to be extracted automatically without prior knowledge and noisy data to be handled properly by discarding them at each subsequent layer [58].

For image conversion, the size and frequency of each voltage image must be determined to segregate the voltage data profile into a series of image snapshots. The bandwidth determines the image size, and the interval determines the image frequency. These terms are defined as follows:

  • Bandwidth is the duration from the earliest to the latest time point on an image snapshot. Bandwidth determines the number of data points in the snapshot of a partial voltage profile.

  • Interval is the time gap between the current and next image snapshots. The interval determines the number of snapshots to be converted from the voltage profile.

Equations (1) and (2) express the k-th set of time points (Mw,i,k) and their corresponding voltage values (Vw,i,k). Figure 12 shows the concepts of bandwidth and interval, where w = 3, i = 1, and k = 1, 2, or 3. V3,1,1 includes a set of 3000 voltage values from the first (t = 1/1000 s) to the last (t = 3000/1000 s) time point, and V3,1,2 includes another set of 3000 voltage values from the first (t = 1001/1000 s) to the last (t = 4000/1000 s) time point. In other words, each snapshot possesses a 3 s bandwidth and accordingly includes 3000 voltage values because the measurement cycle is set to 1 kHz. The snapshots were generated periodically at intervals of 1 s along the voltage profile. Thus, the voltage image data were generated based on the designated bandwidth and interval.

$$M_{w,i,k} = \{x + ji\left( {k - 1} \right){|} x \in {\mathbb{N}},1 \le x \le jw\} ,$$
(1)

where w denotes the bandwidth, i denotes the interval, x denotes the time point, j denotes the measurement cycle (j = 1,000 as a constant), and k denotes the kth snapshot.

$$V_{w,i,k} = \left\{ {v_{m} {|}m \in M_{w,i,k} } \right\},$$
(2)

where vm denotes the voltage value observed at time (t) = m/j.

Fig. 12
figure 12

Bandwidth and interval
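The following Python sketch transcribes Eqs. (1) and (2) directly; the function names and the placeholder voltage profile are illustrative, and the printed example reproduces the V3,1,2 case from Fig. 12.

```python
# Sketch of Eqs. (1) and (2): for bandwidth w (s), interval i (s), and
# measurement cycle j (Hz), the k-th snapshot covers the time points M_{w,i,k}
# and the voltage values V_{w,i,k} observed at those points.
def snapshot_time_points(w, i, k, j=1000):
    """M_{w,i,k}: integer time indices x + j*i*(k-1) for 1 <= x <= j*w."""
    offset = int(j * i * (k - 1))
    return [x + offset for x in range(1, int(j * w) + 1)]

def snapshot_voltages(voltage, w, i, k, j=1000):
    """V_{w,i,k}: voltage values v_m at the time points m in M_{w,i,k}.

    `voltage` is the full 1 kHz voltage profile; voltage[m-1] corresponds to
    the value observed at t = m / j seconds.
    """
    return [voltage[m - 1] for m in snapshot_time_points(w, i, k, j)]

# Example matching Fig. 12 (w = 3 s, i = 1 s): the second snapshot spans
# t = 1.001 s ... 4.000 s and contains 3000 voltage values.
profile = [0.0] * 10_000                      # placeholder voltage profile
v_3_1_2 = snapshot_voltages(profile, w=3, i=1, k=2)
print(len(v_3_1_2))                           # -> 3000
```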

The bandwidth and interval must be determined rationally because they directly affect the accuracy and training time during model training. If the interval is extremely short, consecutive snapshots overlap substantially and share a large number of the same values vm; such data overlap may disturb feature extraction during training owing to feature similarity. In contrast, if the interval is extremely long, features may be extracted sparsely, deteriorating the accuracy. Similarly, a large bandwidth can positively affect the accuracy by providing sufficient data samples, but it may also adversely impact the accuracy owing to the feature similarity caused by data overlap. Thus, the relationship between the bandwidth and interval was analyzed in terms of accuracy and training time because these values had to be decided heuristically. In this study, the bandwidth was set to 2 s (w = 2) and the interval to 0.1 s (i = 0.1) because these two values exhibited the best accuracy in the sensitivity analysis.

Then, the height of each snapshot was adjusted to maintain the same image size, as images of identical size are needed to correctly extract representative features. The image sizes would otherwise vary depending on the minimum and maximum values of vm, which ranged from 0 to 21 V. Therefore, 224 × 224 pixels were preserved in every image through automatic adjustment, and the image height was set adaptively using vrange (= vmax − vmin). Figure 13 shows the mechanism of the automatic adjustment. The first and second snapshots exhibit a larger vrange than the third snapshot because the former two contain higher voltage values than the latter. If vrange,k were fixed to vrange,k−1, the image features could not be well extracted during periods with slight fluctuations because the large value of vrange,k−1 would dominate the determination of the vertical pixel scale. As shown in the first two images, the image heights are automatically adjusted with a slight difference, whereas in the third image vrange,k is adjusted with a large extension to focus on the slightly fluctuating period.

Fig. 13
figure 13

Automatic adjustment (at bandwidth = 2 s and interval = 0.1 s)
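A possible rendering step is sketched below, assuming matplotlib: each snapshot is drawn onto a fixed 224 × 224 pixel canvas whose vertical range adapts to that snapshot's own minimum and maximum voltage, mirroring the automatic adjustment described above. The exact plotting settings used in the study are not specified and are assumed here.

```python
# Illustrative rendering of one voltage snapshot into a fixed 224 x 224 pixel
# image with an adaptive vertical range (v_min .. v_max of that snapshot).
# Line style, margins, and file naming are assumptions.
import matplotlib
matplotlib.use("Agg")                          # render off-screen
import matplotlib.pyplot as plt

def render_snapshot(values, path, size_px=224, dpi=100):
    v_min, v_max = min(values), max(values)
    margin = 0.05 * ((v_max - v_min) or 1.0)   # avoid a zero-height axis
    fig = plt.figure(figsize=(size_px / dpi, size_px / dpi), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])            # use the full canvas, no ticks
    ax.plot(values, linewidth=1)
    ax.set_ylim(v_min - margin, v_max + margin)  # adaptive v_range
    ax.axis("off")
    fig.savefig(path)
    plt.close(fig)

# e.g. render_snapshot(v_3_1_2, "snapshot_0002.jpg")
```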

4.1.4 Voltage Image Data Labeling

Each voltage image was labeled normal, abnormal, or unclassified. All the voltage image data were synchronized with the camera image data with regard to the timestamps. This time synchronization enabled easy labeling using the camera image data labels as a reference. As shown in Fig. 14, the images in the ordinary period (b) were labeled normal when their corresponding frames were labeled normal. The voltage image data were labeled unclassified when they belonged to the starting (a) and ending (c) spots, and the unclassified voltage image data were excluded from the training and testing datasets. If camera image data were labeled abnormal, the corresponding voltage image data were labeled abnormal. Thus, voltage image data classified as either normal or abnormal were obtained.

Fig. 14
figure 14

Image labeling procedure: a frames labeled ‘unclassified’, b frames labeled ‘normal’ or ‘abnormal’, and c frames labeled ‘unclassified’

4.1.5 Dataset Preparation

The training and testing datasets were prepared using the voltage image data labeled as normal or abnormal. Beads were randomly selected to construct the testing datasets for each material, and the classifiers labeled in the testing voltage image data were used only for validation purposes. Table 4 lists the data samples of the training and testing datasets for the three materials. A common 70:30 ratio was used to split each training dataset into training and validation subsets. The training datasets were used for model training, whereas the validation datasets were used to measure the learning errors, which indicate the accuracy of the models during training. The testing datasets were used to measure the prediction errors, which represent the accuracy of the learned models in predicting from future data.

Table 4 Numbers of data samples
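A minimal sketch of the 70:30 training/validation split, assuming scikit-learn and placeholder arrays in place of the actual voltage image data:

```python
# 70:30 split of one material's labeled training data into training and
# validation subsets; array names, shapes, and counts are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split

images = np.zeros((100, 224, 224, 3), dtype=np.float32)
labels = np.array([0] * 80 + [1] * 20)                  # 0 = normal, 1 = abnormal

X_train, X_val, y_train, y_val = train_test_split(
    images, labels,
    test_size=0.30,        # 70:30 training/validation ratio
    stratify=labels,       # keep the normal/abnormal ratio in both subsets
    random_state=42,
)
print(X_train.shape[0], X_val.shape[0])                  # -> 70 30
```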

4.2 Modeling

A TL-based method was developed to derive anomaly detection models trained on the source material and apply them to the target material. Figure 15 illustrates the model architecture, structurally separated into the source and target domains. In this study, CNNs are used as image feature extractors in both domains because CNNs can accurately identify and extract 2D image features from the voltage image data, which take the form of line-typed, waveform signals. Additionally, CNNs allow different types of features to be added to their original image features and thus provide the flexibility to concatenate material properties. Furthermore, our prior studies, including Cho et al. (2022) [33] and Kim et al. (2023) [59], demonstrated the performance advantage of CNNs in anomaly detection problems compared with You Only Look Once (YOLO), which is also known as a good object identifier and classifier for real-time applications.

Fig. 15
figure 15

Model architecture

The proposed method belongs to transductive TL because the feature spaces are the same but the marginal distributions of the input data are different. Typical transductive TL would perform domain adaptation by transferring only the extracted image features to the target material domain. In contrast, the proposed method performed domain adaptation by transferring the extracted image features and concatenating material properties (particularly thermal properties) as manufacturing knowledge features to accommodate the melting characteristics. The features extracted from the voltage image data and the features concatenated from the thermal properties were transferred to the target material domain. In the target domain, feature extraction and property concatenation were performed using the data of the target material to learn and extract image features and concatenate thermal properties, as in the source domain. Anomaly detection models were then developed, and classification was performed to classify each image as normal or abnormal for the target material. In particular, fine-tuning was designed to calibrate the weights to suit the target material and thus enhance accuracy.

4.2.1 Feature Extraction

Features were extracted from the voltage image data using CNN techniques. A good feature extractor must be selected from among various CNN techniques to precisely extract representative image features. A preliminary investigation was performed to select the best feature extractor among four candidates: DenseNet169, InceptionV3, ResNet101, and Xception. These extractors were implemented using the Keras TensorFlow library in Python; TensorFlow is an open-source framework for machine learning, and Keras is an open-source library providing neural network application programming interfaces. The hyperparameters were the Adam optimizer, categorical cross-entropy as the loss function, 30 epochs, and a batch size of 16.
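The sketch below shows how one candidate extractor (DenseNet169) could be assembled and trained with the hyperparameters listed above; the classification head, weight initialization, and data pipeline are assumptions for illustration rather than the exact implementation.

```python
# Hedged sketch: one candidate feature extractor (DenseNet169) compiled with
# the stated hyperparameters (Adam, categorical cross-entropy, 30 epochs,
# batch size 16). The head and placeholder data are assumptions.
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet169

base = DenseNet169(include_top=False, weights=None, input_shape=(224, 224, 3))
x = layers.GlobalAveragePooling2D()(base.output)      # 1664 image features
outputs = layers.Dense(2, activation="softmax")(x)    # normal / abnormal
model = models.Model(base.input, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder source-material data (shapes illustrative only).
X_src = np.zeros((32, 224, 224, 3), dtype=np.float32)
y_src = np.eye(2)[np.random.randint(0, 2, 32)]        # one-hot labels
model.fit(X_src, y_src, epochs=30, batch_size=16, verbose=0)
```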

The best feature extractor was selected by evaluating the accuracies of the models derived by TL without material property concatenation (typical TL) and by TL with material property concatenation (the proposed method). The training and testing datasets listed in Table 4 were used identically for this selection. Models were generated for LCS (source material) using its full set of training (2609) and validation (1119) data samples. These models were applied to the typical or proposed TL and then evaluated for INC (target material) using its full set of testing (3120) data samples. Figure 16 shows the accuracy (%) of the two models. The DenseNet169 model was selected as the feature extractor because it exhibited the best accuracy (84.28% and 89.38%) in both cases.

Fig. 16
figure 16

Preliminary comparison of accuracies

Figure 17 illustrates the extracted image feature maps and layers. DenseNet automatically extracts representative features by passing image data through convolutional layers [60]. Because it connects all layers in a feed-forward manner, the feature maps of the preceding layers act as inputs to the subsequent layer. The dense block concatenates the features of the preceding layers instead of adding them, thus differentiating between the information added to the network and the information preserved [25]. The transition layer consists of a batch normalization layer, a rectified linear unit (ReLU), a 1 × 1 convolutional layer, and a 2 × 2 average pooling layer, and performs down-sampling to change the sizes of the feature maps [61]. The convolutional layer comprises convolutional kernels, where each neuron acts as a kernel. These kernels divide an image into small slices, known as receptive fields, to extract feature motifs and convolve with the image using weights by multiplying the image tensors with the elements of the receptive field [25]. The receptive field captures more global cues than local cues as it expands along the feature hierarchy [62]. As shown in Fig. 17, the feature maps become pixel-wise, abstract, and complicated as they are progressively trained to capture global cues throughout the layers.

Fig. 17
figure 17

Feature maps in DenseNet169

4.2.2 Material Property Concatenation

Typical TL is limited in exploring the distributional difference between the source and target domains, particularly when the image features in the two domains exhibit only slight deviations within consistent distributions, as shown in Fig. 18a. This phenomenon is called the distributional equality problem; it frequently occurs in machine learning and needs to be resolved because it can deteriorate model accuracy. In this study, this problem was addressed using material property features. Figure 18b shows how the material property features were concatenated with the image features. This material concatenation aims to turn a small deviation in the image feature distributions between the source and target domains into a large one. The domain adaptation originates from expert knowledge concerning the influence of material properties on the melting and hardening characteristics in WAAM, namely that certain materials tend to be vulnerable to specific defects owing to their distinctive thermal deformations [8].

Fig. 18
figure 18

Classification dimensions

Each material exhibits unique physical, chemical, thermal, mechanical, and electrical properties. Among them, the thermal properties affect the melting and hardening characteristics in WAAM. Hence, thermal conductivity, melting point, and specific heat capacity were identified as the material property features in this study because they are the primary thermal properties. Table 5 lists the thermal property values of LCS, STS, and INC; these values were obtained from the material database provided by MatWeb [63]. As these values possess different units and ranges, normalization was performed to scale and rearrange the original property values. Min–max (0–1) normalization was applied when concatenating the material property features into the model.

Table 5 Thermal properties of feeding materials
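The min–max normalization of the three thermal property features can be sketched as follows; the property values below are placeholders standing in for the MatWeb values in Table 5, not the actual figures.

```python
# Min-max (0-1) normalization of the three thermal property features across
# the three materials; the values are placeholders, not the Table 5 figures.
import numpy as np

# rows: LCS, STS, INC; columns: thermal conductivity, melting point,
# specific heat capacity (placeholder values).
props = np.array([
    [50.0, 1500.0, 480.0],
    [16.0, 1400.0, 500.0],
    [10.0, 1350.0, 410.0],
])
props_norm = (props - props.min(axis=0)) / (props.max(axis=0) - props.min(axis=0))
print(props_norm)   # each column rescaled to the 0-1 range
```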

4.2.3 Classification

Classification involves labeling each image as normal or abnormal in the target domain. As shown in Fig. 19, the fully connected layer globally analyzes the outputs of all the preceding layers in the modeling stage. This layer classifies each image by constructing a nonlinear combination of the selected features and uses common classifiers in machine learning, e.g., support vector machines, softmax, and ANNs. Classification necessitates fine-tuning, which represents the training of new data based on a set of pre-trained weights [64]. Fine-tuning is essential for performance improvement in TL because it adapts the pre-trained models acquired from the source domain to the target domain.

Fig. 19
figure 19

Classification and fine-tuning procedure

TL typically uses a CNN-based model pre-trained on image data. The proposed method was built upon a CNN model pre-trained on the voltage image data together with property concatenation. This method considers all image and material-property features as a whole set of features in the front layers of the CNN model; however, it trains only the last layer using the data from the target domain. This enables feature extraction from a large amount of voltage image data in the source domain, whereas fine-tuning uses only a small amount of voltage image data in the target domain. Thus, classification labels each image with a normal or abnormal classifier.

In Fig. 19, fine-tuning preserves the weights of the pre-trained CNN model in some layers and tunes them in others. The front layers are frozen to preserve their weights, as the features are obtained from these layers. In contrast, the last layer is unfrozen so that its weights can be revised to accommodate features specific to the target data [65]. The preceding layer constitutes 1664 output nodes from the global average pooling (GAP) for image feature extraction and three output nodes from the material property concatenation. The unfrozen layer uses these 1667 nodes connected to the frozen layers as input. This layer uses the softmax activation function and comprises two output nodes for classifying a data sample as normal or abnormal. This binary classification is decided by the higher of the normal and abnormal probabilities, whose sum equals 1.
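A hedged sketch of this architecture and fine-tuning step is given below: the DenseNet169 feature extractor is frozen, its 1664 GAP outputs are concatenated with the three normalized thermal properties (1667 nodes in total), and only the final softmax layer is trained on target-material data. In practice the frozen base would carry the weights learned on the source material; here the weights, data shapes, and epoch count are placeholders.

```python
# Sketch of the property-concatenated model and fine-tuning step; shapes,
# weights, and variable names are illustrative only.
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet169

base = DenseNet169(include_top=False, weights=None, input_shape=(224, 224, 3))
base.trainable = False                                    # freeze the front layers

image_in = base.input
prop_in = layers.Input(shape=(3,), name="thermal_properties")
img_feat = layers.GlobalAveragePooling2D()(base.output)   # 1664 nodes
feat = layers.Concatenate()([img_feat, prop_in])          # 1667 nodes
out = layers.Dense(2, activation="softmax")(feat)         # normal / abnormal

mc_tl = models.Model([image_in, prop_in], out)
mc_tl.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Fine-tuning with a small amount of target-material data (placeholders).
X_img = np.zeros((16, 224, 224, 3), dtype=np.float32)
X_prop = np.tile([0.0, 0.0, 0.0], (16, 1))        # normalized target properties
y = np.eye(2)[np.random.randint(0, 2, 16)]
mc_tl.fit([X_img, X_prop], y, epochs=5, batch_size=16, verbose=0)
```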

5 Validation and Discussion

5.1 Validation

The proposed method was validated in terms of accuracy using the testing datasets. The computing environment included an Intel Core i7-10875H CPU, an NVIDIA GeForce RTX 2070 GPU, 32 GB of RAM, and the Windows 10 64-bit operating system. Four models were compared: the reference learning model (RL), standard learning model (SL), transfer learning model (TL), and material-concatenated transfer learning model (mc-TL).

  • RL is a machine learning model trained and tested on the source material. This model serves as a reference for verifying that image features are effectively extracted in the source domain.

  • SL is a machine learning model trained and tested on the target material. This model can be used as a reference to compare the performances of machine learning and TL.

  • TL is a transfer learning model, where a machine learning model is trained on the source material and transferred to the target material without material property concatenation. This model represents typical TL.

  • mc-TL is a transfer learning model, where a machine learning model is trained on the source material and transferred to the target material with material property concatenation. This model signifies the proposed method.

This study derived the four models (RL, SL, TL, and mc-TL) separately for three cases: A. LCS (source material) to STS (target material); B. LCS to INC; and C. STS to INC. In each case, the RL and SL models were derived from the source and target materials, respectively, whereas the TL and mc-TL models were transferred from the source material to the target material. For example, in Case A, the RL was derived from LCS and the SL from STS, whereas the TL and mc-TL were transferred from LCS to STS. The accuracy was measured using Eq. (3).

$$Accuracy\;(\% ) = 100 \times \frac{{True\;Positive + True\;Negative}}{{Total\;number\;of\;data\;samples}}$$
(3)
A. LCS to STS: Fig. 20 presents the accuracy with respect to the increase in the number of training data samples. RL shows an accuracy of 95.24%, signifying that feature extraction is performed correctly on the source material and that the image features can be transferred to the target material. In SL, the accuracy remains under 66.48% for up to 15% of the whole set of training data samples but exceeds 88.46% from 20% of the samples onward. The accuracies of both TL and mc-TL increase as the number of data samples increases. TL remains under 60.02% when less than 5% of the samples are used and then sustains 77.22–81.28% when ≥ 5% of the samples are used. In mc-TL, the accuracy is 61.53% and 63.74% when 1% and 3% of the samples are used, respectively, and reaches 81.60–84.47% when ≥ 5% of the samples are used. Thus, TL and mc-TL achieve good performance with only 5% of the samples, and mc-TL is more accurate than TL in all cases. The two models outperform SL for small portions of the samples (up to 15%), whereas SL demonstrates higher accuracy when 20% or more of the samples are used.

B. LCS to INC: Fig. 21 shows the accuracy results, which exhibit a pattern similar to Case A. In SL, the accuracy remains under 68.29% until 15% of the samples are used but exceeds 84.95% when ≥ 20% of the samples are used. In TL, the accuracy is 72.14% for 1% of the samples and increases to 86.37% for 3% of the samples; however, it remains at 83.60–85.24% from 5% of the samples onward, regardless of further increases in the number of data samples. In mc-TL, the accuracy is 81.92% for 1% of the samples, and mc-TL exhibits stable accuracies of 86.74–89.47% from 3% of the samples onward. As in Case A, mc-TL achieves a higher accuracy than TL in all cases.

C. STS to INC: Fig. 22 shows the accuracy results. RL achieves an accuracy of 95.53%, so the image features are applicable for transfer from the source to the target domain. In SL, the accuracy tends to increase, as in Cases A and B; however, it does not exceed 90% even with the maximum number of data samples. In TL, the accuracy starts at 76.38% when 1% of the samples are used but remains at 73.03–77.11% over all of the data samples. mc-TL achieves 76.99–78.94% when up to 10% of the samples are used and exceeds 81.33% when 15% of the samples are used, whereas TL never exceeds 80%. The accuracy of mc-TL outperforms that of TL and exhibits a slightly increasing pattern with respect to the increase in the number of data samples.

Fig. 20
figure 20

Accuracy in LCS to STS

Fig. 21
figure 21

Accuracy in LCS to INC

Fig. 22
figure 22

Accuracy in STS to INC

5.2 Discussion

5.2.1 Classification Evaluation

The proposed method was further validated using additional metrics to investigate the classification performance from different viewpoints. Precision, recall, and F1-score were used as the additional metrics, and Eqs. (4), (5), and (6) express their formulas, respectively. Precision represents how many positive predictions are correct among all positive predictions, whereas recall indicates how many positive predictions are correct among all actually positive samples. The F1-score is the harmonic mean of precision and recall and weights the two metrics in a balanced way because precision and recall are in a trade-off relationship.

$$Precision\;(\% ) = 100 \times \frac{True\;Positive}{{True\;Positive + False\;Positive}}$$
(4)
$$Recall\;(\% ) = 100 \times \frac{True\;Positive}{{True\;Positive + False\;Negative}}$$
(5)
$$F1{\text{-}}score = 2 \times \frac{precision \times recall}{{precision + recall}}$$
(6)
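For reference, Eqs. (3)–(6) can be computed with scikit-learn as sketched below on a toy set of predictions; the abnormal class is treated as positive, and the labels are illustrative only.

```python
# Toy computation of Eqs. (3)-(6) using scikit-learn (1 = abnormal is the
# positive class here; labels are illustrative).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

print("accuracy :", 100 * accuracy_score(y_true, y_pred))
print("precision:", 100 * precision_score(y_true, y_pred))
print("recall   :", 100 * recall_score(y_true, y_pred))
print("f1-score :", f1_score(y_true, y_pred))
```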

Table 6 lists the classification evaluation results for the three cases (A. LCS to STS, B. LCS to INC, and C. STS to INC). Both TL and mc-TL produce stable and desirable precision values when ≤ 15% of the samples are used, compared with SL; however, their precision does not improve further once the data portion exceeds 20%. In mc-TL, the recall values are generally higher than the accuracy values, implying that mc-TL can detect and identify true positive samples well by reducing the probability of misclassifying positive samples as negative. In mc-TL, the F1-score exceeds 0.7 except when 1%, 3%, and 20% of the samples are used in Case A. Note that an F1-score of 0.7 is generally regarded as acceptable, although this is not an absolute threshold.

Table 6 Classification evaluation results

5.2.2 Model Transferability

As shown in Figs. 20, 21 and 22, the SL model exhibits an s-curve pattern as the number of data samples increases. This model appears superior to the TL and mc-TL models and reaches approximately 90% accuracy once the portion of data samples exceeds 20%. This pattern implies that machine learning trains properly and becomes more accurate as the amount of data increases. Conversely, it also means that machine learning is viable only when sufficient data are available, particularly when low-cost materials are used.

The TL model, in contrast, rapidly achieves an accuracy of > 73% when 5% of the samples are used; the accuracy then remains at 75–86%. This trend indicates that TL exhibits good accuracy and that an increase in data samples affects the model performance less significantly. This phenomenon is common in machine learning. TL is a good substitute where data are absent or scarce; however, it is limited in achieving excellent accuracy owing to the heterogeneity of the intrinsic features between the source and target domains.

The mc-TL model, on the other hand, achieves accuracies of 76–89% when 5% of the samples are used and outperforms TL in each case, implying that deriving anomaly detection models is feasible even with a small amount of training data obtained from depositing high-cost materials. In contrast to the typical model, the proposed method successfully reflects the melting and hardening characteristics of the target material by incorporating thermal properties into the modeling.

The difference in accuracy between TL and mc-TL is mainly induced by the difference in transferability, which refers to the generalization ability of features [34]. TL properly extracts image features that generalize the homogeneity of melting and hardening, thus producing a common feature space; however, it provides no means of specifying the thermal characteristics of the materials in the models. In contrast, mc-TL endows the learned features with higher transferability than TL: it imposes thermal properties as manufacturing knowledge features to create a common and material-adaptive feature space, which retains the benefits of TL while specifying the discriminative thermal behaviors of the materials.

5.2.3 Industrial Applicability

In view of the above, the mc-TL models produce acceptable classification performance from an applicability perspective because the accuracy values are stably high across all the data portions and the accuracy deviations between the mc-TL and SL models are not large (maximum deviations of 5.59%, 1.49%, and 4.59% in the three cases, respectively). These deviations can be reasonable and acceptable, particularly for detecting defects in refractory materials for which data are absent or scarce because their AM processes have not yet been run extensively; otherwise, machine learning cannot be easily applied owing to the cost intensiveness of data collection.

In addition, our prior studies revealed that anomaly detection models using 2D melt pool image data outperform those using 1D numerical voltage data [33, 59]. However, it is not easy to install a welding image data acquisition system in every WAAM system owing to its cost. In this situation, mc-TL can provide reasonable classification performance while remaining cost-effective through the use of cheaper numerical voltage data. Thus, mc-TL can serve as an alternative anomaly detector in environments where machine learning is unavailable owing to data absence or scarcity. Nevertheless, controversy may arise in applying mc-TL in industry owing to its inferior accuracy compared with machine learning, which is a common phenomenon in data-sufficient environments and stems from the nature of TL.

6 Conclusion

This study proposed a material-adaptive anomaly detection method for WAAM. The following conclusions can be made from the study:

  • A TL-based method was proposed to create anomaly detection models for classifying balling defects as abnormal by transferring models derived from source materials to target materials. Specifically, the proposed method differs from typical TL because it converts numerical voltage data into voltage image data as an input to CNNs and concatenates thermal properties with image features in CNN-based modeling. The proposed method performs fine-tuning to adjust the image and material-property features of the source domain toward those of the target domain.

  • Experiments were performed using a GTAW-based WAAM system. LCS, STS, and INC were used as the materials, and DenseNet169 was used as the image feature extractor. The proposed method generated mc-TL models that achieved accuracies of 82.95%, 89.47%, and 84.22% when transferring LCS to STS, LCS to INC, and STS to INC, respectively, outperforming the typical TL models, which achieved accuracies of 78.03%, 86.37%, and 73.63%, respectively. Moreover, the method can help achieve desirable accuracy using as little as 3% of the data samples of the target material, which is rarely achievable with machine learning alone.

The proposed method can contribute to developing anomaly detection models for in-situ quality monitoring using TL, which has rarely been applied to WAAM. Furthermore, material properties can be concatenated with typical image features to improve accuracy, particularly when data are scarce, as obtaining data from high-cost material fabrication is expensive and time-consuming.

The limitations of this study are as follows. First, TL is a good substitute for machine learning when data are absent or scarce, as demonstrated in our experiments; however, TL showed inferior accuracy compared with machine learning as the amount of training data increased. Second, the current study uses only voltage image data as a data source. This single data source may adversely affect accuracy when the data are of low quality owing to noise; such low data quality in a single source may prevent the achievement of a desirable accuracy because the data cannot perfectly reflect the melting and solidification phenomena. Ensemble learning can help resolve this data quality issue. Ensemble learning combines two or more base learners allocated to separate data sources to improve accuracy: each base learner trains on a dataset provided by its data source, and an ensemble learner then trains on the prediction results obtained from the base learners. For this study, two base learners could be derived by training on the camera image data and the voltage image data, and the ensemble learner would then be derived as the final anomaly detection model by training on the dataset that concatenates the probabilities of abnormality occurrence generated by the two base learners. Ensemble learning can prevent misclassification by resolving the local optimum problem in which a single base learner can become trapped.
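A minimal sketch of this stacking idea, with simulated base-learner outputs in place of trained CNNs, is given below; the names, counts, and distributions are illustrative only.

```python
# Sketch of the ensemble (stacking) idea: the abnormality probabilities of two
# base learners (camera image data and voltage image data) are concatenated
# and used to train a final ensemble learner. Base-learner outputs are
# simulated here rather than produced by trained CNNs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, n)                     # 0 = normal, 1 = abnormal

# Simulated abnormality probabilities from the two base learners.
p_camera = np.clip(labels + rng.normal(0, 0.35, n), 0, 1)
p_voltage = np.clip(labels + rng.normal(0, 0.45, n), 0, 1)
stacked = np.column_stack([p_camera, p_voltage])   # ensemble training data

ensemble = LogisticRegression().fit(stacked, labels)
print("ensemble accuracy:", ensemble.score(stacked, labels))
```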

We plan to apply a multisource approach with multiple source domains to improve accuracy. In addition, we plan to develop a hybrid method to simultaneously use machine learning and TL regardless of data richness or absence, thereby predicting and optimizing the quality performance in the process planning phase.