1 Introduction

Advances in many fields require decreasing size and increasing precision, driving demand for precision manufacturing, including micro-drilling. Laser beam micro-drilling is a favorable method that achieves high-quality micro-holes with impressive accuracy and precision and is used in fields including biomedical (implants, medicine), aerospace, robotics, and electronics [1]. The current method of quality checking is highly inefficient and prone to subjective judgment. After the drilling process is completed, the specimen is removed from the machine and undergoes a quick visual inspection by checking light penetration. Sampling inspection, at a sample rate requested by the client, is then done using a precision microscope. If the quality of the sampled holes does not meet the client's requirements, correctable defective holes are re-drilled. The vast number of holes makes total inspection extremely costly, hindering economic feasibility. Visual inspection is susceptible to subjective judgment, increasing uncertainty in the reliability of quality control. When re-drilling is necessary, removing specimens from the machine delays completion and reduces quality through machining errors incurred while realigning the specimen on the workspace. The objective of this project is to overcome the limitations of the current quality inspection method: overcome the time and cost limitations of total inspection through real-time quality inspection with machine learning, exclude subjective judgment through automatic inspection, prevent the need to reschedule for re-drilling, and avoid machining errors by eliminating plate removal. Thus, in this paper we propose a real-time, in-situ automatic quality inspection method using real-time light intensity measurements and machine learning defect detection models.
To achieve this goal, an experiment was designed to drill over 56,000 holes into SK5 metal plates of varying thicknesses using an Nd:YAG laser machine. Photodiode data collected during fabrication was used to train machine learning models designed for anomaly detection to identify defective holes. In the following section, a more detailed background on the specific laser drilling machine and process, anomaly detection using machine learning, and related work is given.

2 Background

2.1 Laser Micro-drilling

The laser machine used for this research is a neodymium-doped yttrium aluminum garnet (Nd:YAG) laser emitting a pulsed beam with a wavelength of 1064 nm. The drilling process used was trepanning, where the pulsed laser creates a hole by removing a circular disk from the workpiece [2]. In particular, the path taken by the laser is 1 → 2 → 3 → 3 → 3, shown in Fig. 1, and the laser circles the full diameter three times to prevent under-drilling.

Fig. 1
figure 1

a Depiction of trepanning laser drilling, b diagram of path taken by laser

Parameters that affect the quality of drilled holes can be divided into two categories: laser parameters and physical parameters. Laser parameters include settings such as peak power, frequency, and percent duty of the laser, which control the intensity and schedule of the laser pulsing. Physical parameters include the velocity of the base stage on which the specimen is fixed and the laser defocus, the distance between the laser head and the specimen surface relative to the focal length of the laser [3].

2.2 Anomaly Detection Algorithms

Anomalies are outliers that differ significantly from normal instances. Accordingly, anomaly detection aims to identify points that deviate from the generally normal data [4]. Anomaly detection algorithms are actively used in various applications, namely fraud detection [5], health monitoring [6, 7], surveillance monitoring [8], predictive maintenance [9,10,11], and defect detection [12, 13]. The latter two applications are closely related to manufacturing, where key events like machine failure and defective products are scarce and differ significantly from normal situations. Laser micro-drilling poses a defect detection problem in which an estimated average of 1–2% of holes are reported to be defective.

Machine learning algorithms are effective in anomaly detection as they can be trained on a vast amount of normal data. Some well-known machine learning algorithms include local outlier factor (LOF), K-nearest neighbors (KNN), support vector machines (SVM), Bayesian networks and autoencoders [14]. In this paper the autoencoder algorithm was explored.

2.3 Related Work

Multiple works have been done regarding laser drilling and hole quality. Many focused on the task of quality prediction with the goal of parameter optimization. Baiocco et al. [15] used artificial neural networks (ANNs) to predict the kerf width and hole diameter given four parameters: pulse duration (Pd), cutting speed (Cs), focus depth (Fd), and laser path (Lp). Biswas et al. [16] attempted to predict the optimal values for machining parameters (lamp current, pulse frequency, pulse width, air pressure, and focal length) that produce the best hole circularity and taper using ANNs. Ranjan et al. [17] used time-series signals to predict the quality of traditionally micro-drilled holes. The adaptive neuro-fuzzy inference system (ANFIS) model was trained on wave packets from vibration signals and cutting force signals.

Some works attempted real-time defect detection and monitoring of other laser manufacturing processes. Zuric and Brenner [18] used lateral monitoring to detect defects during ultrashort pulse laser micro structuring. Valtonen et al. [19] collected real-time photodiode data to detect defects by detecting when the laser missed a pulse using an industrial computer PXI system with a Data Acquisition card and Real-Time Controller module.

3 Methodology

3.1 Methodology Overview

Our approach stems from the repeated observation of unusually large amounts of light emitted during defective hole drilling. To quantify this phenomenon, light intensity was collected to monitor the drilling process and used as data for machine learning modeling [20, 21]. A Thorlabs PM16-140-USB power meter with a wavelength range of 350–1100 nm was used to measure five light intensity parameters: power [W], logarithmic power [dBm], saturation [%], irradiance [\({\mathrm{W}/\mathrm{cm}}^{2}\)], and current [A]. A web camera was also installed for visual monitoring throughout the experiment. Both the sensor and the camera were controlled by an edge computer, allowing remote data collection and control as shown in Fig. 2.

Fig. 2
figure 2

Experimental set-up of installed photodiode sensor and camera

A total of six SK5 metal plate specimens were fabricated as part of this experiment. Two specimens were prepared for each of the three thicknesses: 1 mm, 1.5 mm, and 1.8 mm. For each thickness, holes of five different diameters (0.05 mm, 0.08 mm, 0.2 mm, 0.3 mm, and 0.5 mm) were drilled using normal laser and physical parameters suggested by field experts, resulting in a total of 35,640 normal-condition holes. In addition, 20,736 holes were drilled to test 12 defect-prone conditions on 0.08 mm diameter holes for each of the three thicknesses. The chosen conditions were obstacles (glass, metal, sand particles), surface scratches, laser defocus (positive and negative), and unsuggested laser parameters (peak power, frequency, and duty). The two diagrams repeated for each of the three thicknesses are shown in Fig. 3a, b. More detailed combinations of diameter and condition can be seen in Tables 1 and 2.

Fig. 3
figure 3

a Diagram for 0.05 mm, 0.08 mm, 0.2 mm, 0.3 mm normally drilled holes, b diagram for 0.5 mm normally drilled holes and 12 conditions

Table 1 Combinations for normally drilled holes to three different thicknesses (1 mm, 1.5 mm, 1.8 mm)
Table 2 Combinations for conditionally drilled holes to three different thicknesses (1 mm, 1.5 mm, 1.8 mm)

Light intensity data collected during the fabrication of the six specimens were processed and used as data to train and test our machine learning model. The trained model is used to identify holes that are likely to be defective. To verify the performance of the final model as a defect detector, the actual quality of each hole was measured using a precision microscope to distinguish real defects. Model performance was quantified using accuracy, detection rate, false rejection rate (FRR), and false acceptance rate (FAR).

Additionally, a classification model using a DNN will be trained on the same data with 51 classes, one for each combination. The accuracy and confusion matrix will be used as performance measures for classification.

3.2 Machine Learning Model

3.2.1 Autoencoder Algorithm

Autoencoders are trained on a dataset containing mostly normal data and learn repeated patterns within it. Patterns are learned by updating weights in the compression (encoder) and expansion (decoder) layers. The model outputs reconstructed values as the result of compression and expansion, as shown in Fig. 4. The difference between the original and reconstructed values is used to calculate the loss, namely the reconstruction loss, which serves as a measure of how abnormal the input data is.

Fig. 4
figure 4

Diagram of traditional autoencoder model

Weights inside the encoder and decoder layers are updated through backpropagation, as in Eq. (1), during the training process. Trained autoencoders reconstruct new, unseen data using the trained encoding and decoding layers. New data sharing similar patterns with normal data is more likely to be reconstructed close to the original, while data with abnormal patterns is less likely to be reconstructed accurately, yielding a high reconstruction loss.

$$w=w-\alpha \frac{\partial (E\left[\left|x-x^{\prime}\right|\right])}{\partial w}$$
(1)

where \(\alpha\) is the learning rate and \(E\left[\left|x-x^{\prime}\right|\right]\) is the mean absolute error (MAE) loss. This update is applied to all weights w, where x′ is the reconstruction of the input x.

$$x^{\prime} = {P}_{\theta }({Q}_{\varphi }(x))$$
(2)

The specific design of the layers within the encoder and decoder determines the functions \({P}_{\theta }\) and \({Q}_{\varphi }\). In this paper, LSTM and CNN layers were tested as encoding and decoding layers. LSTM is a type of recurrent neural network (RNN) capable of storing long-term dependencies within time-series data. The internal structure of an LSTM unit consists of a cell state and three gates: the input gate, output gate, and forget gate, as depicted in Fig. 5. New and past information is added and forgotten to optimize the loss. CNNs are often used for image classification, where repeated patterns appear in a spatially invariant manner. A CNN uses kernels to convolve the data into feature maps, pooling to reduce the length during encoding, and up-sampling to reconstruct the original length. Since the same trepanning motion is repeated for each hole, the CNN layers should learn this repetition as the pattern of a normal hole.
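As a minimal sketch of the composition in Eq. (2) and the MAE reconstruction loss, consider a toy linear encoder and decoder (the layer shapes and random weights here are illustrative, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear encoder Q and decoder P; the actual models use LSTM/CNN layers.
W_enc = rng.normal(size=(16, 4))   # compress 16 features -> 4
W_dec = rng.normal(size=(4, 16))   # expand 4 features -> 16

def encode(x):
    return x @ W_enc               # Q_phi(x)

def decode(z):
    return z @ W_dec               # P_theta(z)

def reconstruction_loss(x):
    x_rec = decode(encode(x))      # x' = P_theta(Q_phi(x)), Eq. (2)
    return np.mean(np.abs(x - x_rec))  # MAE, E[|x - x'|]

x = rng.normal(size=16)
print(reconstruction_loss(x))
```

In training, the gradient of this loss with respect to each weight drives the update of Eq. (1); after training, the loss is computed for each new hole's signal.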

Fig. 5
figure 5

a Internal Structure of LSTM (forget, input, and output gates shown by purple rounded rectangles in respective order from left to right), b diagram of CNN structure

The threshold is set as the sum of the mean and three standard deviations of the reconstruction losses over the training data. Holes with loss values greater than the threshold are considered defective.
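The thresholding rule can be sketched as follows (the loss values are illustrative):

```python
import numpy as np

def compute_threshold(train_losses):
    # Threshold = mean + 3 * standard deviation of training reconstruction losses.
    return np.mean(train_losses) + 3 * np.std(train_losses)

def flag_defects(losses, threshold):
    # Holes whose reconstruction loss exceeds the threshold are flagged defective.
    return losses > threshold

train_losses = np.array([0.010, 0.012, 0.011, 0.009, 0.013])
thr = compute_threshold(train_losses)
print(flag_defects(np.array([0.011, 0.5]), thr))  # [False  True]
```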

3.2.2 Model Structure

The length of the collected data varied depending on the number of photodiode data points collected during the fabrication of each section. Because the encoder contains two down-sampling layers, each by a factor of two, the inputs were zero-padded to the nearest multiple of four. The input to the LSTM model is a one-dimensional vector of the padded data. For the CNN model, the collected data was padded to the nearest multiple of 128, and 128 data points were then grouped as one input sample for a consistent input dimension. Both the LSTM and CNN models have two layers in their encoder and decoder. The first encoding layer increases the number of channels to 8 and the second to 16. The decoder reverses this process, from 16 channels to 8 to a flattened vector matching the shape of the input. Figure 6 depicts the entire CNN model structure from the input through the encoding and decoding layers to the output. The mean absolute error (MAE) between the input values and the reconstructed values is compared with the threshold to determine whether the input corresponds to a defective hole. Hyperparameters of the CNN and LSTM are listed in Table 3.
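The padding and windowing described above can be sketched as follows (128-sample windows as used for the CNN; the input signal is illustrative):

```python
import numpy as np

def pad_to_multiple(signal, multiple):
    # Zero-pad a 1-D signal up to the nearest multiple of `multiple`.
    remainder = len(signal) % multiple
    if remainder:
        signal = np.pad(signal, (0, multiple - remainder))
    return signal

def make_windows(signal, window=128):
    # Pad, then group consecutive samples into fixed-length CNN inputs.
    padded = pad_to_multiple(np.asarray(signal, dtype=float), window)
    return padded.reshape(-1, window)

samples = make_windows(np.ones(300))
print(samples.shape)  # (3, 128): 300 samples zero-padded to 384
```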

Fig. 6
figure 6

Structure of entire CNN anomaly detection model

Table 3 Specifics of implemented CNN and LSTM models

3.2.3 Classifier Algorithm

Classification is another fundamental machine learning technique that aims to separate samples in a dataset into distinct groups. In this paper, a DNN will be used to classify the entire dataset into 51 classes (17 different combinations for each of the three thicknesses). For multiclass classification, Softmax loss is used. Softmax loss refers to a Softmax activation followed by a Cross-Entropy loss. The Softmax activation function maps the final layer of the DNN to a score representing the probability of each of the 51 classes. Cross-Entropy loss penalizes false predictions for efficient training.

The weights are updated by backpropagation and the chain rule, as shown in Eq. (3), where f is the activation function.

$$w=w-\alpha \frac{\partial L}{\partial w}=w-\alpha \frac{\partial L}{\partial f}\frac{\partial f}{\partial w}$$
(3)

When Softmax is used with Cross-Entropy, the loss function simplifies as follows

$$L=-\log\left(\frac{{e}^{{s}_{p}}}{\sum_{i=1}^{C}{e}^{{s}_{i}}}\right)$$
(4)
$$L=-{s}_{p}+\log\left(\sum_{i=1}^{C}{e}^{{s}_{i}}\right)$$
(5)

where \({s}_{p}\) is the score of the true class and C is the total number of classes.
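Equations (4)–(5) can be implemented in a numerically stable way (the class scores here are illustrative):

```python
import numpy as np

def softmax_cross_entropy(scores, true_class):
    # Eq. (5): L = -s_p + log(sum_i exp(s_i)), computed stably by
    # subtracting the max score before exponentiating.
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()
    log_sum_exp = np.log(np.sum(np.exp(shifted))) + scores.max()
    return -scores[true_class] + log_sum_exp

scores = np.array([2.0, 1.0, 0.1])
print(softmax_cross_entropy(scores, 0))
```

The max-subtraction trick matters in practice because the raw exponentials in Eq. (4) can overflow for large scores.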

3.3 Hole Quality Measure

Hole quality is represented using multiple parameters that affect the functionality of the final product. Circularity or roundness [16, 17, 22] and taper [23] are key features used to determine the quality of micro-holes. In this paper, diameter error and center alignment of the holes at the entrance and exit were additionally used, resulting in six hole quality parameters: diameter error at entrance and exit, circularity at entrance and exit, taper, and hole alignment.

An OGP 3D profile microscope measuring machine (Fig. 7) was used to routinely measure the diameter, roundness, and hole center coordinates (x, y, z) for both the entrance and exit sides of all holes. Diameter error was calculated as the difference between the designed and measured hole diameter. Taper and hole alignment were represented using angles calculated by Eqs. (6) and (7) from Fig. 8, respectively.

Fig. 7
figure 7

3D OGP profile microscope set-up and MeasureMind® software

Fig. 8
figure 8

a Diagram of hole taper, b diagram of center alignment

$$taper\left(\theta \right)=\mathrm{arctan}\left(\frac{\left|{D}_{ext}-{D}_{ent}\right|}{2t}\right)$$
(6)
$$align\left(\theta \right)=\mathrm{arctan}\left(\frac{\sqrt{{\left({x}_{ext}-{x}_{ent}\right)}^{2}+{\left({y}_{ext}-{y}_{ent}\right)}^{2}}}{t}\right)$$
(7)

A defect is defined based on the requirements of different use cases of the drilled plates. In this paper, in consultation with field experts, a defect was defined as a hole with a diameter error greater than ten percent of the desired diameter, circularity less than 0.002, or a taper or hole-alignment error greater than 0.25 radians.
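Under the stated criteria, the defect decision based on Eqs. (6)–(7) can be sketched as follows (the thresholds come from the text; the measurement values are illustrative):

```python
import math

def taper_angle(d_ent, d_ext, thickness):
    # Eq. (6): arctan(|D_ext - D_ent| / (2 t)), in radians.
    return math.atan(abs(d_ext - d_ent) / (2 * thickness))

def alignment_angle(ent_xy, ext_xy, thickness):
    # Eq. (7): arctan of entrance/exit center offset over thickness, in radians.
    offset = math.hypot(ext_xy[0] - ent_xy[0], ext_xy[1] - ent_xy[1])
    return math.atan(offset / thickness)

def is_defect(designed_d, d_ent, d_ext, circ_ent, circ_ext,
              ent_xy, ext_xy, thickness):
    # A hole is defective if any quality parameter violates its limit.
    return (
        abs(d_ent - designed_d) > 0.10 * designed_d
        or abs(d_ext - designed_d) > 0.10 * designed_d
        or circ_ent < 0.002 or circ_ext < 0.002
        or taper_angle(d_ent, d_ext, thickness) > 0.25
        or alignment_angle(ent_xy, ext_xy, thickness) > 0.25
    )

print(is_defect(0.08, 0.081, 0.079, 0.003, 0.003,
                (0.0, 0.0), (0.001, 0.001), 1.0))  # False
```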

3.4 Data

Five different photodiode parameters were measured during the experiment; however, the other four parameters, power [dBm], saturation [%], irradiance [\({\mathrm{W}/\mathrm{cm}}^{2}\)], and current [A], each showed a direct mathematical relationship to power [W]. Since the patterns visible in the power [W] data are repeated in the other parameter measurements, only power [W] was used in our dataset to avoid an unnecessary volume of data. The power values ranged up to 0.03 mW for normal hole drilling, while conditionally drilled holes reached higher values, up to 2.5 mW.

Raw data were preprocessed in four steps: cleaning, concatenation, formatting, and normalization. Data were cleaned by removing excess data collected during the idle state. The cleaned data files were then concatenated into one continuous time-series file per combination, resulting in 51 files. The data were cut into multiple rows of equal length for CNN modeling. All data were normalized to adjust the scale for more effective modeling.
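A sketch of the four preprocessing steps (the idle-level threshold, window length, and input signals are assumed values for illustration):

```python
import numpy as np

def preprocess(files, idle_level=1e-6, window=128):
    # 1. Clean: drop samples collected while the laser is idle.
    cleaned = [f[f > idle_level] for f in files]
    # 2. Concatenate: join one combination's files into a single series.
    series = np.concatenate(cleaned)
    # 3. Format: cut into equal-length rows for CNN modeling.
    n = len(series) - len(series) % window
    rows = series[:n].reshape(-1, window)
    # 4. Normalize: min-max scale to [0, 1] over the whole series.
    lo, hi = rows.min(), rows.max()
    return (rows - lo) / (hi - lo)

files = [np.array([0.0, 0.01, 0.02] * 100), np.array([0.0, 0.03] * 100)]
rows = preprocess(files)
print(rows.shape, rows.min(), rows.max())
```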

4 Results

4.1 Experiment Results

The fabricated specimens were of the form shown in Fig. 9. After sanding and cleaning the specimens, the hole quality was measured as outlined in Sect. 3.3. Some sample images of drilled holes viewed using the OGP 3D profile microscope can be seen in Fig. 10.

Fig. 9
figure 9

Two of the six specimens before cleaning. The same layout was repeated for all three thicknesses

Fig. 10
figure 10

a Normal hole with good quality values, b defect with poor circularity and diameter, c thermal explosion due to overheating, d under drilled hole

4.2 Modeling Results

4.2.1 Autoencoder Modelling Results

The reconstruction results of the two trained models shown in Fig. 11a suggest that the models successfully learn patterns in the data from normal fabrication. As seen in Fig. 11b, the LSTM outperforms the CNN with a lower MAE reconstruction loss, especially at peaks. Thus, model performance evaluation was carried out on the LSTM model. This result is likely due to the advantage of the LSTM as a time-series model: it stores information about past events with varying weights for longer- and shorter-term memory. Although holes are drilled routinely with the same settings, fabrication on metal plates is influenced by other factors such as heat stored in neighboring areas of the plate. These factors are time-dependent, supporting the advantage of using LSTM over CNN.

Fig. 11
figure 11

a Plot of experimental data with reconstructed data of CNN, LSTM models, b loss plot comparison between CNN and LSTM

4.2.2 Classifier Modelling Results

A DNN classifier was trained over 3000 epochs using 80% of the entire dataset. The remaining 20% was used to validate the model as a check for common problems such as underfitting and overfitting. The final accuracy and loss were 0.9630 and 0.1178 for the training set, and 0.9553 and 0.1634 for the validation set, as shown in Fig. 12 and Table 4. The similar accuracies of training and validation suggest that the model has properly learned to generalize to unseen data. The high accuracy suggests that each of the 51 combinations can be differentiated from the others.

Fig. 12
figure 12

a Loss versus Epoch plot for train and validation, b accuracy versus Epoch plot for train and validation

Table 4 Final accuracy and loss results of training and validation set for DNN classifier

4.3 Model Performance

4.3.1 Performance Measures

Model performance measures how well the model performs the designed task. This section evaluates how well our model functions as a quality inspection tool through four performance parameters: accuracy, detection rate (DR), false rejection rate (FRR), and false acceptance rate (FAR). Each value is calculated as in Eqs. (8)–(11). Accuracy is the percentage of correct predictions over total predictions. Detection rate is the percentage of real defects correctly identified by the model. FRR measures over-detection: the percentage of normal holes the model falsely rejects as defective. FAR measures under-detection: the percentage of real defects the model fails to identify. As organized in Table 5, the performance measures for the final model were 99.86% accuracy, 90.37% DR, 0.08% FRR, and 9.63% FAR. The model has extremely high accuracy and a desirably low FRR, suggesting that very few normal holes are likely to be rejected as defects. However, improvements could be made to the DR and FAR, which indicate that about 9.6% of defects are likely to be under-detected. Considering the original 1–2% defect rate of laser-drilled holes, the use of our suggested model can lower the defect rate to around 0.1–0.2%.

$$Accuracy \, = \, \left( {Correct\;Predictions} \right) \, / \, \left( {Total\;Predictions} \right)$$
(8)
$$DR = \left( {Predicted\;to\;be\;Defective} \right) / \left( {Actually\;Defective} \right)$$
(9)
$$FRR = \left( {Rejected\;as\;Defective} \right) / \left( {Actually\;Normal} \right)$$
(10)
$$FAR = \left( {Accepted\;as\;Normal} \right) / \left( {Actually\;Defective} \right)$$
(11)
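Equations (8)–(11) can be computed directly from predicted and actual labels (the labels below are illustrative):

```python
import numpy as np

def performance(pred_defect, true_defect):
    # pred_defect / true_defect: boolean arrays, True = defective.
    pred, true = np.asarray(pred_defect), np.asarray(true_defect)
    accuracy = np.mean(pred == true)   # Eq. (8)
    dr  = np.mean(pred[true])          # Eq. (9): defects caught
    frr = np.mean(pred[~true])         # Eq. (10): normals rejected
    far = np.mean(~pred[true])         # Eq. (11): defects accepted
    return accuracy, dr, frr, far

pred = np.array([True, True, False, False, False])
true = np.array([True, False, False, False, True])
print(performance(pred, true))
```

Note that DR and FAR are complementary over the set of actual defects, so DR + FAR = 1 by construction.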
Table 5 Model Performance values of LSTM model

There exists a trade-off between FRR and FAR depending on the choice of threshold value, as depicted in Fig. 13. Reducing the FAR (less under-detection) adversely increases the FRR (more over-detection). The crossover error rate (CER), also called the equal error rate (EER), is the point at which FRR and FAR are equal. CER is often used as a performance measure in authentication algorithms such as biometric systems [24]. A threshold-independent assessment of model performance could be made by experimentally plotting the FAR and FRR curves.
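The FRR/FAR trade-off and the crossover point can be estimated by sweeping the threshold over the observed losses (the loss distributions below are synthetic, for illustration only):

```python
import numpy as np

def frr_far_curve(normal_losses, defect_losses, thresholds):
    # For each threshold t: FRR = normal holes rejected (loss > t),
    # FAR = defective holes accepted (loss <= t).
    frr = np.array([(normal_losses > t).mean() for t in thresholds])
    far = np.array([(defect_losses <= t).mean() for t in thresholds])
    return frr, far

rng = np.random.default_rng(0)
normal_losses = rng.normal(0.01, 0.002, 1000)   # synthetic normal-hole losses
defect_losses = rng.normal(0.05, 0.010, 50)     # synthetic defect losses
thresholds = np.linspace(0.0, 0.08, 200)
frr, far = frr_far_curve(normal_losses, defect_losses, thresholds)

# Crossover error rate: threshold where FRR and FAR are (nearly) equal.
cer_idx = np.argmin(np.abs(frr - far))
print(thresholds[cer_idx], frr[cer_idx], far[cer_idx])
```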

Fig. 13
figure 13

Curve plot of FAR and FRR trends with CER point

4.3.2 Classifier Performance

For classification, model performance is measured by accuracy, the rate of correct predictions over total predictions. The validation accuracy, as mentioned in Sect. 4.2.2, was 0.9553, suggesting that the model will correctly predict about 95.5% of new, unseen data.

The confusion matrix in Fig. 14 visualizes the performance of our classifier. Each cell of the 51 × 51 matrix represents a probability from 0 to 1. Diagonal cells represent correct predictions, and it can be seen that most have a high probability, indicated by the darker color.

Fig. 14
figure 14

Confusion matrix of classification into 51 classes

5 Conclusions

The use of light intensity measurements was effective in process monitoring. However, using a single sensor limited data collection to one angle relative to the laser drilling spot, constraining the data to be periodic in nature. A multi-array sensing module surrounding the laser drilling spot would allow more complex relations to be captured in the data. Furthermore, spectrum analysis of the collected light would provide great insight into the actual composition of the incident light and could be used for more intricate adjustment of sensor parameters to increase data quality.

In further research, the integration of the classifier with the autoencoder will be investigated to increase the precision of prediction and enhance the applicability of the model in real-life situations. As in most machine learning research, more data would be desirable, both in terms of an increased number of combinations (thickness, material, diameter, etc.) and added features such as images or videos.