The rapid development of smart manufacturing, supported by advanced information and manufacturing technologies [1], makes physical processes and cyber flows available when and where they are needed across supply chains. Global integration of innovation across small and medium-sized enterprises (SMEs) and large companies [2, 3] is critical to achieving this goal, particularly innovation in the machinery industry based on artificial intelligence (AI) and the internet of things (IoT).

The water hardware industry is also eager for digital transformation through smart manufacturing, and it widely uses the metallization of plastics such as acrylonitrile butadiene styrene (ABS). ABS has the best electroplating performance and is the most commonly used material because the plated metal does not easily peel off [4]. However, plastic is usually a poor conductor and must first be coated with a conductive film. At present, chemical (electroless) plating is used to adhere a metal film to the surface of the plastic, followed by copper or nickel plating (and chrome plating). The final quality of the ABS product therefore needs to be monitored closely as part of process control.

Automatic optical inspection (AOI) [5] is very common in quality inspection. Production lines typically use AOI for products with large and stable production volumes or with rigorous quality requirements, such as printed circuit boards (PCBs), semiconductors, cell phone parts, and medical devices. However, as the number of defect types to detect grows, traditional AOI requires new algorithms to be developed continuously for different defect characteristics, which is costly and slow. In addition, traditional algorithms are susceptible to noise and are less stable against external interference. Reflective objects, such as electroplated parts, are particularly hard to inspect because reflections from the metal surface obscure the defects. In contrast, deep learning models are trained on thousands or even millions of defect photos, which makes them more flexible and more resistant to external interference than traditional AOI. Therefore, we study electroplated object detection based on the YOLO (You Only Look Once) framework [6,7,8]. YOLO is a state-of-the-art, real-time object detection system [9] built on a single convolutional network that simultaneously predicts multiple bounding boxes and the class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance [10]. This detection system has shown its power in many fields. For example, Li et al. [11] used YOLO v3 to detect electronic components on PCBs and achieved more than 90 percent accuracy in quality control. Tian et al. (2019) used YOLO to detect fruits in orchards and judge the growth stages of apples. Liu and Wang [12] used YOLO to classify tomato diseases and pests. In addition, YOLO has proved valuable for mask detection during COVID-19 [13].

Our methodology has the following steps:

  1. We designed the AOI prototype for the customized needs of the study and assembled it with additive manufacturing (AM) technology. The self-built prototype detects defects on electroplated ABS products and is integrated with Arduino C++ programming to control the motor, inspection light source, stepping motor, robot arm, and other related electronic parts.

  2. We use the AOI prototype to capture the necessary videos and split them into individual frames, according to the frames per second (FPS), to label the defective images. The images are classified into two categories: defective and non-defective. We apply the Python Open Source Computer Vision Library (OpenCV) for image pre-processing, and the processed images are labeled with defect types and locations on each product image.

  3. We set up the deep learning environment on a graphics processing unit (GPU) and apply different YOLO algorithms to compare their performance on the same validation set of images. The video captured by the AOI prototype is sent to the YOLO models to classify product defects in real time. We train customized YOLO weights on our dataset for each YOLO version, from v2 to v5 in this study.

  4. Finally, receiver operating characteristic (ROC) indexes are used to compare the detection performance of the different YOLO frameworks.
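Step 2 above (splitting a captured video into frames for labeling) could be sketched as follows. This is only an illustrative sketch, not the authors' script: the function names `frame_indices` and `extract_frames`, the output file naming, and the default of one retained frame per second are assumptions. The OpenCV calls are imported inside `extract_frames` so that the index helper can be used without OpenCV installed.

```python
def frame_indices(total_frames, video_fps, frames_per_second=1):
    """Indices of the frames to keep so that roughly `frames_per_second`
    frames are retained for each second of video (hypothetical helper)."""
    if total_frames <= 0 or video_fps <= 0 or frames_per_second <= 0:
        return []
    step = max(1, round(video_fps / frames_per_second))
    return list(range(0, total_frames, step))


def extract_frames(video_path, out_dir, frames_per_second=1):
    """Save the sampled frames of `video_path` as PNG files in `out_dir`
    and return how many frames were written."""
    import os
    import cv2  # imported lazily; only needed when a video is processed

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    keep = set(frame_indices(total, fps, frames_per_second))

    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video or read error
            break
        if index in keep:
            name = os.path.join(out_dir, f"frame_{index:06d}.png")
            cv2.imwrite(name, frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```

The saved frames can then be opened in a labeling tool to mark defect types and locations.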

We organize this study as follows. Section 1 briefly describes the background and motivation of the study; Section 2 explains the hardware and software tools and the theoretical methods used. Section 3 describes the process of this study, the self-assembled machine, and the calculation results, and Section 4 provides the conclusion and recommendations. In brief, our contributions are summarized as follows:

  1. We assemble a customized AOI machine, assisted by three-dimensional (3D) additive manufacturing (AM), to detect defects in electroplated ABS products.

  2. The AOI algorithm is driven by deep learning models from the YOLO family, from v2 to v5.

  3. We compare the performance of various versions of YOLO and discuss their characteristics.

This study aims to develop an advanced, AI-based AOI paradigm for smart manufacturing. Our efforts integrate AI and AOI as an advanced manufacturing technology and can inspire the traditional electroplating industry.


We searched for papers that integrate YOLO algorithms and AOI for quality detection in the electroplating industry; however, the number of papers in this field is relatively small, and most of them focus on PCBs. Therefore, this study should be valuable for the smart manufacturing of small and medium enterprises (SMEs). The methodology of this study is illustrated in Fig. 1. First, we design the AOI prototype for the customized needs in CAD while reviewing the necessary literature on AOI and YOLO. Second, we assemble the AOI prototype using 3D additive manufacturing, and the various YOLO versions are developed in PyCharm. Third, the AI-based AOI platform integrates the trained YOLO models. Fourth, the AOI prototype captures the necessary images, which are classified into two categories: defective and non-defective. Fifth, the GPU runs the different YOLO algorithms, from v2 to v5, to compare their performance on the same validation set of images. Finally, receiver operating characteristic (ROC) indexes are used to compare the detection performance of the different YOLO frameworks.

Fig. 1 Research Process

Automated optical inspection

Fig. 2 Operation of AOI

Automated optical inspection (AOI) is a visual inspection in which a camera autonomously scans products for failures, as shown in Fig. 2. AOI systems were first applied in textiles, packaging, automotive, machinery, and similar industries. The main requirement at that stage was detection speed, which AOI significantly improved [14]. However, these early systems detected relatively large objects and defects and had to compete with the human eye, so their development aimed at matching the performance and speed of the naked eye. Although the pace of inspection met the requirements, there was little demand for fine or high-precision inspection, and inspection resolution did not improve much [15]. With industrial development in Taiwan, technology industries such as semiconductors, electronics, biotechnology, and optoelectronics have emerged in the past two decades. Unlike traditional products, technology products are usually more complex, more precise, smaller, and less tolerant of error, and any problem in the process may lead to errors in the whole system. Because these products are small and lightweight, their optical inspection requires a much finer resolution, which drives the need for AOI [16]. The development of AOI has dramatically reduced the error rate of manual visual inspection, thus improving production control and quality inspection speed, which is what the manufacturing industry expects. The improvement in resolution is also due to the introduction of the charge-coupled device (CCD) sensor, whose image area and pixel resolution yield better data. In addition, given the price competition among CCD sensors, AOI systems have the advantage of low cost. Therefore, many industries began to introduce AOI systems to replace manual inspection. Demand for production inspection across most of the technology industry has driven the booming development of automated optical inspection and the growth of the AOI industry. Some research addresses AOI directly; for example, since AOI is very sensitive to the environment, Chen and Perng [17] proposed a system of coaxial light, a backlight, and motion control to capture the statistical texture characteristics of the molding surface.

Acciani et al. [18] formulated the diagnosis as a pattern recognition problem with a neural network (NN) approach, classifying five types of solder joints with a high recognition rate. Richter and Streitferdt [19] proposed a smart AOI paradigm that superposes active and unsupervised learning to build a fully annotated dataset and then trains a suitable classifier by a deep learning cluster analysis. As hardware develops quickly and deep learning technology becomes more popular, AOI combined with an evolutionary algorithm such as NN will play an essential role in the smart sensing network [20].

YOLO Framework

In image processing, a kernel, or convolution matrix, is a small matrix used for blurring, sharpening, embossing, edge detection, and more. This is accomplished by convolving the kernel with an image [21, 22]. Consider an image stored as a matrix of width \(w\) and height \(h\), with pixel values indexed by (\(x\), \(y\)). Let \(z_1\) and \(z_2\) be two such points; the kernel is then defined as a function \(k\) that, for all \(z_1, z_2 \in R^2\), satisfies

$$\begin{aligned} k(z_1, z_2) = \langle \phi(z_1), \phi(z_2) \rangle^p \end{aligned}$$

Here \(k\) is the kernel, \(p\) is a positive integer, and \(\langle \phi(z_1), \phi(z_2) \rangle^p\) is the \(p\)-th power of the inner product of the two vectors \(\phi(z_1)\) and \(\phi(z_2)\). For example, if we take \(p = 2\), then we have

$$\langle \phi(z_1), \phi(z_2) \rangle^2 = \phi(z_2)^t [\phi(z_1)\phi(z_1)^t] \phi(z_2)$$

\(\phi ({z_i})\) satisfies the following condition:

$$\phi : {z_i}\longrightarrow {F}$$

where \(z_i\) belongs to the input space and \(F\) is the output (feature) space. For example, in a neural network model, the input nodes are \(z\) and the final weighted outputs (nodes) are \(F\). For two-dimensional data, let \(S\) be a given finite set and

$$\begin{aligned} S=\begin{bmatrix} z_{1},&z_{2},&\dots&, z_{l} \end{bmatrix} \end{aligned}$$

we can write \(k\) in matrix form as follows:

$$K^p= \begin{bmatrix} \langle \phi(z_1),\phi(z_1)\rangle^p & \langle \phi(z_1),\phi(z_2)\rangle^p & \dots & \langle \phi(z_1),\phi(z_l)\rangle^p \\ \langle \phi(z_2),\phi(z_1)\rangle^p & \langle \phi(z_2),\phi(z_2)\rangle^p & \dots & \langle \phi(z_2),\phi(z_l)\rangle^p \\ \vdots & \vdots & \ddots & \vdots \\ \langle \phi(z_l),\phi(z_1)\rangle^p & \langle \phi(z_l),\phi(z_2)\rangle^p & \dots & \langle \phi(z_l),\phi(z_l)\rangle^p \end{bmatrix}$$
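As a small numerical illustration of the matrix \(K^p\) and of the \(p = 2\) expansion given earlier, the following sketch builds the matrix for three sample points. It assumes, purely for illustration, that \(\phi\) is the identity map; the sample points and the function names are hypothetical and not taken from the study.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(z1, z2, p=2, phi=lambda z: z):
    """k(z1, z2) = <phi(z1), phi(z2)>^p (phi defaults to the identity)."""
    return dot(phi(z1), phi(z2)) ** p


def gram_matrix(samples, p=2, phi=lambda z: z):
    """Matrix K^p whose (i, j) entry is k(z_i, z_j)."""
    return [[poly_kernel(zi, zj, p, phi) for zj in samples] for zi in samples]


def quadratic_form(z1, z2, phi=lambda z: z):
    """phi(z2)^t [phi(z1) phi(z1)^t] phi(z2), the p = 2 expansion."""
    a, b = phi(z1), phi(z2)
    outer = [[ai * aj for aj in a] for ai in a]  # phi(z1) phi(z1)^t
    n = len(b)
    return sum(b[i] * outer[i][j] * b[j] for i in range(n) for j in range(n))


# Hypothetical sample set S = [z_1, z_2, z_3]
S = [(1, 2), (3, 4), (0, 1)]
K = gram_matrix(S, p=2)
for row in K:
    print(row)
```

For these points the matrix is symmetric, and `quadratic_form((1, 2), (3, 4))` agrees with `poly_kernel((1, 2), (3, 4), p=2)`, which checks the \(p = 2\) identity on one pair.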

Here the convolution \(K^p\) acts as a general-purpose filter for images. Simply speaking, we can extract information from an image by setting the filter and stride of the kernel and the padding of the feature map. A basic framework of a convolutional neural network (CNN) is shown in Fig. 3 [23].

Fig. 3 CNN Framework
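To make the roles of filter, stride, and padding concrete, here is a minimal, dependency-free sketch of a single-channel 2D convolution as it is used in CNN layers. It is an illustration under stated assumptions rather than the implementation used in this study: zero padding is assumed, the kernel is not flipped (the cross-correlation convention common in CNN libraries), and the toy image and box filter are hypothetical.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over a zero-padded `image` (both lists of lists).

    Output size per dimension: (size + 2 * padding - kernel_size) // stride + 1.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])

    # Zero-pad the image on all four sides.
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for y in range(h):
        for x in range(w):
            padded[y + padding][x + padding] = image[y][x]

    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    acc += padded[oy * stride + ky][ox * stride + kx] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
box = [[1, 1],
       [1, 1]]  # 2x2 summing filter

print(conv2d(image, box, stride=2))  # 2x2 output
print(conv2d(image, box, stride=1))  # 3x3 output
```

A larger stride shrinks the output, while padding preserves the border: a 3 x 3 kernel with `padding=1` and `stride=1` keeps the 4 x 4 size of the toy image.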

This CNN structure was first proposed by Fukushima [24]. CNNs were developed to solve the formidable challenge of image recognition and detection. A CNN consists of an input layer (recall the input vector \(z\) above, a vector of pixels representing an image), one or more hidden layers, and an output layer (Fig. 3). At the beginning of a CNN pipeline, semantic segmentation is necessary: a segmentation network classifies every pixel in an image, resulting in an image that is segmented by class.

YOLO is an abbreviation of “You Only Look Once.” The YOLO algorithm uses neural networks to provide real-time object detection, and its critical components are CNNs [25]. The basic framework of YOLO is shown in Fig. 4: the input image represented by pixels, the segmentation of the input image (also by pixels), and the CNN layers for each stage [9].

Fig. 4 YOLO Framework

YOLO can determine the class and location of an object in the image with only one CNN, which significantly improves the recognition speed. The whole network is designed end to end, so it is easy to train and fast. The critical difference between CNNs [26] and traditional neural networks is that CNNs are inspired by the human visual system (the retina). CNN models have been validated to perform excellently in image processing, for example LeNet [27], the progenitor of CNNs, and AlexNet [28], which made a name for itself in the field of machine vision. In 1998, Yann LeCun et al. published LeNet as the original CNN framework, which formally brought the concept of convolution into machine learning. The LeNet architecture uses pooling (subsampling) layers to strengthen the features, and its output layer uses Euclidean radial basis function (RBF) units.

YOLO requires a neural network framework for training, and for this we have used DarkNet. The first version takes a 448 × 448 input image and has 26 layers: 24 convolutional layers followed by two fully connected layers. The major limitation of YOLOv1 is its inability to detect very small objects. After the first version, many researchers and scholars extended the source code for better performance. For example, YOLOv2 adds a batch normalization layer after each convolutional layer and has thirty layers, compared with the twenty-six layers of YOLOv1. Moreover, this version introduced the concept of anchor boxes. Anchor boxes are predefined boxes that the user provides to DarkNet; they give the network an idea of the relative position and dimensions of the objects to be detected and have to be calculated from the objects in the training set. YOLOv3 grows to one hundred and six layers and detects at three scales, covering small to large objects. This version provides nine anchor boxes, three per scale. Most importantly, the multiclass problem is turned into a multilabel problem, and YOLOv3 is powerful in detecting tiny objects. YOLOv4 was released in 2020, and two Taiwanese scholars from Academia Sinica joined this project [29]. According to Bochkovskiy et al. [29], YOLOv4 is optimal for real-time object detection tasks because its performance lies on the Pareto optimality curve of accuracy and speed in frames per second (FPS). YOLOv5 was released as open source by Ultralytics [30]. Ultralytics was founded in 2014, spearheading several U.S. Intelligence Community (IC) and Department of Defense (DoD) initiatives in particle physics, data science, and artificial intelligence.
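Since anchor boxes have to be calculated from the training-set objects, one common way to do so is k-means clustering of the labeled box sizes with intersection over union (IoU) as the similarity measure, as popularized with YOLOv2. The sketch below illustrates that idea only; it is not the procedure used in this study. The box sizes are hypothetical, and the deterministic initialization (evenly spaced picks from the boxes sorted by area) is an assumption made so that the example is reproducible.

```python
def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height) and aligned at one corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (width, height) pairs into k anchors using IoU similarity."""
    # Assumed deterministic initialization: spread picks over the area range.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    anchors = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    for _ in range(iterations):
        # Assign every box to the anchor with which it overlaps most.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)

        # Move each anchor to the mean width and height of its cluster.
        new_anchors = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(b[0] for b in members) / len(members)
                mean_h = sum(b[1] for b in members) / len(members)
                new_anchors.append((mean_w, mean_h))
            else:
                new_anchors.append(anchors[i])  # keep an empty cluster's anchor
        if new_anchors == anchors:
            break
        anchors = new_anchors

    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical labeled box sizes in pixels: three small and three large defects.
boxes = [(10, 10), (12, 11), (11, 12), (100, 90), (110, 100), (105, 95)]
print(kmeans_anchors(boxes, k=2))
```

With real annotations, `boxes` would be read from the label files, and `k` would be set to the number of anchors the chosen YOLO version expects (for example, nine for YOLOv3).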

Prototype construction

We designed the AOI prototype by Computer Aided Design (CAD) and assembled it with parts from additive manufacturing technology (3D printing) for the customized joints, which is shown in Fig. 5 (from left to right: CAD file, prototype and conveyor).

Fig. 5 CAD and AOI Prototype

We use a series of Arduino boards to connect the stepping motor, light-emitting diode (LED), Dobot robot arm, conveyor, and an ultrasonic sensor to complete the AOI prototype. We found that image quality is sensitive to the experimental environment. In this study, we need to capture the curved shape of the inspected object, and the defect characteristics on the curved surface are difficult to capture if a closed-circuit television (CCTV) camera lens is selected. Therefore, this study uses a telecentric lens [31] with parallel LED lights to set up the AOI device and capture the defect images by flattening this non-planar feature. A telecentric lens is designed to counter the image size variation, distortion, and deformation caused by the different working distances of CCTV lenses. The telecentric and CCTV lenses are compared in Fig. 6 (left: CCTV, right: telecentric).

Fig. 6 Comparison of Telecentric Lens (Right) and CCTV Lens (Left)

The characteristic of a telecentric lens is that it produces the same image size for the same object size at different working distances. It is therefore generally used for dimensional measurement or surface inspection of curved products.
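The difference between the two lens types can be illustrated with a short calculation. The sketch below uses the pinhole approximation for a conventional lens and an idealized object-space telecentric lens with constant magnification; the focal length, magnification, object size, and working distances are hypothetical values chosen only for illustration.

```python
def conventional_image_size(object_size_mm, working_distance_mm, focal_length_mm):
    """Pinhole approximation: the image shrinks as the object moves away."""
    return focal_length_mm * object_size_mm / working_distance_mm


def telecentric_image_size(object_size_mm, working_distance_mm, magnification):
    """Idealized telecentric lens: within its depth of field, the image
    size does not depend on the working distance."""
    return magnification * object_size_mm


# Hypothetical 20 mm feature seen at two working distances.
for distance in (100.0, 125.0):
    conv = conventional_image_size(20.0, distance, focal_length_mm=25.0)
    tele = telecentric_image_size(20.0, distance, magnification=0.25)
    print(f"distance {distance} mm: conventional {conv} mm, telecentric {tele} mm")
```

On a curved part, different surface points sit at different distances from the lens, so a conventional lens scales them differently while the telecentric lens keeps them at one scale.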

Therefore, after several trial-and-error experiments, we use a pair of bar LEDs triggered by the arrival of the product to be inspected. The light source plays a vital role in machine vision inspection. The critical point of the lighting design is to highlight the features essential for detection and suppress the unimportant background. To obtain an acceptable image, we must understand the surface characteristics of the object under test, through trial-and-error experiments, and select a light source that helps the lens capture its defective features. If the brightness of the light source is too high or too low, or if the type of light source or the lighting arrangement is unsuitable, the final camera image is significantly affected. The image capture for this particular product, using a pair of bar LEDs, is shown in Fig. 7.

Fig. 7 LED Mechanism for Capturing the Product Image

The tested products are fed into the AOI machine by the conveyor. The stepping motors move the light source up and down to illuminate the product at different angles, so the machine can reveal defects over the whole surface of the product. At the same time, OpenCV combined with the camera is used for image capture and recognition. The conveyor transmits the ABS products one by one, and the camera records video during operation. Once the video is captured, the robot arm removes any product judged unsatisfactory at the end of the conveyor. All YOLO algorithms from v2 to v4 are developed and implemented on the Linux-based Nvidia GPU server of the AI Center at Da-Yeh University, and YOLO v5 is provided by Ultralytics.

The image preparation proceeds as follows. First, we label the defective products, reducing the image scale to increase recognition efficiency. This study uses LabelImg, an open-source image annotation tool, to form a dataset of defect images for the deep learning model. Its annotation files record the defect coordinates marked on each image and are provided to the deep learning model for training. Second, we set a Hue, Saturation, Value (HSV) mask to filter out objects other than the plated parts and reduce image noise, and we then enhance the contrast with the convertScaleAbs function in OpenCV, as shown in Fig. 8 (left). Finally, the enhanced images are sharpened with the filter2D convolution function, which completes the pre-training preparation of the images, as shown in Fig. 8 (right).

This study collected 508 images of electroplated parts, including surface impurities. The training, testing, and validation ratio of these images is 60 percent, 30 percent, and 10 percent, respectively. We labeled the data in two categories, the defective category (including speck and pock) and the non-defective category, to build the training, testing, and validation sets. Once the data are ready, we apply four YOLO algorithms (v2, v3, v4, and v5) to compare the discrimination power of the various algorithms.
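The 60/30/10 split of the 508 images could be produced along the following lines. This is a sketch under assumptions, not the authors' procedure: the file names, the fixed random seed, and the rounding rule (training and testing counts rounded down, the remainder assigned to validation) are all hypothetical choices.

```python
import random


def split_dataset(items, ratios=(0.6, 0.3, 0.1), seed=0):
    """Shuffle `items` reproducibly and split them into train/test/validation.

    Assumed rounding rule: the train and test sizes are rounded down and
    every remaining item goes to the validation set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_test = int(len(items) * ratios[1])
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val


# Hypothetical file names standing in for the 508 collected images.
names = [f"img_{i:03d}.jpg" for i in range(508)]
train, test, val = split_dataset(names)
print(len(train), len(test), len(val))
```

Under this rounding rule the three sets contain 304, 152, and 52 images. When the classes are imbalanced, a per-class (stratified) split may be preferable.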

Fig. 8 Preprocessing of Images: Enhanced (Left) and Sharpened (Right) with Different Kernels
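The arithmetic behind the two preprocessing steps can be sketched without any dependency. The code below mimics what OpenCV's `convertScaleAbs` computes per pixel (scale, shift, absolute value, saturation to 8 bits) and what a 3 x 3 sharpening filter does. It is an illustration only: the gain `alpha`, the offset `beta`, and the sharpening kernel are assumed values, since the ones used in this study are not stated, and the border handling here (repeating edge pixels) is a simplification that differs from the OpenCV default.

```python
def convert_scale_abs(image, alpha=1.0, beta=0.0):
    """Per pixel: |alpha * value + beta|, rounded and saturated to 0..255."""
    return [[min(255, int(round(abs(alpha * v + beta)))) for v in row]
            for row in image]


# A commonly used 3x3 sharpening kernel (assumed, not taken from the study).
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]


def filter2d(image, kernel):
    """Correlate a 3x3 kernel with a grayscale image.

    Border pixels reuse the nearest edge value (a simplification), and
    results are clipped to the 0..255 range.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in (-1, 0, 1):
                for kx in (-1, 0, 1):
                    yy = min(max(y + ky, 0), h - 1)
                    xx = min(max(x + kx, 0), w - 1)
                    acc += image[yy][xx] * kernel[ky + 1][kx + 1]
            out[y][x] = min(255, max(0, int(round(acc))))
    return out


# Toy grayscale patch with a slightly brighter center pixel.
patch = [[10, 10, 10],
         [10, 20, 10],
         [10, 10, 10]]
enhanced = convert_scale_abs(patch, alpha=1.5, beta=10)
sharpened = filter2d(patch, SHARPEN)
print(enhanced)
print(sharpened)
```

In the toy patch, sharpening raises the center pixel and lowers its direct neighbors, which is the local contrast boost that makes small specks easier to label.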

Experimental results

We collected 508 images for this study, and the input image size is 416 × 416. The model parameters of the YOLO algorithms are summarized in Table 1; the parameters are customized to prevent breakdown or non-convergence during training. The number of iterations is increased for YOLOv4 and YOLOv5 because they have more complicated frameworks than YOLOv2 and YOLOv3. Decision-makers should set these parameters case by case to meet their own needs.
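To relate the 416 × 416 input size to the three detection scales of YOLOv3 described earlier, the sketch below computes the grid size and output depth at each scale. It assumes the standard YOLOv3 strides of 32, 16, and 8 and three anchors per scale; the class count of two (speck and pock) is an assumption about how the defect labels map to detection classes, not a figure reported here.

```python
def yolo_v3_output_shapes(input_size=416, num_classes=2,
                          anchors_per_scale=3, strides=(32, 16, 8)):
    """Return (grid, grid, channels) for each detection scale.

    Each anchor predicts 4 box coordinates, 1 objectness score, and one
    score per class, so channels = anchors_per_scale * (5 + num_classes).
    """
    shapes = []
    for stride in strides:
        grid = input_size // stride
        channels = anchors_per_scale * (5 + num_classes)
        shapes.append((grid, grid, channels))
    return shapes


def total_predictions(shapes, anchors_per_scale=3):
    """Number of candidate boxes produced across all scales."""
    return sum(rows * cols * anchors_per_scale for rows, cols, _ in shapes)


shapes = yolo_v3_output_shapes()
print(shapes)
print(total_predictions(shapes))
```

The coarsest grid handles large defects and the finest grid handles small ones, which is why the multi-scale versions are better suited to tiny specks than YOLOv1.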

Table 1 Model Parameters

We then compare the YOLO models, starting with v2, v3, and v4. The loss falls more steeply and quickly over the iterations from v2 to v4 (see Figs. 9, 10 and 11, where the vertical axis is loss and the horizontal axis is iteration), which means the model trains more effectively from v2 to v4. The average losses of v2, v3, and v4 are 5.79, 4.91, and 2.73, respectively, so in terms of training loss v4 is superior to v3, and v3 is superior to v2. We also checked YOLO v5, whose originality relative to YOLO v4 is controversial. In our experiment, YOLO v5 is better in terms of loss, but the discrimination power of the two models is almost the same; we present the results later. YOLO v5 is also much smaller, which makes it very suitable for edge computing. The evolution of the loss of YOLO v5 is shown in Fig. 12.

Fig. 9 Loss Evolution with Iterations of YOLO v2

Fig. 10 Loss Evolution with Iterations of YOLO v3

Fig. 11 Loss Evolution with Iterations of YOLO v4

Fig. 12 Loss Evolution with Iterations of YOLO v5

A ROC curve is an effective tool to compare the performance of the YOLO classification models in this study. The confusion matrix is the basis of the ROC curve, and it is also the most basic, intuitive, and simple way to measure the accuracy of classification models [32]. The confusion matrix summarizes the predictive performance of a classification model in four categories, true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), as shown in Fig. 13.

Fig. 13 Confusion Matrix
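The link between the confusion matrix and the ROC curve can be sketched as follows: each threshold on the detection score yields one confusion matrix, and each confusion matrix yields one point (false positive rate, true positive rate) on the curve. The labels and scores below are hypothetical, and the function names are illustrative; the sketch also assumes that both classes are present in the labels.

```python
def confusion_counts(labels, predictions):
    """Return (TP, FP, FN, TN) for binary labels and predictions (1 = defective)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return tp, fp, fn, tn


def roc_points(labels, scores):
    """ROC points obtained by sweeping the threshold over the observed scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        predictions = [1 if s >= threshold else 0 for s in scores]
        tp, fp, fn, tn = confusion_counts(labels, predictions)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ground truth and detection confidences for four products.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
points = roc_points(labels, scores)
print(points)
print(auc(points))
```

A model whose curve lies closer to the top-left corner, and therefore has a larger area under the curve, discriminates better between defective and non-defective products.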

We then evaluate each model's performance by calculating Accuracy, Precision, Recall, and F1-Score from the four confusion-matrix counts, and we use Recall as the primary reference for choosing among the YOLO versions. The mathematical definitions are:

$$\begin{aligned} \text{Accuracy} =\frac{TN+TP}{TN+TP+FN+FP}\end{aligned}$$
$$\begin{aligned} \text{Precision} =\frac{TP}{TP+FP} \end{aligned}$$
$$\begin{aligned} \text{Recall} =\frac{TP}{TP+FN} \end{aligned}$$
$$\begin{aligned} \text{F1-Score}=\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}} \end{aligned}$$
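The four definitions above translate directly into code. The sketch below is a minimal illustration; the counts in the usage example are hypothetical and are not results from this study, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, Precision, Recall, and F1-Score from confusion-matrix counts.

    Assumed convention: a metric with a zero denominator is reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 30 TP, 10 FP, 30 FN, 30 TN.
metrics = classification_metrics(tp=30, fp=10, fn=30, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical case Precision is high while Recall is low, meaning that many defective parts are missed; this is the situation that using Recall as the primary reference is meant to expose.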

The evaluation results of the various models are summarized in Table 2. Interestingly, YOLOv3 and YOLOv5 perform in a balanced way, scoring over 70 percent in Accuracy, Precision, Recall, and F1-Score simultaneously, while the accuracy of YOLOv2 is somewhat lower than we expected. However, we found overfitting and overemphasis in the detection results of YOLOv4 (see Fig. 14). Comparing the detections from v3, v4, and v5 side by side in Fig. 14, we can easily find many false speck and pock points generated by v4, and we think this causes the slightly lower performance of YOLOv4. YOLOv5 has a similar problem. We suspect that this issue arises because YOLOv4 and YOLOv5 have larger and more complicated CNNs while the data in this study are limited; without adequate data, they do not perform as well as expected. The overfitting effect is more severe for YOLOv4 than for YOLOv5.

Table 2 Model Evaluation
Fig. 14 Detected Results from Various YOLO Algorithms (Left: v3, Middle: v4, Right: v5)

Conclusions and recommendations

In this study, we address the quality detection problem of the electroplating industry by combining YOLO technology with a customized prototype of an AOI inspection platform. In the actual validation, YOLOv3 and YOLOv5 perform in a balanced way, with an accuracy rate of over 70 percent. Although YOLOv2 performs poorly overall, we found that it is more accurate in detecting large-area pocks (large objects). A decision-maker can assign weights to multiple criteria, for example Accuracy, Precision, Recall, and F1-Score, to select the final model to deploy in practice.

The study results validate that YOLOv3 and YOLOv5, combined with the AOI system, have an overall advantage over the other models for quality detection. We expect YOLOv4 to show its value and power only if enough data are provided. In the near future, setting the model parameters is a primary task worthy of exploration. In addition, more defective images should be collected automatically to reduce the overfitting and overemphasis of YOLOv4 and YOLOv5.

Finally, we recommend increasing the number of samples for each defect type and controlling the difference in sample counts between defect types to provide the model with valuable training data. We believe our efforts play an important role for SMEs moving toward smart manufacturing.