TD-Net: tiny defect detection network for industrial products

Shao, Rui; Zhou, Mingle; Li, Min; Han, Delong; Li, Gang

doi:10.1007/s40747-024-01362-x


Original Article
Open access
Published: 29 February 2024

Volume 10, pages 3943–3954, (2024)
Complex & Intelligent Systems

Rui Shao^1,2,
Mingle Zhou^1,2,
Min Li^1,2,
Delong Han^1,2 &
Gang Li ORCID: orcid.org/0000-0002-7896-4833^1,2


Abstract

The detection of tiny defects in industrial products is important for improving product quality and maintaining production safety. Current image-based defect detection methods are ineffective at detecting tiny and variously shaped defects. This paper therefore proposes a tiny defect detection network (TD-Net) for industrial products. TD-Net improves overall defect detection, and tiny defect detection in particular, by addressing three problems: the loss of tiny defects during downsampling, the pre-filtering of conflicting deep and shallow semantic information, and the cascaded fusion of multi-scale information. Specifically, the Defect Downsampling (DD) module supplements defect information during backbone downsampling, mitigating the tendency of strided convolution to miss tiny defects. The Semantic Information Interaction Module (SIIM) fuses deep and shallow semantic features and lets the fused features interact with shallow features to improve tiny defect detection. Finally, the Scale Information Fusion Module (SIFM) improves the Path Aggregation Network (PANet) through cascaded fusion and attention over information at different scales, further improving the defect detection performance of TD-Net. Extensive experiments on the NEU–DET data set (76.8% mAP), the Peking University PCB defect data set (96.2% mAP) and the GC10-DET data set (71.5% mAP) show that TD-Net achieves competitive results compared with state-of-the-art (SOTA) methods with a comparable number of parameters.


Introduction

Research on industrial automation and control remains an active topic. For example, Song et al. [1] studied the finite-time prescribed performance (FTPP) control problem, Tao et al. [2] studied unsupervised cross-domain fault diagnosis, Stojanovic [3] studied fault-tolerant control of hydraulic servo actuators under actuator failure, Song et al. [4] studied adaptive neural finite-time elastic dynamic surface control (DSC) strategies for nonlinear fractional-order large-scale systems (FOLSS), and Xin et al. [5] optimized the one-stage YOLOv4 model. These studies cover nonlinear systems, fault diagnosis, and surface defect detection, showing that research on robotic and industrial control systems for practical industrial applications has attracted extensive interest from scholars.

Automated defect detection equipment has become a key part of quality inspection in the manufacturing of industrial products. Surface defects in industrial products are complex and varied in type and shape, and detection scenarios and hardware configurations also differ. This places high demands on surface defect detection, especially on the detection of tiny defects.

In recent years, the powerful feature extraction capability of deep convolutional neural networks (CNNs) has proven effective in many scenarios, and several excellent object detectors have emerged, such as the one-stage YOLO series [6] and the two-stage R-CNN series [7]. Because two-stage methods are comparatively inefficient, one-stage detection methods have been favoured in industrial scenarios.

The defect detection task for industrial products differs from detection in general scenarios, and many studies have adapted existing one-stage detection methods to it. For example, Zhang et al. [8] built a network for track defect detection that makes full use of contextual information and attention mechanisms to optimize detection performance. Su et al. [9] proposed a multi-headed cosine nonlocal attention module and embedded it into an FPN [10] to achieve better results.

Small object detection has important practical significance, for example in surface defect detection in industrial production environments. In recent years, considering labor costs and the errors of manual visual inspection, many industrial fields have introduced machine vision-based surface defect detection equipment, and the rapid development of machine vision for automatic inspection has brought higher accuracy and efficiency. In practical industrial applications, small defects on product surfaces are common. However, existing algorithms work well on defects that occupy a large proportion of the image, while their performance on small defects is far from satisfactory. Because small object detection has many needs and application scenarios in industry, this study has high research significance and application value.

However, for tiny defect detection, existing work mainly focuses on attention mechanisms and multi-scale feature fusion, while research on downsampling methods in the backbone network is insufficient, and tiny defects in industrial product images are easily lost during strided-convolution downsampling. At the same time, shallow features have higher resolution and are more conducive to detecting tiny defects, while deep features, with their large receptive fields, are rich in semantic information. There is a lack of research on pre-filtering conflicts between deep and shallow features and then letting the result interact with shallow features to guide tiny defect detection.

Based on the above observations, this paper proposes a tiny defect detection network (TD-Net) for industrial products to improve the effectiveness of tiny defect detection. The main contributions of this paper are as follows:

1.
This paper proposes TD-Net to improve the detection of tiny defects in industrial products and to reduce missed detections of tiny defects during quality inspection in industrial manufacturing. TD-Net uses YOLOv5 [11] as the baseline model.
2.
This paper proposes the defect downsampling (DD) module. Downsampling with max pooling, average pooling or strided convolution loses information about tiny defects. The DD module replaces the strided convolutions that perform downsampling in the backbone network and compensates for the defect information lost in this process, so that the network can better detect tiny defects.
3.
This paper proposes the Semantic Information Interaction Module (SIIM). First, channel attention is applied to the deep semantic features (B5), which have the most channels, and coordinate attention [12] is applied to the high-resolution shallow features (B3). Second, cascade fusion is performed on the multi-scale outputs of the backbone (B3, B4, B5). Finally, the optimized fused features interact fully with the shallow (B3) features to improve TD-Net's ability to detect tiny defects, which alleviates missed detections of tiny defects.
4.
This paper proposes the Scale Information Fusion Module (SIFM). First, SIFM pre-filters conflicts between features by summing and concatenating multi-scale features in cascade. Then, SIFM uses an attention mechanism to emphasize useful features and suppress interfering features in industrial product images. SIFM replaces the Concat operation in PANet [13] and improves defect detection performance with only a small increase in the number of parameters.

Related work

Traditional detection method

Chen et al. [14] proposed a strip steel surface defect detection method based on a spectral residual visual attention model. It uses an improved filtering algorithm to enhance the difference between the defect and the background and constructs a saliency map of the defect from the spectral residual information. However, the method requires manual feature extraction, cannot be built end-to-end, and generalizes poorly to a wider range of industrial defect detection problems.

Medina et al. [15] proposed a rotation-invariant Gabor filter to detect defects in different orientations, but the method has long running and response times and is therefore unsuitable for large-scale real-time defect detection.

Meanwhile, Liu et al. [16] proposed an improved multiscale block local binary pattern (MB-LBP) method, and Cao et al. [17] introduced feature vectors to describe the defect detection problem.

All of these are traditional defect detection methods. Their ideas have formed an important basis for many object detectors, but traditional methods generally struggle to process massive data, incur large time overheads, perform poorly, require cumbersome manual feature extraction and lack robustness, so they increasingly fail to meet the defect detection needs of large-scale industrial practice. In recent years, much research has therefore been carried out on industrial defect detection based on deep convolutional neural networks.

Deep learning detection method

In recent years, with the rapid development of deep learning in computer vision, more and more research has used deep learning-based methods to detect industrial defects. Modern detectors usually consist of two parts: a backbone that extracts features and a head that predicts categories and bounding boxes. The most representative two-stage detectors are the R-CNN series, including Fast R-CNN, Faster R-CNN [18] and R-FCN [19]. The most representative one-stage object detectors are the YOLO series [20,21,22,23,24]. Anchor-free detectors such as CenterNet [25] have also gained momentum. However, general-purpose object detection and industrial defect detection involve different scenarios, so general-purpose detectors do not achieve optimal performance in industrial settings. Therefore, more and more researchers have modified general-purpose object detectors to enhance their capability in industrial defect detection.

Tu et al. [26] proposed an improved YOLOv3-based surface defect detection method for sawn material, which replaces the IoU loss with the CIoU loss but does not design algorithm modules around the characteristics of actual industrial defect detection.

Zhou et al. [27] proposed DACNet to detect strip steel surface defects, Dong et al. [28] used pyramid feature fusion to enhance defect detection, Wang et al. [29] proposed a new pyramid feature fusion module, Yu et al. [30] replaced the feature pyramid network (FPN) in the neck with a bi-directional feature fusion network (BFFN), and Zeng et al. [31] made full use of contextual information to detect tiny defects in PCBs. As can be seen, many studies have exploited features of different semantic depths and contextual information to improve the performance of industrial defect detection models. However, these methods fuse deep and shallow semantic features without first pre-filtering the conflicting information between them, which limits further improvement of model performance.

The deep learning-based industrial defect detectors mentioned above have improved the capability of general-purpose detectors for defect detection through various means, such as new fusion methods. However, most existing studies focus on attention mechanisms and multi-scale feature fusion, while research on downsampling methods in backbone networks is insufficient. Meanwhile, shallow features have higher resolution and are more conducive to detecting tiny defects, while deep features, with their large receptive fields, are rich in semantic information. There is a lack of research on pre-filtering deep and shallow features and then letting them interact with shallow features to guide tiny defect detection.

Proposed network

In this section, we introduce TD-Net in detail; its architecture is shown in Fig. 1. TD-Net is a one-stage detector whose structure has three parts: backbone, neck and detection head. DD in the backbone reduces the information loss caused by strided convolution during downsampling, SIFM restructures the fusion of multi-scale features in the neck, and SIIM in the neck fuses the deep semantic features of the backbone to guide the shallow features.

Specifically, for an input image F, the outputs of the last three feature extraction stages of the backbone are denoted as

$$\begin{aligned} B^F=\left\{ B3,B4,B5 \right\} \end{aligned}$$

(1)

The output of SIIM and the output of the neck are denoted, respectively, as

$$\begin{aligned} S^F= & {} \left\{ S3,S4,S5 \right\} \end{aligned}$$

(2)

$$\begin{aligned} N^F= & {} \left\{ N3,N4,N5 \right\} \end{aligned}$$

(3)

Defect downsampling (DD)

For discriminative tasks such as defect classification and defect detection, most current convolutional neural network architectures use downsampling layers to reduce the spatial size of feature maps, for example the widely used max pooling layers, average pooling layers and convolutional layers with a stride greater than 1. However, sliding windows with a stride greater than 1 may fail to preserve fine-grained discriminative details, which are crucial for defect detection. The impact is particularly significant for tiny defects and can lead to missed detections.
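To make this concern concrete, the NumPy sketch below compares plain stride-2 subsampling (the extreme case of a 1×1 convolution with stride 2) with a 2×2 space-to-depth rearrangement on a toy map containing a single-pixel defect. It is only an illustration under that simplification: a 3×3 stride-2 convolution still sees such a pixel through its receptive field, but mixes it with its neighbours. The helper names are hypothetical.

```python
import numpy as np

def stride2_subsample(x):
    """Keep every second row and column, as a 1x1 convolution with stride 2 would."""
    return x[..., ::2, ::2]

def space_to_depth(x):
    """Rearrange each 2x2 spatial block into 4 channel groups (no values dropped).

    x has shape (C, H, W) with even H and W; the output has shape (4C, H/2, W/2).
    """
    return np.concatenate(
        [x[..., 0::2, 0::2], x[..., 1::2, 0::2], x[..., 0::2, 1::2], x[..., 1::2, 1::2]],
        axis=-3,
    )

# A one-channel 8x8 "image" with a single-pixel defect at odd coordinates.
img = np.zeros((1, 8, 8))
img[0, 3, 5] = 1.0

print(stride2_subsample(img).max())   # 0.0: the defect pixel is skipped
print(space_to_depth(img).max())      # 1.0: the defect survives in one channel group
print(space_to_depth(img).shape)      # (4, 4, 4)
```

Space-to-depth halves the resolution while keeping every input value, which is why rearrangements of this kind are a common way to downsample without discarding pixels.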

To solve the above problems and enable the network to better detect tiny defects, this paper proposes the defect downsampling (DD) module. The architecture of DD is shown in Fig. 2. It consists of two parts. The first is the original convolutional layer with a stride greater than 1, which performs the initial downsampling; here Conv23 is a convolution with kernel size 3 and stride 2, BN is batch normalization, and Activation is the SiLU activation function. The second is the defect pooling layer (DPL), which supplements the defect information lost by the strided convolutional layer. The architecture of the DPL is shown in Fig. 3.

The DPL processes its input in five steps. The first step is Split (S): as shown in Fig. 4, the input feature map X is divided into four disjoint but closely related sets x0, x1, x2 and x3.

The second step is Extract (E). Given a set x0 which becomes E(x0) after the operation E, the processing of E(x0) is defined as

$$\begin{aligned} E(x0)= & {} Conv(k=1,channel=C/r)\rightarrow BN \nonumber \\{} & {} \rightarrow Conv(k=1,channel=C)\rightarrow SiLU \end{aligned}$$

(4)

where SiLU is the SiLU activation function, Conv is the convolution operation, BN is batch normalization, C is the number of channels of the input, and r is the reduction rate.

The third step is Minus (M). Given three sets x1, x2 and x3, the three sets become M(x1), M(x2) and M(x3) after the operation M, the process is defined as

$$\begin{aligned} M(x1)= & {} x1-E(x0) \end{aligned}$$

(5)

$$\begin{aligned} M(x2)= & {} x2-E(x0) \end{aligned}$$

(6)

$$\begin{aligned} M(x3)= & {} x3-E(x0) \end{aligned}$$

(7)

The fourth step is Concat (C). E(x0) and the results of applying E to M(x1), M(x2) and M(x3) are combined into a single set C(x) with four times the number of channels. The process is defined as

$$\begin{aligned} C(x)= & {} Concat\big( E(x0),E\left( M(x1)\right) \nonumber \\{} & {} ,E\left( M(x2)\right) ,E\left( M(x3)\right) ,dim=1\big) \end{aligned}$$

(8)

where dim=1 indicates that the concatenation is performed along the channel dimension.

The fifth step is Fusion (F). The operation F maps C(x) to X', the final output of the DPL, defined as

$$\begin{aligned} X^{'}= & {} Conv(k=1,channel=4C/r)\nonumber \\{} & {} \rightarrow BN\rightarrow Attention \!\rightarrow \! Conv(k\!=\!1,channel\!=\!C)\nonumber \\{} & {} \rightarrow SiLU \end{aligned}$$

(9)

Here, Attention denotes an attention mechanism. Because reducing the number of channels after aggregating different features may lose or confuse some information, we introduce attention after the first (channel-reducing) convolution to compensate. The attention used here is ECANet [32].
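The five DPL steps can be sketched in NumPy as below. This is a minimal sketch under several assumptions that the text does not fix: Split is read as a 2×2 space-to-depth sampling (so each xi keeps C channels at half resolution), E uses the same weights at every call, 1×1 convolutions are channel matrix products with random weights, BatchNorm is treated as the identity (roughly its behaviour at initialisation in inference mode), and the ECA step uses a fixed averaging kernel instead of a learned one. How DD combines this output with the strided-convolution branch (Fig. 2) is not modelled.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 convolution: w has shape (C_out, C_in), x has shape (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)

def bn_identity(x):
    # Placeholder for BatchNorm (assumed identity in this sketch).
    return x

def silu(x):
    return x / (1.0 + np.exp(-x))

def eca(x, k=3):
    """ECA-style channel attention: GAP -> 1-D conv across channels -> sigmoid."""
    pooled = x.mean(axis=(1, 2))                      # (C,)
    kernel = np.full(k, 1.0 / k)                      # hypothetical fixed kernel
    weights = 1.0 / (1.0 + np.exp(-np.convolve(pooled, kernel, mode="same")))
    return x * weights[:, None, None]

def split_s(x):
    """Step S, assumed to be 2x2 space-to-depth: four (C, H/2, W/2) sets."""
    return x[:, 0::2, 0::2], x[:, 1::2, 0::2], x[:, 0::2, 1::2], x[:, 1::2, 1::2]

class DPLSketch:
    def __init__(self, c, r=4):
        self.e_down = rng.standard_normal((c // r, c)) * 0.1          # E: C -> C/r
        self.e_up = rng.standard_normal((c, c // r)) * 0.1            # E: C/r -> C
        self.f_down = rng.standard_normal((4 * c // r, 4 * c)) * 0.1  # F: 4C -> 4C/r
        self.f_up = rng.standard_normal((c, 4 * c // r)) * 0.1        # F: 4C/r -> C

    def extract(self, x):
        # Eq. (4): Conv(C/r) -> BN -> Conv(C) -> SiLU
        return silu(conv1x1(bn_identity(conv1x1(x, self.e_down)), self.e_up))

    def __call__(self, x):
        x0, x1, x2, x3 = split_s(x)                       # step S
        e0 = self.extract(x0)                             # step E
        m1, m2, m3 = x1 - e0, x2 - e0, x3 - e0            # step M, Eqs. (5)-(7)
        c = np.concatenate(                               # step C, Eq. (8)
            [e0, self.extract(m1), self.extract(m2), self.extract(m3)], axis=0
        )
        h = eca(bn_identity(conv1x1(c, self.f_down)))     # step F, Eq. (9) ...
        return silu(conv1x1(h, self.f_up))                # ... -> Conv(C) -> SiLU

x = rng.standard_normal((16, 32, 32))
out = DPLSketch(c=16, r=4)(x)
print(out.shape)   # (16, 16, 16)
```

Under these assumptions the DPL maps a (C, H, W) input to (C, H/2, W/2), the same shape a stride-2 convolution would produce, which is what lets its output supplement that branch.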

Scale information fusion module (SIFM)

In this section, we first introduce the proposed SIFM, and then detail how the SIFM changes the PANet so that it becomes the new neck structure of TD-Net.

SIFM

The structure of SIFM is shown in Fig. 5. Distinguishing the features of tiny defects from one another and from the background is key to detecting tiny defects in industrial products. By summing and concatenating multi-scale features in cascade, SIFM pre-filters conflicts between multi-level features, and its attention mechanism emphasizes useful features and suppresses interfering features in industrial product images.

As shown in Fig. 5, SIFM takes two feature maps, Y and Z, as input, where C, H and W are the number of channels, height and width of the feature maps, respectively. First, Y and Z are combined by summation and concatenation into a feature map M with 2C channels. Then, global max pooling and global average pooling are applied to M separately, and the results are concatenated to capture the global information of M. Note that the concatenation is performed along the H dimension, and the W dimension is compressed afterwards. Finally, the global information is passed through two one-dimensional convolutions and a Sigmoid activation, and the result is multiplied with the original feature map M to obtain the final output F of SIFM.
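A minimal NumPy sketch of the SIFM data flow follows. The text does not say exactly how the sum and the concatenation of Y and Z produce the 2C-channel map M, so `build_m` uses one hypothetical reading, M = Concat(Y + Z, Z). The two one-dimensional convolutions are likewise assumed to run along the channel axis, one per pooled descriptor, in the style of ECA, with hand-supplied kernels standing in for learned weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def build_m(y, z):
    """Hypothetical 'sum then concatenate': M = Concat(Y + Z, Z) along channels -> 2C."""
    return np.concatenate([y + z, z], axis=0)

def sifm_sketch(y, z, k_max, k_avg):
    """y, z: (C, H, W) feature maps; k_max, k_avg: assumed 1-D kernels."""
    m = build_m(y, z)                            # (2C, H, W)
    g_max = m.max(axis=(1, 2))                   # global max pooling -> (2C,)
    g_avg = m.mean(axis=(1, 2))                  # global average pooling -> (2C,)
    g = np.stack([g_max, g_avg], axis=0)         # stacked along "H": (2, 2C); W compressed
    # One 1-D convolution per pooled descriptor, running along the channel axis.
    a = np.convolve(g[0], k_max, mode="same")
    b = np.convolve(g[1], k_avg, mode="same")
    weights = sigmoid(a + b)                     # per-channel weights in (0, 1)
    return m * weights[:, None, None]            # output F, shape (2C, H, W)

rng = np.random.default_rng(0)
y = rng.standard_normal((8, 10, 10))
z = rng.standard_normal((8, 10, 10))
kernel = np.array([0.2, 0.6, 0.2])
out = sifm_sketch(y, z, kernel, kernel)
print(out.shape)   # (16, 10, 10)
```

Because the output keeps 2C channels, a module of this shape can stand in for a channel-wise Concat of two C-channel maps without changing the layers that follow it.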

New fusion network

In this section, we present the new neck structure proposed in this paper. Fig. 6 shows how it differs from the FPN and PANet structures.

We use YOLOv5 as the baseline and keep the PANet structure as the neck, but integrate the proposed SIFM into PANet, replacing all of its simple concatenation operations. SIFM pre-filters conflicts and attends to key features, achieving better fusion of multi-scale features.

Semantic information interaction module (SIIM)

In a typical defect detector, the neck fuses features top-down and bottom-up, and each level gains better representation capability after interacting with information from other scales. However, this approach has a drawback: the interaction between the non-adjacent layers B3 and B5 is insufficient. In addition, shallow feature maps such as B3 are more likely to capture tiny defects, whereas deeper feature maps such as B4 and B5 inevitably lose information about tiny objects during downsampling, although they carry stronger semantic information about defects. Therefore, if the deep semantic information of B4 and B5 interacts fully with B3, and conflicts between the multi-level features in the neck are pre-filtered so that features at all levels are fully integrated, the tiny defect detection capability of the whole detector can be improved.

For the above considerations, we propose the lightweight SIIM, whose structure is shown in Fig. 7.

The deep layer B5 has more channels, and the shallow layer B3 has higher resolution. SIIM therefore applies ECANet to B5 to help it focus on channel information, and applies coordinate attention to B3 to help it capture tiny defects. Meanwhile, the features of each layer have different semantic depths, and fusing them directly, for example by concatenation, causes feature misalignment. To address this problem, we sum the deep B5 feature map with the B4 feature map, and the shallow B3 feature map with the B4 feature map, to pre-filter conflicts, and then concatenate the two resulting feature maps. The integrated features are further optimized by a modified lightweight BottleneckCSP. Finally, the resulting feature maps are fused with B3 to guide the detection of tiny defects.

Specifically, in the modified BottleneckCSP, we reduce the number of channels of the hidden layer to 1/8 of the original one to make it lightweight.
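The SIIM data flow can be traced at the level of tensor shapes with the NumPy sketch below. Everything beyond the order of operations described above is an assumption: random 1×1 projections align channel counts, nearest-neighbour upsampling and 2×2 average pooling bring B5 and B3 to B4's resolution, a single projection stands in for the lightweight BottleneckCSP, the final fusion with B3 is taken to be addition, and the ECA and coordinate attention on B5 and B3 are omitted. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def proj(x, c_out):
    """Hypothetical 1x1 convolution with random weights, used only to align channels."""
    w = rng.standard_normal((c_out, x.shape[0])) * 0.1
    return np.einsum("oc,chw->ohw", w, x)

def upsample2(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def avgpool2(x):
    """2x2 average pooling with stride 2 (H and W assumed even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def siim_sketch(b3, b4, b5):
    c3, c4 = b3.shape[0], b4.shape[0]
    # ECA on B5 and coordinate attention on B3 are omitted in this sketch.
    deep = proj(upsample2(b5), c4) + b4        # pre-filter: B5 summed with B4
    shallow = proj(avgpool2(b3), c4) + b4      # pre-filter: B3 summed with B4
    fused = np.concatenate([deep, shallow], axis=0)
    fused = proj(fused, c4)                    # stand-in for the light BottleneckCSP
    guide = proj(upsample2(fused), c3)         # back to B3's resolution and channels
    return b3 + guide                          # assumed: fuse with B3 by addition

b3 = rng.standard_normal((8, 16, 16))    # shallow, high resolution
b4 = rng.standard_normal((16, 8, 8))
b5 = rng.standard_normal((32, 4, 4))     # deep, most channels
s3 = siim_sketch(b3, b4, b5)
print(s3.shape)   # (8, 16, 16)
```

Both summations happen at B4's resolution, which is one natural way to make B3 and B5 directly comparable before they are concatenated.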

For B3, SIIM introduces a coordinate attention mechanism, which decomposes channel attention into two one-dimensional feature encoding processes that aggregate features along the two spatial directions, respectively. The coordinate attention step can be formulated as

$$\begin{aligned} M_{h}= & {} Sigmoid\left( W_{1}\left( h_{a} \right) \right) \end{aligned}$$

(10)

$$\begin{aligned} M_{w}= & {} Sigmoid\left( W_{1}\left( w_{a} \right) \right) \end{aligned}$$

(11)

where Sigmoid is the sigmoid activation function, $W_{1}$ is a convolution with kernel size 1 and C output channels, $h_{a}$ is the attention in the height direction, and $w_{a}$ is the attention in the width direction. $h_{a}$ and $w_{a}$ are computed as

$$\begin{aligned} h_{a},w_{a}= & {} Concat\left( AvgPool_{h}\left( F \right) , AvgPool_{w}\left( F \right) \right) \nonumber \\{} & {} \rightarrow MLP\rightarrow Split \end{aligned}$$

(12)

where Split is the splitting operation, $AvgPool_{h}$ is global average pooling that compresses along the height direction, and $AvgPool_{w}$ is global average pooling that compresses along the width direction; they compress the feature map $F\in \mathbb{R}^{C\times H\times W}$ to $\mathbb{R}^{C\times 1\times W}$ and $\mathbb{R}^{C\times H\times 1}$, respectively. The MLP step can be formulated as follows:

$$\begin{aligned} MLP=W_{0}\left( F_{concat} \right) \rightarrow BN\rightarrow ReLU \end{aligned}$$

(13)

where $W_{0}$ is a $1\times 1$ convolution with C/r output channels, r is the reduction rate, and BN is batch normalization. Finally, $M_{h}$ and $M_{w}$ are both multiplied with the input feature map $F\in \mathbb{R}^{C\times H\times W}$ to obtain the final output features.
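Equations (10)–(13) can be sketched in NumPy as follows, with 1×1 convolutions written as matrix products over the channel axis and BatchNorm omitted. Following Eqs. (10) and (11), a single $W_{1}$ is shared by both directions; the original coordinate attention design uses a separate convolution for each direction, so this sharing is a literal reading of the equations rather than a confirmed implementation detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(f, w0, w1):
    """f: (C, H, W); w0: (C/r, C) plays W_0 in Eq. (13); w1: (C, C/r) plays W_1."""
    c, h, w = f.shape
    pooled_h = f.mean(axis=1)                    # AvgPool_h: compress height -> (C, W)
    pooled_w = f.mean(axis=2)                    # AvgPool_w: compress width  -> (C, H)
    cat = np.concatenate([pooled_h, pooled_w], axis=1)   # (C, W + H), Eq. (12)
    hidden = np.maximum(w0 @ cat, 0.0)           # Eq. (13): 1x1 conv -> (BN omitted) -> ReLU
    w_a, h_a = hidden[:, :w], hidden[:, w:]      # Split back into the two directions
    m_h = sigmoid(w1 @ h_a)                      # Eq. (10): one weight per row, (C, H)
    m_w = sigmoid(w1 @ w_a)                      # Eq. (11): one weight per column, (C, W)
    return f * m_h[:, :, None] * m_w[:, None, :]

c, r = 16, 4
f = rng.standard_normal((c, 12, 20))
w0 = rng.standard_normal((c // r, c)) * 0.1
w1 = rng.standard_normal((c, c // r)) * 0.1
out = coordinate_attention(f, w0, w1)
print(out.shape)   # (16, 12, 20)
```

Using different values for H and W makes it easy to check that each attention map is broadcast along the correct axis.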

Experimental results and analysis

Experimental setup

We conducted experiments on the NEU–DET [33], GC10-DET [34] and Peking University PCB defect [35] data sets to verify the effectiveness of the proposed method. Our network is implemented in PyTorch, and the experiments are run on NVIDIA A100 GPUs. TD-Net is trained with the SGD optimizer, an initial learning rate of 0.01 with cosine learning rate decay, a training image size of 640$\times $640 and a batch size of 32. All models are trained for 500 epochs without pretrained weights.
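The stated training settings and the cosine decay can be written down as follows. The final learning rate of the schedule is not given in the text, so `lr_min = 0.0` and the per-epoch evaluation are assumptions; `CONFIG` is a hypothetical container, not a configuration file from the authors.

```python
import math

# Values stated in the text; the dictionary itself is only an illustration.
CONFIG = {
    "optimizer": "SGD",
    "lr0": 0.01,
    "epochs": 500,
    "img_size": 640,
    "batch_size": 32,
    "pretrained": False,
}

def cosine_lr(epoch, lr0=0.01, total_epochs=500, lr_min=0.0):
    """Cosine decay from lr0 at epoch 0 to lr_min (assumed 0) at total_epochs."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (lr0 - lr_min) * (1.0 + math.cos(math.pi * progress))

for e in (0, 250, 500):
    print(e, round(cosine_lr(e), 6))
# 0 0.01
# 250 0.005
# 500 0.0
```

The learning rate stays near lr0 early in training and falls most steeply around the midpoint, which is the usual motivation for cosine schedules over step decay.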

NEU–DET

The NEU–DET data set is a defect classification data set covering six types of defects in hot-rolled steel sheets: crazing, inclusion, patches, pitted surface, rolled-in scale, and scratches. It contains 300 images per defect type, 1800 images in total. In this paper, 1260 images are used as the training set and 540 as the test set, a 0.7/0.3 split. Sample images and annotations for each category of NEU–DET are shown in Fig. 8.

GC10-DET

The GC10-DET data set published by Lv et al. contains ten types of steel surface defects: stamping, welds, crescent seams, water stains, oil stains, silk stains, inclusions, indentations, creases and waist folds. We use 2294 images with an 8:2 train/test ratio, giving 1835 training samples and 459 test samples. Sample images and annotations of some categories in GC10-DET are shown in Fig. 9.
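The split sizes reported for NEU–DET and GC10-DET follow from the stated ratios. The sketch below reproduces them with integer arithmetic, flooring the training count; the rounding rule is an assumption, and for these totals flooring and rounding give the same result.

```python
def split_sizes(total, train_parts, test_parts):
    """Train/test counts for a train:test ratio, flooring the training count (assumed)."""
    n_train = total * train_parts // (train_parts + test_parts)
    return n_train, total - n_train

print(split_sizes(1800, 7, 3))   # NEU-DET:  (1260, 540)
print(split_sizes(2294, 8, 2))   # GC10-DET: (1835, 459)
```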

Peking University PCB

The Peking University PCB defect data set contains 693 images with six defect categories: missing holes, mouse bites, open circuits, short circuits, spurs and spurious copper. We follow the original division of the data set into training and test sets. Sample images and annotations of some categories in the PCB defect data set are shown in Fig. 10.

Evaluation indicators

The model is evaluated with precision (P), recall (R), the F1 value and mAP@.5. Precision and recall are defined as follows:

$$\begin{aligned} P= & {} \frac{TP}{TP+FP} \end{aligned}$$

(14)

$$\begin{aligned} R= & {} \frac{TP}{TP+FN} \end{aligned}$$

(15)

where TP, FP and FN are the numbers of true positives, false positives and false negatives, respectively. The F1 value is calculated as follows:

$$\begin{aligned} F1=2\frac{P\times R}{P+R} \end{aligned}$$

(16)

where P is the precision and R is the recall. The formula for calculating mAP is as follows:

$$\begin{aligned} mAP=\frac{ {\textstyle \sum _{n=1}^{N}} \int _{0}^{1} p\left( r \right) dr }{N} \end{aligned}$$

(17)

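where N is the number of defect categories and p(r) is the precision as a function of recall r, so each integral is the average precision (AP) of one category. mAP@.5 denotes mAP computed with an IoU threshold of 0.5.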
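Equations (14)–(17) can be computed directly, as in the plain-Python sketch below. Equation (17) does not specify how the integral is approximated. Here it is evaluated with all-point interpolation (precision made non-increasing in recall, then summed over recall steps), which is one common convention; other toolkits sample the curve at a fixed set of recall points and can give slightly different values. The sketch assumes detections have already been matched to ground truth at IoU 0.5.

```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0          # Eq. (14)

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0          # Eq. (15)

def f1(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0       # Eq. (16)

def average_precision(recalls, precisions):
    """Area under one class's precision-recall curve (the integral in Eq. (17)).

    recalls must be sorted ascending; sentinels at recall 0 and 1 are added.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):                # make precision non-increasing
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

def mean_ap(per_class_ap):
    return sum(per_class_ap) / len(per_class_ap)      # average over the N classes

p, r = precision(8, 2), recall(8, 4)
print(round(p, 3), round(r, 3), round(f1(p, r), 3))   # 0.8 0.667 0.727
print(average_precision([0.5, 1.0], [1.0, 0.5]))      # 0.75
```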

Ablation study

Lightweight processing

The goal of this paper is a lightweight detector for small defects in industrial products. To keep the network lightweight after all modifications, we first lightened YOLOv5s before the experiments, and this modification caused no loss in performance. Deleting or replacing parts of the original network structure is an effective way to reduce the number of parameters. Because the backbone plays an important role in feature extraction for the whole network, we lighten only the neck: the four convolutions in the neck of YOLOv5s are replaced with Ghost convolutions [36]. This reduces the number of parameters and may also help mitigate overfitting. With this modification, we achieved an mAP of 74.5 on NEU–DET; the changes in each metric are shown in Table 1.
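To see why Ghost convolution shrinks the neck, the sketch below counts convolution weights (bias and BatchNorm ignored) for a standard 3×3 convolution and a Ghost convolution with the same input and output channels. It assumes a YOLOv5-style GhostConv layout, in which half of the output channels come from an ordinary convolution and the other half from a 5×5 depthwise convolution applied to those; the channel count of 256 is illustrative, not taken from the paper's neck.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias and BatchNorm ignored)."""
    return (c_in // groups) * c_out * k * k

def ghost_conv_params(c_in, c_out, k, cheap_k=5):
    """Assumed Ghost layout: ordinary conv to C_out/2, then depthwise conv for the rest."""
    half = c_out // 2
    primary = conv_params(c_in, half, k)                     # "intrinsic" feature maps
    cheap = conv_params(half, half, cheap_k, groups=half)    # cheap depthwise "ghost" maps
    return primary + cheap

std = conv_params(256, 256, 3)
ghost = ghost_conv_params(256, 256, 3)
print(std, ghost, round(ghost / std, 3))   # 589824 298112 0.505
```

The depthwise branch adds very few weights, so the saving comes almost entirely from halving the output channels of the ordinary convolution.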

Table 1 Effect of lightweight processing on baseline


Comprehensive performance of the network structure

We incrementally added the proposed modules, DD, SIFM and SIIM, to the lightweight baseline and tested the performance at each step. Table 2 shows the improvement contributed by each component. The proposed approach achieves a large accuracy improvement over the lightweight baseline without greatly increasing the number of parameters and at a small cost in inference speed; notably, as long as real-time requirements are met, accuracy improvement is the more important goal. We first add DD after lightweighting, replacing the original strided-convolution downsampling in the backbone. This brings a large accuracy improvement at little cost in parameters and efficiency, showing that the information retained by the defect pooling layer can compensate for the tiny defect features lost by strided convolution. On top of DD, we add SIFM to replace all simple Concat operations in the PANet structure, enabling better fusion and information interaction of defect features at different scales. SIFM improves mAP with essentially no increase in parameters or inference cost. This also indicates that a simple Concat operation in the neck does not fuse the backbone's features across layers sufficiently, which leads to misidentification of defect features in industrial products. To further address this problem, SIIM enhances the semantic representation of shallow features with the semantic information of deep features, improving accuracy by 1.1 mAP and reducing missed detections of tiny defects in industrial products.

Table 2 Comprehensive performance of the network structure


Table 3 Comparison of test results on the NEU–DET dataset


Table 4 Comparison of test results on the PCB data set at Peking University


Table 5 Comparison of test results on the GC-10 data set


Comparison experiments

We first compare TD-Net with several SOTA methods on the NEU–DET data set, and then test TD-Net on the GC10-DET and Peking University PCB defect data sets to verify its generalization ability. Table 3 compares TD-Net with other SOTA methods on NEU–DET across several metrics; all results were obtained in the same test environment. TD-Net achieves the best mAP among the compared methods, 76.8%. Compared with YOLOv3-tiny (53.3% mAP), YOLOv4-tiny (52.8% mAP), YOLOv7-tiny (74.5% mAP) and the baseline YOLOv5s (73.2% mAP), which have similar parameter counts and real-time performance, TD-Net improves mAP by 23.5, 24.0, 2.3 and 3.6 percentage points, respectively. TD-Net is therefore well suited to steel defect detection. The results on the Peking University PCB defect data set and the GC10-DET data set are shown in Tables 4 and 5, respectively. On the PCB defect data set, TD-Net achieves the highest mAP, 96.2%. On GC10-DET, TD-Net achieves 71.5% mAP, again the highest among the compared SOTA methods. In addition, we reproduce the method of Zheng et al. [37], who obtained good results by improving YOLOv5x; to keep the number of parameters comparable, we reproduce it on the smaller YOLOv5s and call it Zheng-s. These results show that TD-Net is well suited to defect detection in industrial products.
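The gains quoted above are differences in percentage points, not relative percentages. The short sketch below computes both from the NEU–DET mAP values given in the text.

```python
TD_NET_MAP = 76.8   # TD-Net mAP@.5 on NEU-DET (%), as quoted in the text
baselines = {"YOLOv3-tiny": 53.3, "YOLOv4-tiny": 52.8, "YOLOv7-tiny": 74.5, "YOLOv5s": 73.2}

def gains(target, reference):
    points = target - reference               # absolute gain in percentage points
    relative = 100.0 * points / reference     # gain relative to the reference mAP, in %
    return points, relative

for name, ref in baselines.items():
    pts, rel = gains(TD_NET_MAP, ref)
    print(f"{name}: +{pts:.1f} pp, {rel:.1f}% relative")
```

For example, the 3.6-point gain over YOLOv5s corresponds to roughly a 4.9% relative improvement.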
Figure 11 compares the mAP of TD-Net with that of other SOTA models on the three data sets. As Fig. 11 shows, TD-Net obtains the best results in tiny defect detection.
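All of these comparisons rest on mAP. As a reminder of how the metric is commonly computed, here is a minimal NumPy sketch of per-class average precision with all-point interpolation (the PASCAL VOC 2010+ convention), averaged over classes. It assumes detections have already been matched to ground truth at a fixed IoU threshold such as 0.5; the exact protocol used in the paper (IoU threshold, interpolation scheme) may differ, and the function names and toy classes are hypothetical.

```python
import numpy as np


def average_precision(tp, scores, num_gt):
    """All-point interpolated AP for a single class.

    tp:     per-detection flags, 1 if the detection matched a not-yet-matched
            ground-truth box at the chosen IoU threshold (e.g. 0.5), else 0.
    scores: per-detection confidence scores.
    num_gt: number of ground-truth boxes of this class.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")   # highest confidence first
    tp = np.asarray(tp, dtype=float)[order]
    fp = 1.0 - tp

    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(num_gt, 1)             # guard against num_gt == 0
    precision = tp_cum / np.maximum(tp_cum + fp_cum, np.finfo(float).eps)

    # Sentinels at recall 0 and 1, then the precision envelope:
    # p[i] becomes the maximum precision at any recall >= r[i].
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]

    # Area under the step curve: sum over the points where recall increases.
    idx = np.nonzero(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


def mean_average_precision(per_class):
    """Mean AP over classes; per_class maps name -> (tp, scores, num_gt)."""
    return float(np.mean([average_precision(*v) for v in per_class.values()]))


if __name__ == "__main__":
    toy = {
        "toy_a": ([1, 0, 1], [0.9, 0.8, 0.7], 2),  # AP = 0.5*1 + 0.5*(2/3)
        "toy_b": ([1, 1], [0.9, 0.6], 2),          # AP = 1.0
    }
    print(round(mean_average_precision(toy), 4))   # 0.9167
```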

Figure 12 shows detection results on the three data sets: (a) the Peking University PCB defect data set, (b) the GC10-DET data set and (c) the NEU–DET data set. Although the defects are very small, TD-Net still classifies and locates them accurately on both steel surfaces and PCBs, demonstrating its detection capability and generalization ability in defect detection scenarios.

Conclusion

This paper addresses the loss of tiny-defect information during downsampling in the detector backbone and the conflict pre-filtering that arises when deep and shallow semantic features are fused, with the aim of improving defect detection for industrial products. To this end, we propose TD-Net, a tiny defect detection network. For downsampling in the backbone, the DD module reduces the information loss of tiny defects. For deep and shallow semantic feature fusion, we propose SIIM and SIFM: SIIM fuses deep semantic information to guide shallow features toward better detection of tiny defects, and SIFM optimizes the structure of the PANet feature fusion network through cascade fusion and attention focus. Experimental results on three industrial product defect data sets (NEU–DET, GC10-DET and the Peking University PCB defect data set) validate the effectiveness of the proposed method. In future work, we will continue research on industrial surface defect detection, in particular defect detection in glass bottles, and will explore unsupervised cross-domain methods and general-purpose large AI models for the industrial domain.
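SIFM builds on the PANet neck of Liu et al. For orientation only, the NumPy sketch below shows the baseline PANet-style aggregation it starts from: an FPN top-down pass followed by a bottom-up path. It is not SIFM; the cascade fusion and attention focus that SIFM adds are not modeled. As simplifying assumptions, learned convolutions are replaced by identity (and by 2x2 average pooling for downsampling), fusion is element-wise addition as in the original PANet (the YOLOv5 neck concatenates instead), and all function names are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x):
    """2x2 average pooling with stride 2 (stand-in for a stride-2 conv).

    Assumes H and W are even.
    """
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def pan_fuse(c3, c4, c5):
    """FPN top-down pass followed by a PANet-style bottom-up pass.

    c3, c4, c5: (C, H, W) maps at strides 8, 16, 32 with equal channel
    counts. Learned lateral and smoothing convolutions are omitted.
    """
    # Top-down: carry deep semantic information to higher resolutions.
    p5 = c5
    p4 = c4 + upsample2x(p5)
    p3 = c3 + upsample2x(p4)
    # Bottom-up: carry fine shallow localisation cues back to deep levels.
    n3 = p3
    n4 = p4 + downsample2x(n3)
    n5 = p5 + downsample2x(n4)
    return n3, n4, n5


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 16-channel maps for a 640 x 640 input (80, 40, 20 pixels).
    c3, c4, c5 = (rng.standard_normal((16, s, s)) for s in (80, 40, 20))
    n3, n4, n5 = pan_fuse(c3, c4, c5)
    print(n3.shape, n4.shape, n5.shape)  # (16, 80, 80) (16, 40, 40) (16, 20, 20)
```

The bottom-up path is what distinguishes PANet from a plain FPN: shallow, high-resolution features, where tiny defects are most visible, reach the deepest output level through a short path instead of only the long backbone route.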

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Code Availability

Available from the corresponding author upon request.

References

Song X, Sun P, Song S, Stojanovic V (2023) Quantized neural adaptive finite-time preassigned performance control for interconnected nonlinear systems. Neural Comput Appl 1–18
Tao H, Qiu J, Chen Y, Stojanovic V, Cheng L (2023) Unsupervised cross-domain rolling bearing fault diagnosis based on time-frequency information fusion. J Franklin Inst 360(2):1454–1477
Stojanovic V (2023) Fault-tolerant control of a hydraulic servo actuator via adaptive dynamic programming. Math Model Control
Song X, Sun P, Song S, Stojanovic V (2023) Finite-time adaptive neural resilient DSC for fractional-order nonlinear large-scale systems against sensor-actuator faults. Nonlinear Dyn 1–16
Xin H, Chen Z, Wang B (2021) PCB electronic component defect detection method based on improved YOLOv4 algorithm. J Phys Conf Ser 1827:012167 (IOP Publishing)
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Zhang D, Song K, Xu J, He Y, Niu M, Yan Y (2020) MCnet: multiple context information segmentation network of no-service rail surface defects. IEEE Trans Instrum Meas 70:1–9
Su B, Chen H, Zhou Z (2021) BAF-Detector: an efficient CNN-based detector for photovoltaic cell defect detection. IEEE Trans Industr Electron 69(3):3161–3171
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Ultralytics: YOLOv5 v6.1. https://github.com/ultralytics/yolov5. Accessed 22 Feb 2022
Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13713–13722
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8759–8768
Chen H, Xu S, Liu K, Sun H (2016) Surface defect detection of steel strip based on spectral residual visual saliency. Opt Precis Eng 24(10):2572–2580
Medina R, Llamas J, Gómez-García-Bermejo J, Zalama E, Segarra MJ (2017) Crack detection in concrete tunnels using a Gabor filter invariant to rotation. Sensors 17(7):1670
Liu Y, Xu K, Xu J (2019) An improved MB-LBP defect recognition approach for the surface of steel plates. Appl Sci 9(20):4222
Cao C-T, Do V-P, Lee B-R (2019) Tube defect detection algorithm under noisy environment using feature vector and neural networks. Int J Precis Eng Manuf 20(4):559–568
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28
Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. Adv Neural Inf Process Syst 29
Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934
Li C, Li L, Jiang H, Weng K, Geng Y, Li L, Ke Z, Li Q, Cheng M, Nie W, et al (2022) YOLOv6: a single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976
Wang C-Y, Bochkovskiy A, Liao H-YM (2022) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696
Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) CenterNet: keypoint triplets for object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6569–6578
Tu Y, Ling Z, Guo S, Wen H (2020) An accurate and real-time surface defects detection method for sawn lumber. IEEE Trans Instrum Meas 70:1–11
Zhou X, Fang H, Liu Z, Zheng B, Sun Y, Zhang J, Yan C (2021) Dense attention-guided cascaded network for salient object detection of strip steel surface defects. IEEE Trans Instrum Meas 71:1–14
Dong H, Song K, He Y, Xu J, Yan Y, Meng Q (2019) PGA-Net: pyramid feature fusion and global context attention network for automated surface defect detection. IEEE Trans Industr Inf 16(12):7448–7458
Wang W, Mi C, Wu Z, Lu K, Long H, Pan B, Li D, Zhang J, Chen P, Wang B (2022) A real-time steel surface defect detection approach with high accuracy. IEEE Trans Instrum Meas 71:1–10
Yu J, Cheng X, Li Q (2021) Surface defect detection of steel strips based on anchor-free network with channel attention and bidirectional feature fusion. IEEE Trans Instrum Meas 71:1–10
Zeng N, Wu P, Wang Z, Li H, Liu W, Liu X (2022) A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection. IEEE Trans Instrum Meas 71:1–14
Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q (2020) ECA-Net: efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11534–11542
He Y, Song K, Meng Q, Yan Y (2019) An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE Trans Instrum Meas 69(4):1493–1504
Lv X, Duan F, Jiang J-J, Fu X, Gan L (2020) Deep metallic surface defect detection: the new benchmark and detection network. Sensors 20(6):1562
Wei P Public Synthetic PCB Dataset. http://robotics.pkusz.edu.cn/resources/dataset/. Accessed 12 May 2021
Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2020) GhostNet: more features from cheap operations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1580–1589
Zheng L, Wang X, Wang Q, Wang S, Liu X (2021) A fabric defect detection method based on improved YOLOv5. In: 2021 7th International conference on computer and communications (ICCC). IEEE, pp 620–624


Funding

This work was funded by the Key R&D Program of Shandong Province, China (Grant No. 2023CXGC010112) and the Taishan Scholars Program (Grant No. tsqn202103097).

Author information

Authors and Affiliations

Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Wendong, Jinan, 250013, Shandong, China
Rui Shao, Mingle Zhou, Min Li, Delong Han & Gang Li
Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Wendong, Jinan, 250013, Shandong, China
Rui Shao, Mingle Zhou, Min Li, Delong Han & Gang Li

Corresponding author

Correspondence to Gang Li.

Ethics declarations

Conflict of interest

The authors have no relevant financial or nonfinancial interests to disclose.

Ethics approval

Not applicable.

Consent to participate

All authors consent to participate.

Consent for publication

All authors consent to publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Shao, R., Zhou, M., Li, M. et al. TD-Net:tiny defect detection network for industrial products. Complex Intell. Syst. 10, 3943–3954 (2024). https://doi.org/10.1007/s40747-024-01362-x


Received: 26 September 2023
Accepted: 22 January 2024
Published: 29 February 2024
Issue Date: June 2024
DOI: https://doi.org/10.1007/s40747-024-01362-x
