Abstract
Visual defect detection is widely used in intelligent manufacturing to automate the inspection of product quality. Two main challenges remain in industrial applications: the scarcity of defect samples and the weak texture variation of industrial defects. These problems limit the applicability of RGB image-based industrial defect segmentation. To this end, we propose a multi-modal background-aware network (MMBA-Net) for few-shot defect (2D+3D) segmentation with limited data, which can segment textural and structural defects in both seen and unseen domains (objects). To combine the perception capabilities of different imaging conditions, MMBA-Net exploits the point cloud to provide spatial information complementary to the RGB images. Furthermore, we observe that background regions are perceptually consistent within an industrial image, which can be leveraged to discriminate between foreground and background regions. To implement this idea, we model correlation learning between multi-modal query samples and multi-modal normal (defect-free) samples as an optimal transport problem, establishing robust background correlations between query and normal samples across modalities. Experiments on real-world industrial product and food datasets demonstrate that the proposed method can perform effective base learning and meta-learning from a small number of defective samples (approximately 15–25 defective training samples) and segment defects in both seen and unseen domains.
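The correlation-learning step described above casts the matching between query features and normal-sample features as an optimal transport problem. As a minimal illustration of that formulation (not the paper's actual implementation; feature sizes, the cosine cost, and the `sinkhorn` helper are all illustrative assumptions), entropy-regularized Sinkhorn iterations compute a soft correspondence plan between the two feature sets:

```python
import numpy as np

def sinkhorn(cost, eps=0.05, n_iters=100):
    """Entropy-regularized optimal transport between two uniform
    distributions over feature locations (Sinkhorn iterations)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # mass of query features
    b = np.full(m, 1.0 / m)   # mass of normal-sample features
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        u = a / (K @ (b / (K.T @ u)))   # alternate row/column scaling
    v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]  # transport plan, columns sum to b

# Toy example: cosine-distance cost between random "query" and
# "normal" feature vectors (stand-ins for multi-modal features).
rng = np.random.default_rng(0)
q = rng.normal(size=(6, 32)); q /= np.linalg.norm(q, axis=1, keepdims=True)
s = rng.normal(size=(8, 32)); s /= np.linalg.norm(s, axis=1, keepdims=True)
cost = 1.0 - q @ s.T
plan = sinkhorn(cost)
print(plan.shape)  # (6, 8): one transport weight per query-normal pair
```

High transport weight between a query location and a normal-sample location then indicates a background-consistent match, which is the intuition behind using the plan to separate background from (defective) foreground.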
Acknowledgements
This work was supported by National Natural Science Foundation of China (Grant No. 61973066), Major Science and Technology Projects of Liaoning Province (Grant No. 2021JH1/10400049) and Foundation of Key Laboratory of Equipment Reliability (Grant No. WD2C20205500306).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
About this article
Cite this article
Shan, D., Zhang, Y. & Liu, S. Multi-modal background-aware for defect semantic segmentation with limited data. J Intell Manuf (2024). https://doi.org/10.1007/s10845-024-02373-8