Multi-modal background-aware for defect semantic segmentation with limited data

Journal of Intelligent Manufacturing

Abstract

Visual defect detection is widely used in intelligent manufacturing to automate product quality inspection. Two main challenges remain in industrial applications: the scarcity of defect samples and the weak texture variation of industrial defects. These problems limit the applicability of RGB image-based industrial defect segmentation. To this end, we propose a multi-modal background-aware network (MMBA-Net) for few-shot defect segmentation on 2D+3D data with limited samples, which can segment texture and structural defects in both seen and unseen domains (objects). To combine the perception capabilities of different imaging conditions, MMBA-Net exploits the point cloud to provide spatial information for the RGB images. Furthermore, we found that background regions are perceptually consistent within an industrial image, which can be leveraged to discriminate between foreground and background regions. To implement this idea, we model correlation learning between multi-modal query samples and multi-modal normal (defect-free) samples as an optimal transport problem, establishing robust background correlations between query and normal samples across modalities. Experiments on real-world industrial product and food datasets demonstrate that the proposed method can perform base learning and meta-learning with a small number of defective samples (approximately 15–25 defective training samples) and segment defects effectively in both seen and unseen domains.
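
The optimal transport formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the MMBA-Net implementation: it assumes entropy-regularised transport solved by Sinkhorn iterations, uniform marginals, a cosine cost, and toy 2-D vectors standing in for fused RGB and point-cloud features. The function names (cosine_cost, sinkhorn, transport_scores) and the reading of the per-location transport cost as a foreground score are hypothetical choices made for illustration only.

```python
import math


def cosine_cost(query_feats, normal_feats):
    """Cost matrix C[i][j] = 1 - cosine similarity(query i, normal j)."""
    def norm(v):
        # Fall back to 1.0 for an all-zero vector to avoid division by zero.
        return math.sqrt(sum(x * x for x in v)) or 1.0

    cost = []
    for q in query_feats:
        row = []
        for n in normal_feats:
            dot = sum(a * b for a, b in zip(q, n))
            row.append(1.0 - dot / (norm(q) * norm(n)))
        cost.append(row)
    return cost


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularised OT plan with uniform marginals (assumed here)."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass assigned to each query location
    b = [1.0 / m] * m  # mass assigned to each normal location
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_scores(plan, cost):
    """Average cost paid per unit of mass leaving each query location."""
    scores = []
    for p_row, c_row in zip(plan, cost):
        mass = sum(p_row)
        scores.append(sum(p * c for p, c in zip(p_row, c_row)) / mass)
    return scores


# Toy data (hypothetical): two background-like query vectors and one outlier.
query = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
normal = [[1.0, 0.0], [0.95, 0.05], [0.8, 0.2]]

cost = cosine_cost(query, normal)
plan = sinkhorn(cost)
scores = transport_scores(plan, cost)
print(scores)
```

In this toy setting the third query vector is far from every normal vector, so moving its mass is expensive and its score is the largest, while the two background-like vectors match the normal features cheaply. How such correlations are actually learned and used in MMBA-Net is described in the full article, not here.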

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61973066), the Major Science and Technology Projects of Liaoning Province (Grant No. 2021JH1/10400049), and the Foundation of Key Laboratory of Equipment Reliability (Grant No. WD2C20205500306).

Author information

Corresponding author

Correspondence to Yunzhou Zhang.

Ethics declarations

Conflict of interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shan, D., Zhang, Y. & Liu, S. Multi-modal background-aware for defect semantic segmentation with limited data. J Intell Manuf (2024). https://doi.org/10.1007/s10845-024-02373-8

  • DOI: https://doi.org/10.1007/s10845-024-02373-8