Abstract
Accurately locating transmission line defects and taking timely remedial measures are essential to power system safety. Convolutional neural networks (CNNs) are commonly used to detect defects in transmission line inspection images, but the local nature of the convolution operation limits detector performance. Transformers have become increasingly prominent in computer vision because their self-attention computation is global rather than local. This paper proposes a defect detection method for transmission line images that combines a CNN with a Transformer. Specifically, an enhanced local perception unit is designed to reduce false and missed detections of small and occluded objects. A lightweight self-attention method reduces the high computational cost and complexity of the Multi-Head Self-Attention module. In addition, an adaptive multi-scale fusion module is designed to extract more effective fused features and improve the model’s robustness. Experimental comparisons with Faster Region-based Convolutional Neural Network (Faster R-CNN), Cascade R-CNN, DEtection TRansformer (DETR)-R50, You Only Look One-level Feature (YOLOF), YOLOX-L and Swin Transformer (Swin-T) show that the proposed method achieves higher average accuracy in transmission line image defect detection.
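To illustrate why the cost of the Multi-Head Self-Attention module matters, the following minimal sketch counts the multiply-accumulate operations in the two matrix products of one attention layer. It is not the method of this paper: the abstract does not specify how the lightweight self-attention works, so the sketch assumes one common strategy, reducing the number of key/value tokens by a fixed factor. The function name `attention_macs`, the parameter `kv_reduction`, and the feature-map size are all hypothetical.

```python
def attention_macs(num_tokens: int, dim: int, kv_reduction: int = 1) -> int:
    """Multiply-accumulate count for the two matrix products in one
    self-attention layer: scores = Q @ K^T and output = weights @ V.

    kv_reduction is a hypothetical factor by which the number of
    key/value tokens is reduced (1 means standard full attention).
    Splitting dim across several heads leaves this total unchanged.
    """
    num_kv = max(1, num_tokens // kv_reduction)
    scores = num_tokens * num_kv * dim  # Q (n x d) times K^T (d x m)
    output = num_tokens * num_kv * dim  # weights (n x m) times V (m x d)
    return scores + output


# Assumed example: a 56 x 56 feature map with 96 channels.
tokens = 56 * 56
full = attention_macs(tokens, 96)
light = attention_macs(tokens, 96, kv_reduction=16)
print(f"full attention:    {full} MACs")
print(f"reduced key/value: {light} MACs")
print(f"ratio:             {full / light:.1f}x")
```

Under these assumptions the cost of full attention grows quadratically with the number of tokens, while reducing the key/value tokens by a factor r lowers it by roughly the same factor r.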
Funding
This study was financially supported by the Science and Technology Project of State Grid Shanxi Electric Power Company (No. 52051K21000B).
About this article
Cite this article
Dong, K., Shen, Q., Wang, C. et al. Improved swin transformer-based defect detection method for transmission line patrol inspection images. Evol. Intel. 17, 549–558 (2024). https://doi.org/10.1007/s12065-023-00837-z
DOI: https://doi.org/10.1007/s12065-023-00837-z