Refine-FPN: Instance Segmentation Based on a Non-local Multi-feature Aggregation Mechanism

Li, Xiaolian; Zhu, Lei; Wang, Wenwu; Yang, Ke

doi:10.1007/s11063-022-11016-z

Published: 26 August 2022

Neural Processing Letters, Volume 55, pages 3411–3428 (2023)

Xiaolian Li¹, Lei Zhu¹ (ORCID: orcid.org/0000-0001-7001-5775), Wenwu Wang¹ & Ke Yang¹

Abstract

Rational use of the multilevel structure of deep networks to extract multi-scale features is crucial for instance segmentation. The Feature Pyramid Network (FPN) is a classical architecture that enriches the semantic information of multi-scale objects. However, inherent defects in the FPN structure inevitably cause information loss during feature extraction and feature fusion. In this paper, we propose a feature pyramid structure, Refine-FPN, built on a non-local multi-feature aggregation operation: a module that integrates multi-scale features and relies on attention mechanisms to improve the pyramid feature representation. The algorithm enriches the details of each feature layer by aggregating multiple features into a contextual global feature representation. Replacing FPN with Refine-FPN in Mask R-CNN improves mask AP on the COCO dataset by 0.6% and 0.5% with ResNet-50 and ResNet-101 backbones, respectively. Moreover, the proposed method is easy to integrate into other popular architectures. For example, equipping Cascade Mask R-CNN with Refine-FPN improves mask AP by 0.5% and 0.4% with ResNet-50 and ResNet-101, respectively.
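
The abstract does not give the internals of the aggregation module, so the following is only a minimal NumPy sketch of the general idea it describes (gather multi-scale features, apply non-local attention to the aggregate, feed the result back to each level), not the authors' implementation. The function names `resize_nearest`, `non_local`, and `refine_pyramid` are hypothetical, and several simplifications are assumed: nearest-neighbour resizing, averaging as the aggregation step, a parameter-free attention block with no learned projections, and no batch dimension.

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) array to (C, out_h, out_w)."""
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return x[:, rows[:, None], cols[None, :]]

def non_local(x):
    """Parameter-free non-local block (assumption: no learned projections).

    Every spatial position attends to all positions through a softmax over
    scaled dot-product similarities; the result is added back residually.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                    # (C, N) with N = H * W
    scores = flat.T @ flat / np.sqrt(c)           # (N, N) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    context = flat @ weights.T                    # (C, N) weighted sum over positions
    return x + context.reshape(c, h, w)

def refine_pyramid(levels, ref=1):
    """Aggregate all levels at one resolution, refine, and redistribute."""
    ref_h, ref_w = levels[ref].shape[1:]
    gathered = [resize_nearest(f, ref_h, ref_w) for f in levels]
    aggregated = np.mean(gathered, axis=0)        # assumed aggregation: plain average
    refined = non_local(aggregated)
    return [f + resize_nearest(refined, f.shape[1], f.shape[2]) for f in levels]

# Toy pyramid: four levels with 8 channels and halving resolution.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((8, s, s)) for s in (32, 16, 8, 4)]
refined_pyramid = refine_pyramid(pyramid)
```

Each refined level keeps its original shape, so under these assumptions a module of this kind could be placed between the pyramid and the detection or mask heads without changing their inputs.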

Notes

To simplify the analysis, we omit the batch dimension here.

Author information

Authors and Affiliations

College of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, 430081, China
Xiaolian Li, Lei Zhu, Wenwu Wang & Ke Yang

Corresponding author

Correspondence to Lei Zhu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, X., Zhu, L., Wang, W. et al. Refine-FPN: Instance Segmentation Based on a Non-local Multi-feature Aggregation Mechanism. Neural Process Lett 55, 3411–3428 (2023). https://doi.org/10.1007/s11063-022-11016-z

Accepted: 17 August 2022
Published: 26 August 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11063-022-11016-z
