
Refine-FPN: Instance Segmentation Based on a Non-local Multi-feature Aggregation Mechanism

Neural Processing Letters

Abstract

Making good use of the multilevel structure of deep networks to extract multiscale features is crucial for instance segmentation. The Feature Pyramid Network (FPN) is a classical architecture that enriches the semantic information of multiscale objects. However, inherent defects in the FPN structure cause information to be lost during feature extraction and feature fusion. In this paper, we propose a feature pyramid structure, called Refine-FPN, based on a non-local multi-feature aggregation operation: a module that integrates multi-scale features and relies on attention mechanisms to improve the pyramid feature representation. The algorithm enriches the detail of each feature layer by aggregating multiple features into a contextual global feature representation. Replacing FPN with Refine-FPN in Mask R-CNN improves mask AP on the COCO dataset by 0.6% and 0.5% with ResNet-50 and ResNet-101 backbones, respectively. Moreover, the proposed method is easy to integrate into other popular architectures. For example, equipping Cascade Mask R-CNN with Refine-FPN improves mask AP by 0.5% and 0.4% with ResNet-50 and ResNet-101, respectively.
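The abstract does not give the module's equations, so the following is only a minimal NumPy sketch of the general idea it describes: gather pyramid levels at a common resolution, fuse them, refine the fused map with a non-local (self-attention) operation, and add the result back to every level. The function names, the averaging fusion, the nearest-neighbour resizing, and the random projection weights are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map (assumed resizing scheme)."""
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows[:, None], cols[None, :]]

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, w_theta, w_phi, w_g):
    """Embedded-Gaussian non-local operation on a (C, H, W) map, with a residual connection."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)               # (C, N), N = H * W positions
    theta = w_theta @ flat                   # query embedding, (C', N)
    phi = w_phi @ flat                       # key embedding, (C', N)
    g = w_g @ flat                           # value embedding, (C, N)
    attn = softmax(theta.T @ phi, axis=-1)   # (N, N); each row sums to 1
    out = g @ attn.T                         # position i receives sum_j attn[i, j] * g[:, j]
    return x + out.reshape(c, h, w)

def aggregate_and_refine(levels, ref_index, weights):
    """Fuse all levels at one resolution, refine globally, and redistribute (hypothetical helper)."""
    ref_h, ref_w = levels[ref_index].shape[1:]
    gathered = [resize_nearest(f, ref_h, ref_w) for f in levels]
    fused = np.mean(gathered, axis=0)        # simple average; the fusion rule is an assumption
    refined = non_local(fused, *weights)
    return [f + resize_nearest(refined, f.shape[1], f.shape[2]) for f in levels]

# Toy pyramid: three levels with 8 channels each (batch dimension omitted).
rng = np.random.default_rng(0)
C = 8
levels = [rng.standard_normal((C, s, s)) for s in (16, 8, 4)]
weights = (
    rng.standard_normal((4, C)) * 0.1,   # w_theta (random stand-in for learned weights)
    rng.standard_normal((4, C)) * 0.1,   # w_phi
    rng.standard_normal((C, C)) * 0.1,   # w_g
)
outputs = aggregate_and_refine(levels, ref_index=1, weights=weights)
print([o.shape for o in outputs])  # [(8, 16, 16), (8, 8, 8), (8, 4, 4)]
```

Each output level keeps its original shape, so a block of this kind could in principle sit between a pyramid and the detection and mask heads without changing them. In a trained model the three projection matrices would be learned 1×1 convolutions rather than random arrays.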


Notes

  1. To simplify the analysis, we omit the batch dimension here.


Author information

Corresponding author

Correspondence to Lei Zhu.


About this article


Cite this article

Li, X., Zhu, L., Wang, W. et al. Refine-FPN: Instance Segmentation Based on a Non-local Multi-feature Aggregation Mechanism. Neural Process Lett 55, 3411–3428 (2023). https://doi.org/10.1007/s11063-022-11016-z


  • DOI: https://doi.org/10.1007/s11063-022-11016-z
