
Object detection by crossing relational reasoning based on graph neural network

  • Original Paper
  • Machine Vision and Applications

Abstract

Using relational representations to improve object detection has attracted growing research attention in recent years. However, previous studies mainly focus on relationships within the region proposals or within the label embeddings, and pay less attention to the relationships between the two. To fill this gap, we propose a novel object detection framework that explores the relationships across the visual feature space and the label embedding space to improve proposal classification. Specifically, we model the region proposals and class labels as a unified relation graph, in which the extracted proposals and the labels are nodes and each proposal-label pair is connected by an assignment edge. This converts the problem of classifying proposals into the problem of selecting reliable edges from the constructed relation graph. Furthermore, we develop a graph convolutional module that performs relational reasoning on the graph and predicts a label for each assignment edge, indicating whether the corresponding classification is reliable. The updated relational representations of the proposals are then used for bounding box regression. We embed our framework into state-of-the-art baselines and perform extensive comparison experiments on two public benchmarks, Pascal VOC and COCO2017. The experimental results demonstrate the flexibility and effectiveness of the proposed framework.
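The idea of treating proposal classification as edge selection can be illustrated with a small sketch. This is not the authors' graph convolutional module: it assumes proposal features and label embeddings already live in the same vector space, uses plain dot products instead of learned weights, and runs a single round of message passing. All function names and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def build_assignment_edges(proposals, labels):
    """One edge per (proposal, label) pair, scored by dot-product affinity."""
    return {
        (i, j): dot(p, l)
        for i, p in enumerate(proposals)
        for j, l in enumerate(labels)
    }


def propagate(proposals, labels, edges):
    """One round of message passing from label nodes to proposal nodes."""
    updated = []
    for i, p in enumerate(proposals):
        weights = softmax([edges[(i, j)] for j in range(len(labels))])
        message = [
            sum(w * l[k] for w, l in zip(weights, labels))
            for k in range(len(p))
        ]
        # Residual update: keep the visual feature, add label context.
        updated.append([a + b for a, b in zip(p, message)])
    return updated


def classify_edges(proposals, labels, threshold=0.5):
    """Pick at most one reliable assignment edge per proposal.

    Returns a dict {proposal index: label index or None} and the updated
    proposal representations (which a detector could pass on to box
    regression).
    """
    edges = build_assignment_edges(proposals, labels)
    updated = propagate(proposals, labels, edges)
    scores = build_assignment_edges(updated, labels)
    reliable = {}
    for i in range(len(proposals)):
        probs = softmax([scores[(i, j)] for j in range(len(labels))])
        best = max(range(len(labels)), key=probs.__getitem__)
        reliable[i] = best if probs[best] >= threshold else None
    return reliable, updated


if __name__ == "__main__":
    proposals = [[1.0, 0.0], [0.0, 1.0]]  # toy visual features
    labels = [[2.0, 0.0], [0.0, 2.0]]     # toy label embeddings
    reliable, _ = classify_edges(proposals, labels)
    print(reliable)
```

In this toy setup each proposal keeps only the edge to the label it is most strongly aligned with, and a proposal whose best edge falls below `threshold` is left unassigned. A real system would replace the dot products with learned graph convolution layers and train the edge classifier end to end.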



Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities (2019YJS044), the National Natural Science Foundation of China (Nos. 62076021 and 61872032), and the Beijing Municipal Natural Science Foundation (No. 4202060).

Author information

Corresponding author

Correspondence to Tao Wang.

About this article

Cite this article

You, X., Liu, H., Wang, T. et al. Object detection by crossing relational reasoning based on graph neural network. Machine Vision and Applications 33, 1 (2022). https://doi.org/10.1007/s00138-021-01257-8

  • DOI: https://doi.org/10.1007/s00138-021-01257-8
