Object detection by crossing relational reasoning based on graph neural network

You, XiuTing; Liu, He; Wang, Tao; Feng, Songhe; Lang, Congyan

doi:10.1007/s00138-021-01257-8

Original Paper
Published: 30 October 2021

Machine Vision and Applications, Volume 33, article number 1 (2022)

XiuTing You, He Liu, Tao Wang, Songhe Feng & Congyan Lang

Abstract

Using relational representations to facilitate object detection has attracted growing research attention in recent years. However, previous studies mainly focus on relationships within the region proposals or within the label embeddings, and pay less attention to the relationships between the two. To fill this gap, we propose a novel object detection framework that explores the relationships across the visual feature space and the label embedding space to facilitate proposal classification in object detection. Specifically, we model the region proposals and class labels as a unified relation graph, in which the extracted proposals and labels are regarded as nodes and each proposal-label pair is connected by an assignment edge. This converts the problem of classifying proposals into the problem of selecting reliable edges from the constructed relation graph. Furthermore, a graph convolutional module is developed to perform relational reasoning on the graph; it predicts a label for each assignment edge to indicate whether the corresponding classification is reliable. The updated relational representations of the proposals are then used for bounding box regression. Embedding our framework into state-of-the-art baselines, we perform extensive comparison experiments on two public benchmarks, Pascal VOC and COCO2017. The experimental results demonstrate the flexibility and effectiveness of the proposed framework.
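The graph formulation described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the layer design, so everything below is an assumption made for illustration only: the function names, the projection matrices `w_p` and `w_l`, the attention-style message passing that stands in for the paper's graph convolutional module, and the 0.5 threshold are all hypothetical and are not the authors' implementation.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def build_assignment_edges(num_proposals, num_labels):
    """Return one (proposal index, label index) pair per assignment edge."""
    p_idx, l_idx = np.meshgrid(
        np.arange(num_proposals), np.arange(num_labels), indexing="ij"
    )
    return np.stack([p_idx.ravel(), l_idx.ravel()], axis=1)


def relational_reasoning_step(proposal_feats, label_embeds, w_p, w_l):
    """One hypothetical round of message passing across the two node sets.

    w_p and w_l (assumed, learnable in practice) project visual features and
    label embeddings into a shared d-dimensional space.
    """
    p = proposal_feats @ w_p                      # (n, d) proposal nodes
    l = label_embeds @ w_l                        # (c, d) label nodes
    d = p.shape[1]
    affinity = p @ l.T / np.sqrt(d)               # (n, c), one value per edge
    # Each proposal aggregates messages from labels, and vice versa.
    p_new = p + softmax(affinity, axis=1) @ l     # updated proposal features
    l_new = l + softmax(affinity.T, axis=1) @ p   # updated label features
    edge_logits = p_new @ l_new.T / np.sqrt(d)    # (n, c) edge scores
    return p_new, l_new, edge_logits


def select_reliable_edges(edge_logits, threshold=0.5):
    """Mark an assignment edge as reliable when its sigmoid score is high."""
    prob = 0.5 * (1.0 + np.tanh(0.5 * edge_logits))  # stable sigmoid
    return prob > threshold


rng = np.random.default_rng(0)
proposal_feats = rng.normal(size=(4, 8))   # 4 proposals, 8-dim visual features
label_embeds = rng.normal(size=(3, 5))     # 3 classes, 5-dim label embeddings
w_p = rng.normal(size=(8, 6))
w_l = rng.normal(size=(5, 6))

edges = build_assignment_edges(4, 3)
p_new, l_new, edge_logits = relational_reasoning_step(
    proposal_feats, label_embeds, w_p, w_l
)
reliable = select_reliable_edges(edge_logits)
```

In this sketch, `reliable[i, j]` plays the role of the predicted label on the assignment edge between proposal `i` and class `j`, and `p_new` corresponds to the updated proposal representations that would feed a bounding box regressor.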

Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities (2019YJS044), the National Natural Science Foundation of China (Nos. 62076021 and 61872032) and the Beijing Municipal Natural Science Foundation (No. 4202060).

Author information

XiuTing You and He Liu have contributed equally to this work.

Authors and Affiliations

School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
XiuTing You, He Liu, Tao Wang, Songhe Feng & Congyan Lang

Corresponding author

Correspondence to Tao Wang.

About this article

Cite this article

You, X., Liu, H., Wang, T. et al. Object detection by crossing relational reasoning based on graph neural network. Machine Vision and Applications 33, 1 (2022). https://doi.org/10.1007/s00138-021-01257-8

Received: 17 March 2021
Revised: 21 July 2021
Accepted: 13 October 2021
Published: 30 October 2021
DOI: https://doi.org/10.1007/s00138-021-01257-8
