
Dynamic Gated Graph Neural Networks for Scene Graph Generation

  • Mahmoud Khademi
  • Oliver Schulte
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11366)

Abstract

We describe a new deep generative architecture, called Dynamic Gated Graph Neural Networks (D-GGNN), for extracting a scene graph for an image, given a set of bounding-box proposals. A scene graph is a visually-grounded digraph for an image, where the nodes represent the objects and the edges show the relationships between them. Unlike the recently proposed Gated Graph Neural Networks (GGNN), the D-GGNN can be applied to an input image when only partial relationship information, or none at all, is known. In each training episode, the D-GGNN sequentially builds a candidate scene graph for a given training image, labelling additional nodes and edges of the graph at each step. The scene graph is built using a deep reinforcement learning framework: states are partial graphs, encoded using a GGNN; actions choose labels for nodes and edges; and rewards measure the match between the ground-truth annotations in the data and the labels assigned at each point in the search. Our method outperforms the state-of-the-art results on the scene graph generation task on the Visual Genome dataset.
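The abstract compresses the training loop into one sentence, so a minimal sketch may help make the state/action/reward roles concrete. The PyTorch code below is not the authors' implementation: it uses a simple REINFORCE-style policy gradient, whereas the paper's actual deep RL algorithm may differ, and all names, dimensions, and the ±1 reward shaping are hypothetical. It only mirrors the roles named in the abstract: the state is a GGNN encoding of the current partial graph, actions assign node and edge labels, and the reward scores agreement with the ground-truth annotations.

```python
# Minimal sketch of one D-GGNN-style training episode (illustrative only).
import torch
import torch.nn as nn

HIDDEN, N_NODE_LABELS, N_EDGE_LABELS, T_PROP = 64, 150, 50, 3  # hypothetical sizes

class GGNNEncoder(nn.Module):
    """Gated message passing over the current partial graph (GGNN-style)."""
    def __init__(self):
        super().__init__()
        self.msg = nn.Linear(HIDDEN, HIDDEN)   # per-node message function
        self.gru = nn.GRUCell(HIDDEN, HIDDEN)  # gated node-state update

    def forward(self, h, adj):
        # h: (n, HIDDEN) node states; adj: (n, n) adjacency of the partial graph
        for _ in range(T_PROP):
            m = adj @ self.msg(h)  # aggregate messages from neighbours
            h = self.gru(m, h)     # GRU-gated update of every node state
        return h

class Policy(nn.Module):
    """Action heads: pick a label for a node, or for an ordered node pair."""
    def __init__(self):
        super().__init__()
        self.node_head = nn.Linear(HIDDEN, N_NODE_LABELS)
        self.edge_head = nn.Linear(2 * HIDDEN, N_EDGE_LABELS)

def run_episode(encoder, policy, feats, adj, gt_nodes, gt_edges):
    """One episode: label every node, then every candidate edge.
    Reward is +1 for matching the ground truth, -1 otherwise (a
    hypothetical shaping); returns a REINFORCE loss over the episode."""
    h, log_probs, rewards = feats, [], []
    for i in range(feats.size(0)):
        h = encoder(h, adj)  # state = GGNN encoding of the partial graph
        dist = torch.distributions.Categorical(logits=policy.node_head(h[i]))
        a = dist.sample()
        log_probs.append(dist.log_prob(a))
        rewards.append(1.0 if a.item() == gt_nodes[i] else -1.0)
    for (i, j), gt in gt_edges.items():
        dist = torch.distributions.Categorical(
            logits=policy.edge_head(torch.cat([h[i], h[j]])))
        a = dist.sample()
        log_probs.append(dist.log_prob(a))
        rewards.append(1.0 if a.item() == gt else -1.0)
    return -(torch.stack(log_probs) * torch.tensor(rewards)).sum()

# Toy usage: 4 box proposals with random features, fully-connected candidates.
feats = torch.randn(4, HIDDEN)
adj = torch.ones(4, 4)
loss = run_episode(GGNNEncoder(), Policy(), feats, adj,
                   gt_nodes=[3, 1, 7, 2], gt_edges={(0, 1): 5, (2, 3): 9})
loss.backward()  # a policy-gradient optimizer step would follow
```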

Keywords

Gated Graph Neural Networks · Scene graph generation


Acknowledgements

This research was supported by a Discovery Grant to the senior author from the Natural Sciences and Engineering Research Council of Canada. The Titan X GPUs used for this research were donated by the NVIDIA Corporation.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Simon Fraser University, Burnaby, Canada
