Re-implementing and Extending Relation Network for R-CBIR

Messina, Nicola; Amato, Giuseppe; Falchi, Fabrizio

doi:10.1007/978-3-030-39905-4_9

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1177)

Included in the following conference series:

Italian Research Conference on Digital Libraries

Abstract

Relational reasoning is an emerging theme in Machine Learning in general and in Computer Vision in particular. DeepMind has recently proposed a module called Relation Network (RN) that has shown impressive results on visual question answering tasks. Unfortunately, the implementation of the proposed approach was not public. To reproduce their experiments and extend their approach in the context of Information Retrieval, we had to re-implement everything, testing many parameters and conducting many experiments. Our implementation is now public on GitHub and is already used by a large community of researchers. Furthermore, we recently presented a variant of the relation network module that we called Aggregated Visual Features RN (AVF-RN). This network can produce and aggregate, at inference time, compact visual relationship-aware features for the Relational-CBIR (R-CBIR) task. R-CBIR consists of retrieving images with given relationships among objects. In this paper, we discuss the details of our Relation Network implementation and report more experimental results than the original paper. Relational reasoning is a very promising topic for better understanding and retrieving inter-object relationships, especially in digital libraries.
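
As a rough illustration of the module the abstract refers to, the sketch below follows the general Relation Network formulation, RN(O, q) = f(sum over pairs (i, j) of g(o_i, o_j, q)), in plain Python. It is not the authors' implementation (that is the GitHub repository listed in the Notes). The helper names (`make_linear`, `linear`, `relu`, `relation_network`), the single-layer stand-ins for g and f, the random untrained weights, and all dimensions are assumptions made for the example. The pooled vector is returned alongside the output only to show what an aggregated pairwise representation looks like; it should not be read as the AVF-RN architecture, whose details are not given on this page.

```python
import itertools
import math
import random


def make_linear(n_in, n_out, rng):
    """Build one dense layer with random, untrained weights (illustration only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Apply a dense layer: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def relation_network(objects, question, g_layer, f_layer):
    """Compute f(sum_{i,j} g([o_i, o_j, q])) over all ordered object pairs.

    g and f are single layers here for brevity; the published module
    uses multi-layer perceptrons for both (simplifying assumption).
    """
    hidden_size = len(g_layer[0])
    pooled = [0.0] * hidden_size
    for o_i, o_j in itertools.product(objects, repeat=2):
        # Each pair is conditioned on the question by concatenation.
        pair_feature = relu(linear(g_layer, o_i + o_j + question))
        pooled = [p + v for p, v in zip(pooled, pair_feature)]
    # Summation makes the pooled vector independent of object order.
    return linear(f_layer, pooled), pooled


# Hypothetical sizes: 4 objects of dimension 3, a question embedding of
# dimension 2, 5 hidden units in g, and 4 output classes.
rng = random.Random(0)
g_layer = make_linear(2 * 3 + 2, 5, rng)
f_layer = make_linear(5, 4, rng)
objects = [[rng.random() for _ in range(3)] for _ in range(4)]
question = [0.5, -0.5]
logits, pooled = relation_network(objects, question, g_layer, f_layer)
print(len(logits), len(pooled))
```

Because the pairwise outputs are summed before f is applied, reordering the objects leaves the result unchanged up to floating-point rounding, which is the property that lets the module treat a scene as an unordered set of objects.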

Notes

1. https://github.com/mesnico/RelationNetworks-CLEVR

Author information

Authors and Affiliations

ISTI-CNR, Pisa, Italy
Nicola Messina, Giuseppe Amato & Fabrizio Falchi

Corresponding author

Correspondence to Nicola Messina.

Editor information

Editors and Affiliations

University of Bari, Bari, Italy
Michelangelo Ceci
University of Bari, Bari, Italy
Stefano Ferilli
Sapienza University of Rome, Rome, Italy
Antonella Poggi

About this paper

Cite this paper

Messina, N., Amato, G., Falchi, F. (2020). Re-implementing and Extending Relation Network for R-CBIR. In: Ceci, M., Ferilli, S., Poggi, A. (eds) Digital Libraries: The Era of Big Data and Data Science. IRCDL 2020. Communications in Computer and Information Science, vol 1177. Springer, Cham. https://doi.org/10.1007/978-3-030-39905-4_9

DOI: https://doi.org/10.1007/978-3-030-39905-4_9
Published: 22 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39904-7
Online ISBN: 978-3-030-39905-4
eBook Packages: Computer Science, Computer Science (R0)
