A Large Visual Question Answering Dataset for Cultural Heritage

Asprino, Luigi; Bulla, Luana; Marinucci, Ludovica; Mongiovì, Misael; Presutti, Valentina

doi:10.1007/978-3-030-95470-3_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13164)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science

Abstract

Visual Question Answering (VQA) is gaining momentum for its ability to bridge Computer Vision and Natural Language Processing. VQA approaches mainly rely on Machine Learning algorithms that need to be trained on large annotated datasets. Once trained, a machine learning model is hardly portable to a different domain. This calls for agile methodologies for building large annotated datasets from existing resources. The cultural heritage domain represents both a natural application of this task and an extensive source of data for training and validating VQA models. To this end, using data and models from ArCo, the knowledge graph of Italian cultural heritage, we generated a large dataset for VQA in Italian and English. We describe the results of, and the lessons learned from, our semi-automatic dataset generation process, and discuss the tools employed for data extraction and transformation.
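
To make the idea of deriving question-answer pairs from knowledge-graph data more concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract only states that the process is semi-automatic, so the use of fixed question templates is an assumption, and the `Fact` class, the property names, the templates, and the example values are all hypothetical. In the actual work the facts would be extracted from ArCo rather than written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One (image, property, value) statement about a cultural asset (hypothetical shape)."""
    image_url: str
    prop: str
    value: str


# Hypothetical question templates, one per property we know how to ask about.
TEMPLATES = {
    "author": "Who is the author of this object?",
    "material": "What material is this object made of?",
    "date": "When was this object created?",
}


def generate_qa(facts):
    """Turn facts into VQA samples; facts without a matching template are skipped."""
    samples = []
    for fact in facts:
        question = TEMPLATES.get(fact.prop)
        if question is None:
            continue  # no template for this property, so no question is produced
        samples.append(
            {"image": fact.image_url, "question": question, "answer": fact.value}
        )
    return samples


# Made-up example facts (placeholders, not ArCo data).
facts = [
    Fact("http://example.org/img/1.jpg", "author", "Unknown sculptor"),
    Fact("http://example.org/img/1.jpg", "material", "marble"),
    Fact("http://example.org/img/1.jpg", "owner", "Example Museum"),
]

samples = generate_qa(facts)
for sample in samples:
    print(sample["question"], "->", sample["answer"])
```

Under these assumptions, each image paired with a described property yields one training sample, and properties without a template are ignored, which is one simple way manual template design and automatic data extraction could be combined.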

This work was supported by the Italian PON project ARS01_00421: “IDEHA - Innovazioni per l’elaborazione dei dati nel settore del Patrimonio Culturale”.

The authors are listed in alphabetical order.

Notes

1. https://github.com/ICCD-MiBACT/ArCo/tree/master/ArCo-release
2. http://dati.beniculturali.it/
3. https://github.com/RDFLib/sparqlwrapper
4. A complete list is available at https://github.com/misael77/IDEHAdataset
5. https://huggingface.co/Helsinki-NLP/opus-mt-it-en and opus-mt-en-it
6. Available on GitHub: https://github.com/misael77/IDEHAdataset

Author information

Authors and Affiliations

ISTC - Consiglio Nazionale delle Ricerche, Rome and Catania, Italy
Luana Bulla, Ludovica Marinucci, Misael Mongiovì & Valentina Presutti
Università degli Studi di Bologna, Bologna, Italy
Luigi Asprino & Valentina Presutti

Corresponding author

Correspondence to Ludovica Marinucci.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy
Giuseppe Nicosia
Department of Computer Science, University of Reading, Reading, UK
Varun Ojha
Department of Computer Science, University of Oxford, Oxford, UK
Emanuele La Malfa
Cambridge Judge Business School, University of Cambridge, Cambridge, UK
Gabriele La Malfa
Department of Biochemistry, University of Cambridge, Cambridge, UK
Giorgio Jansen
Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, USA
Panos M. Pardalos
University of Catania, Catania, Italy
Giovanni Giuffrida
Department of Informatics, Dana-Farber Cancer Institute, Boston, MA, USA
Renato Umeton

About this paper

Cite this paper

Asprino, L., Bulla, L., Marinucci, L., Mongiovì, M., Presutti, V. (2022). A Large Visual Question Answering Dataset for Cultural Heritage. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol 13164. Springer, Cham. https://doi.org/10.1007/978-3-030-95470-3_14

DOI: https://doi.org/10.1007/978-3-030-95470-3_14
Published: 02 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95469-7
Online ISBN: 978-3-030-95470-3
eBook Packages: Computer Science, Computer Science (R0)
